
MCP Runbook (Operations)

Operator procedures for the MCP server hosting capability. For the design see MCP Architecture; for the tenant guide see MCP Server Hosting; for the decision record see ADR-12.

Deploy & enable

The MCP runtime is a chart component, off by default. Enable it in the Helm values:

```yaml
mcp:
  enabled: true
  networkPolicy:
    enabled: true
    gatewayNamespaces: [knative-serving, kourier-system]
  quotas:
    mode: enforced            # enforced | unbounded
    maxServersPerTenant: 10
    maxToolsPerServer: 50
    toolCallsPerMinutePerServer: 600
    toolCallsPerMinutePerOAuthClient: 300
```

This deploys the RBAC resources and the internal-only NetworkPolicy (templates in charts/in-falcone/templates/mcp/). MCP servers themselves are Knative Services created per tenant; they require Knative Serving and Kourier (already used by functions). OpenShift overlays inherit the non-root securityContext.

Network isolation & the CNI caveat

The NetworkPolicy (<release>-mcp-server-internal-only) selects pods labeled in-falcone.io/component: mcp-server. It allows inbound traffic only from the Knative ingress namespaces and egress only to DNS and the platform namespace, so a server cannot reach another tenant's services.

NetworkPolicy enforcement needs a policy CNI

NetworkPolicy is honored only under a policy-enforcing CNI (Calico/Cilium). The kind test cluster runs kindnet, which does not enforce it. On kind the policy applies cleanly and selects the right pods (verified with kubectl apply --dry-run=server), but the behavioral proof of cross-namespace isolation must run on a policy-enforcing CNI, in production or CI.
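As a quick pre-flight before the behavioral isolation test, an operator might confirm that the policy exists and see which pods carry the selector label. This is a minimal sketch, assuming kubectl access to the cluster; the helper names and the namespace argument are hypothetical, and the policy name follows the <release>-mcp-server-internal-only pattern described above.

```sh
# Hypothetical helper: policy name for a Helm release, following the
# <release>-mcp-server-internal-only pattern described in this runbook.
mcp_policy_name() {
  printf '%s-mcp-server-internal-only\n' "$1"
}

# Hypothetical helper: show the policy and the pods carrying its selector label.
# $1 = Helm release, $2 = namespace to inspect (an assumption; use the namespace
# where the policy and the tenant's MCP servers actually live).
mcp_policy_check() {
  kubectl get networkpolicy "$(mcp_policy_name "$1")" -n "$2" -o yaml
  kubectl get pods -n "$2" -l in-falcone.io/component=mcp-server -o wide
}
```

This only shows that the policy is present and which pods it targets. It says nothing about enforcement, which still depends on the CNI.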

OAuth Authorization Server

Per-tool scopes are Keycloak client scopes. Clients are registered through the control-plane's curated dynamic client registration (DCR) plan, never through raw Keycloak admin access. The Keycloak admin credential is the in-falcone-keycloak-admin secret (keys username / password) in the platform namespace. The realm's OIDC discovery document exposes the dynamic client registration endpoint (/realms/{realm}/clients-registrations/openid-connect) and the authorization_code / client_credentials / refresh_token grants.
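To check what a realm actually advertises, the discovery document can be fetched and inspected. This is a minimal sketch, assuming Keycloak's standard discovery path under /realms/{realm} and that curl and jq are available; the base URL, realm name, and helper name are hypothetical.

```sh
# Hypothetical helper: build the OIDC discovery URL for a realm.
# $1 = Keycloak base URL (a trailing slash is tolerated), $2 = realm name.
oidc_discovery_url() {
  printf '%s/realms/%s/.well-known/openid-configuration\n' "${1%/}" "$2"
}

# Usage (needs network access and jq; URL and realm are placeholders):
#   curl -fsS "$(oidc_discovery_url https://keycloak.example.test tenant-a)" \
#     | jq '{registration_endpoint, grant_types_supported}'
```

The registration_endpoint and grant_types_supported fields are standard discovery metadata, so the output should show the registration endpoint and the three grants listed above if the realm is configured as described.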

Supply-chain controls

A server version is only registered when its image is digest-pinned (image@sha256:…). Deploy is rejected for an unpinned / latest image, a registry not on the allow-list, or a signature that did not verify (cosign verdict injected at the deploy path). A version bump that changes a tool's description or scope is held for review and does not serve until approved; rollback re-activates a prior approved digest.
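The digest-pinning rule can be checked locally before submitting a deploy. The following is a minimal sketch, not the platform's actual validator: the helper name and image references are hypothetical, and it tests only the image@sha256:<64 hex characters> shape, not the registry allow-list or the cosign signature.

```sh
# Hypothetical helper: succeed only if the image reference ends in a
# sha256 digest (64 lowercase hex characters).
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Usage (placeholder image reference):
#   is_digest_pinned "$IMAGE" || echo "refusing to deploy: $IMAGE is not digest-pinned" >&2
```

A tag such as :latest, or a tag with no digest at all, fails this check, which mirrors the first rejection reason listed above.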

Quotas & rate limits

Defaults are in mcp.quotas (above); the resolved plan overrides them per tenant. In enforced mode a breach blocks the request: QUOTA_EXCEEDED for server/tool counts, and RATE_LIMITED (HTTP 429 with retryAfter) for tool calls. In unbounded mode nothing is blocked. Breaches are recorded in the mcp audit subsystem. MCP usage appears in the per-tenant quota posture via the mcp_tool_invocations dimension.
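On the client side, "back off" on a 429 can be as simple as a bounded retry loop. This is a minimal sketch with a fixed wait between attempts; the helper name is hypothetical, and the wrapped command is assumed to print only the HTTP status code. A real client would prefer the retryAfter value from the response, whose exact location (header or body) this runbook does not specify.

```sh
# Hypothetical helper: run a command that prints an HTTP status code and
# retry while it prints 429, up to a maximum number of attempts.
# $1 = max attempts, $2 = seconds to wait between attempts, rest = command.
# The body runs in a subshell so its variables do not leak into the caller.
retry_on_429() (
  attempts="$1"; wait_s="$2"; shift 2
  n=1
  status="$("$@")"
  while [ "$status" = "429" ] && [ "$n" -lt "$attempts" ]; do
    sleep "$wait_s"
    n=$((n + 1))
    status="$("$@")"
  done
  # Print the last status seen, whether or not the retries succeeded.
  printf '%s\n' "$status"
)

# Usage (placeholder URL; curl prints only the status code):
#   retry_on_429 5 2 curl -s -o /dev/null -w '%{http_code}' -X POST "$MCP_TOOL_URL"
```

If the final status is still 429 after all attempts, the limit is being hit persistently and raising the plan limit is the more appropriate fix.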

Scale-to-zero

Servers run with min-scale: 0. An idle server scales to zero (~30s) and cold-starts on the next request (~1.2s observed in the ADR-12 spike), so an idle server incurs no cost.

Observability

Tool-call metrics: in_falcone_mcp_tool_invocations_total (domain mcp_tool_usage) + latency on in_falcone_component_operation_duration_seconds (subsystem=mcp). Audit: the mcp subsystem in the audit pipeline, queryable tenant-scoped in the console. All MCP observability/audit contracts are enforced by the validate:observability-* gates and the contract unit tests.

E2E suite

The real-stack Playwright suite lives at tests/e2e/specs/mcp/ (full loop, cross-tenant isolation, version-pinning) with a per-issue smoke at tests/e2e/specs/issues/add-mcp-e2e.spec.ts. Run it with the standard runner (ephemeral namespace, always torn down):

```sh
bash tests/e2e/run-issue.sh add-mcp-e2e        # per-issue
cd tests/e2e && npx playwright test specs/mcp   # full MCP suite (needs a running stack)
```

The specs probe whether the control-plane serves the MCP management API and skip with a precise reason when it is absent — so they never report a false green. They execute the full loop the moment the routes are wired.

Pending integration

The control-plane runtime (apps/control-plane/src/runtime/server.mjs) does not yet serve /v1/mcp/... management routes — the MCP modules are pure logic and contracts. Wiring them into the runtime as live HTTP handlers is the remaining step that flips the E2E gate green and lets the platform serve the management API end to end.

Common failure modes

| Symptom | Likely cause | Action |
| --- | --- | --- |
| MCP server pod not reachable from the gateway | NetworkPolicy ingress namespace mismatch | confirm mcp.networkPolicy.gatewayNamespaces match the Knative ingress namespaces |
| Cross-namespace traffic not blocked on kind | kindnet does not enforce NetworkPolicy | run isolation tests on a Calico/Cilium cluster |
| Deploy rejected | image unpinned / unsigned / disallowed registry | pin by digest, sign, and allow-list the registry |
| New version not serving | tool description/scope changed → held for review | approve the version, or roll back |
| Tool calls returning 429 | per-server / per-OAuth-client rate limit | raise the plan limit or back off; check the mcp audit for the breach |
| /v1/mcp returns 404 | management routes not wired into the runtime yet | see Pending integration above |

Released under the MIT License.