Scaling and Performance

This guide explains how to scale MCPServer and Virtual MCP Server (vMCP) deployments.

Vertical scaling

Vertical scaling (increasing CPU/memory per instance) is the simplest approach and works for all use cases, including stateful backends.

To increase resources, configure podTemplateSpec in your VirtualMCPServer:

spec:
  podTemplateSpec:
    spec:
      containers:
        - name: vmcp
          resources:
            requests:
              cpu: '500m'
              memory: 512Mi
            limits:
              cpu: '1'
              memory: 1Gi

Vertical scaling is recommended as the starting point for most deployments.

Horizontal scaling

Horizontal scaling (adding more replicas) can improve availability and handle higher request volumes.

How to scale horizontally

Set the replicas field in your VirtualMCPServer spec to control the number of vMCP pods:

VirtualMCPServer resource
spec:
  replicas: 3

When replicas is not set, the operator does not manage the replica count, leaving it to an HPA or other external controller. You can also scale manually or with an HPA:

Option 1: Manual scaling

kubectl scale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> --replicas=3

Option 2: Autoscaling with HPA

kubectl autoscale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> \
--min=2 --max=5 --cpu-percent=70
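
The autoscale command above can equivalently be expressed as a declarative HPA manifest. This is a sketch: the HPA name and the Deployment naming convention (vmcp-<VMCP_NAME>) follow the examples above, and the target values mirror the command's flags:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-<VMCP_NAME>-hpa
  namespace: <NAMESPACE>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp-<VMCP_NAME>
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

A manifest is easier to version-control than the imperative command and survives cluster re-creation.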

Session storage for multi-replica deployments

When running multiple replicas, configure Redis session storage so that sessions are shared across pods. Without session storage, a request routed to a different replica than the one that established the session will fail.

VirtualMCPServer resource
spec:
  replicas: 3
  sessionStorage:
    provider: redis
    address: redis-master.toolhive-system.svc.cluster.local:6379
    db: 0
    keyPrefix: vmcp-sessions
    passwordRef:
      name: redis-secret
      key: password
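
The passwordRef above points at a Kubernetes Secret. A minimal sketch of that Secret, assuming it lives in the same namespace as the VirtualMCPServer (replace the placeholder with your Redis password):

apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
type: Opaque
stringData:
  password: <REDIS_PASSWORD>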

See Redis Sentinel session storage for a complete Redis deployment guide.

warning

If you configure multiple replicas without session storage, the operator sets a SessionStorageMissingForReplicas status condition on the resource. Ensure Redis is available before scaling beyond a single replica.

MCPServer horizontal scaling

MCPServer creates two separate Deployments: one for the proxy runner and one for the MCP server backend. You can scale each independently:

  • spec.replicas controls the proxy runner pod count
  • spec.backendReplicas controls the backend MCP server pod count

MCPServer resource
spec:
  replicas: 2
  backendReplicas: 3
  sessionStorage:
    provider: redis
    address: redis-master.toolhive-system.svc.cluster.local:6379
    db: 0
    keyPrefix: mcp-sessions
    passwordRef:
      name: redis-secret
      key: password

Stdio transport limitation

Backends using the stdio transport are limited to a single replica. The operator rejects configurations with backendReplicas greater than 1 for stdio backends.
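
You can still scale the proxy tier for a stdio backend while keeping the backend at a single replica. A sketch (the transport field is shown as an assumption about how the backend's transport is declared; replicas and backendReplicas follow the example above):

MCPServer resource
spec:
  transport: stdio
  replicas: 2
  backendReplicas: 1 # must remain 1 for stdio backends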

When horizontal scaling is challenging

Horizontal scaling works well for stateless backends (fetch, search, read-only operations) where sessions can be resumed on any instance.

However, stateful backends make horizontal scaling difficult:

  • Stateful backends (Playwright browser sessions, database connections, file system operations) require requests to be routed to the same instance that established the session
  • Session resumption may not work reliably for stateful backends

The VirtualMCPServer and MCPServer CRDs include a sessionAffinity field that controls how the Kubernetes Service routes repeated client connections. By default, it uses ClientIP affinity, which routes connections from the same client IP to the same pod:

spec:
  sessionAffinity: ClientIP # default

For stateful backends, vertical scaling or dedicated instances per team/use case are recommended instead of horizontal scaling.

Next steps