FastMCP Server
Helm chart for deploying FastMCP Server on Kubernetes: an MCP (Model Context Protocol) server that dynamically loads tools, resources, prompts, and knowledge bases from multiple sources.
Key Features
- Multi-source loading: tools, resources, prompts, and knowledge from inline ConfigMaps, S3-compatible storage (AWS S3, MinIO, R2), or Git repositories
- Merge precedence: Inline (highest) > S3 > Git (lowest), so you can override S3 tools locally without touching the bucket
- Bearer and JWT authentication: built-in via FastMCP’s StaticTokenVerifier and JWTVerifier
- Knowledge base support: serve Markdown files as MCP resources for RAG and context injection
- Extra pip packages: install additional Python packages at startup before loading tools
- Tool metadata: optional __tags__, __timeout__, __annotations_mcp__ module variables for tool categorization and behavior hints
- Resource templates: parameterized URIs like users://{user_id}/profile for dynamic resources
- Multiple resources per file: a RESOURCES dict maps multiple URIs to handler functions
- Error masking: hide internal error details from clients via MCP_MASK_ERROR_DETAILS
- Duplicate handling: control behavior when tools share names via MCP_ON_DUPLICATE_TOOLS
- Built-in Web UI: dashboard at /ui with tools/resources/prompts explorer (Alpine.js + Tailwind CDN)
- Prometheus metrics: tool call counts, durations, errors, source sync status at /metrics
- Structured JSON logging: LOG_FORMAT=json for Loki, ELK, CloudWatch, Datadog
- Dedicated health endpoints: /healthz (liveness), /readyz (readiness), /startupz (startup)
- Diagnostic endpoint: GET /debug/info with full server introspection
- Init container pattern: pre-sync sources before the server starts via initSync.enabled
- Strict loading: MCP_STRICT_LOADING=true fails on boot if any tool/resource has errors
- Hot reload: automatic tool/resource reload on filesystem changes via MCP_HOT_RELOAD=true
- Periodic sync: poll S3/Git sources for changes at configurable intervals
- Webhook reload: POST /reload endpoint for CI/CD-triggered reloads
- OCI artifact source: pull tool bundles from OCI registries via ORAS
- Selective sync: include/exclude glob patterns for source filtering
- Gateway mode: compose multiple MCP servers via MCP_MODE=gateway and MCP_MOUNT_SERVERS
- Tag visibility: enable/disable tools by tags with MCP_ENABLE_TAGS and MCP_DISABLE_TAGS
- Multi-auth: combine bearer + JWT providers via MCP_AUTH_PROVIDERS
- Tool-level scopes: __required_scopes__ module variable for authorization
- Context integration: tools can use ctx: Context for progress, logging, sampling, elicitation, and session state
- Rate limiting: __rate_limit__ module variable or MCP_RATE_LIMIT_DEFAULT env var (sliding window)
- Caching: __cache_ttl__ module variable for idempotent tool result caching
- Tool sandboxing: __max_memory_mb__ and __max_output_size_kb__ resource limits per tool
- PodDisruptionBudget: pdb.enabled for zero-downtime rolling updates
- HorizontalPodAutoscaler: autoscaling.enabled for auto-scaling based on CPU/memory
- Security: Trivy vulnerability scan, CycloneDX SBOM, Cosign keyless signing, SLSA provenance
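The merge precedence listed above (Inline > S3 > Git) can be pictured as a dictionary merge keyed by filename. The following is a minimal sketch of that idea, not the chart's actual loader; `merge_sources` and the sample file contents are hypothetical names introduced for illustration.

```python
def merge_sources(git: dict, s3: dict, inline: dict) -> dict:
    """Merge tool files by filename; higher-precedence sources win."""
    merged = {}
    # Apply the lowest-precedence source first so later ones overwrite it.
    for source in (git, s3, inline):
        merged.update(source)
    return merged


# Hypothetical contents: filename -> where the winning copy came from.
git_tools = {"greet.py": "git version", "deploy.py": "git version"}
s3_tools = {"greet.py": "s3 version", "report.py": "s3 version"}
inline_tools = {"greet.py": "inline version"}

merged = merge_sources(git_tools, s3_tools, inline_tools)
```

Here `greet.py` exists in all three sources and the inline copy wins, while `deploy.py` and `report.py` are taken from the only source that provides them.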
Installation
HTTPS Repository
helm repo add helmforge https://repo.helmforge.dev
helm repo update
helm install fastmcp-server helmforge/fastmcp-server
OCI Registry
helm install fastmcp-server oci://ghcr.io/helmforgedev/helm/fastmcp-server
Basic Example — Inline Tools
# values.yaml
sources:
  inline:
    tools:
      greet.py: |
        def greet(name: str) -> str:
            """Greet someone by name."""
            return f"Hello, {name}!"
      math_ops.py: |
        def add(a: float, b: float) -> float:
            """Add two numbers."""
            return a + b

        def multiply(a: float, b: float) -> float:
            """Multiply two numbers."""
            return a * b
    knowledge:
      overview.md: |
        # Product Overview
        This document provides context for the AI assistant.
S3 Source (MinIO, AWS S3, Cloudflare R2)
sources:
  s3:
    enabled: true
    endpoint: 'https://minio.example.com'
    bucket: mcp-tools
    region: us-east-1
    prefix: production
    accessKey: minioadmin
    secretKey: minioadmin
Git Source
sources:
  git:
    enabled: true
    repository: 'https://github.com/your-org/mcp-tools.git'
    branch: main
    path: '' # optional subdirectory
    token: ghp_xxx # for private repos
Authentication
Bearer Token
auth:
  type: bearer
  bearer:
    token: my-secret-token
    # or use an existing Kubernetes secret:
    # existingSecret: my-auth-secret
    # existingSecretKey: token
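With bearer authentication enabled, a client has to present the configured token on each request. The sketch below builds such a request with the standard library. It assumes the server accepts the conventional `Authorization: Bearer <token>` header and that the service is reachable at `http://localhost:8000` (for example through a port-forward); `build_mcp_request` is a hypothetical helper, and the port and path come from the chart defaults `server.port` and `server.path`.

```python
import urllib.request


def build_mcp_request(base_url: str, token: str, path: str = "/mcp") -> urllib.request.Request:
    """Build a request for the MCP endpoint carrying a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        # Assumption: the standard bearer scheme is what the server checks.
        headers={"Authorization": f"Bearer {token}"},
    )


# Assumed local address; the token matches the values example above.
request = build_mcp_request("http://localhost:8000", "my-secret-token")
```

The request is only constructed here, not sent, so the snippet can be run without a live server.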
JWT
auth:
  type: jwt
  jwt:
    issuer: 'https://auth.example.com'
    audience: 'mcp-server'
    jwksUri: 'https://auth.example.com/.well-known/jwks.json'
Observability
Prometheus Metrics
metrics:
  enabled: true
  serviceMonitor:
    enabled: true # requires Prometheus Operator
    interval: 30s
Metrics exposed at /metrics: tool call counts, durations, errors, source sync status, auth attempts.
Structured Logging
server:
  logFormat: json # JSON output for log aggregation
Health Endpoints
| Endpoint | Type | When 200 |
|---|---|---|
| /healthz | Liveness | Always (process running) |
| /readyz | Readiness | Sources synced + components loaded |
| /startupz | Startup | Full initialization complete |
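Outside of Kubernetes probes, a deployment script may want to wait for the readiness endpoint in the table above before sending traffic. The following is a rough sketch of such a wait loop; `http_status` and `wait_until_ready` are hypothetical helpers, and the retry count and delay are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(
    base_url: str,
    get_status: Callable[[str], int] = http_status,
    attempts: int = 30,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll /readyz until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/readyz"
    for _ in range(attempts):
        if get_status(url) == 200:
            return True
        time.sleep(delay_seconds)
    return False
```

Passing the status function as a parameter keeps the loop testable without a running server; a call such as `wait_until_ready("http://localhost:8000")` would use the real HTTP check.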
Diagnostics
GET /debug/info returns server version, FastMCP version, uptime, registered components, source status, auth type, and configuration.
Rate Limiting
rateLimiting:
  default: '100/min' # global default
  perTool:
    DEPLOY: '5/min' # per-tool override
    DELETE_DATA: '2/min'
Or via module-level variable:
__rate_limit__ = "5/min"

def deploy(service: str, version: str) -> str:
    """Deploy a service (rate limited)."""
    return f"Deployed {service}@{version}"
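To make the sliding-window behavior concrete, here is a minimal sketch of how a limit such as "5/min" could be enforced. It is an illustration under stated assumptions, not the server's implementation: `parse_rate` and `SlidingWindowLimiter` are hypothetical names, and only the "min" unit appears in this document, so the other unit spellings are guesses.

```python
import time
from collections import deque

# Only "min" is confirmed by the examples above; "sec" and "hour" are assumed.
_UNIT_SECONDS = {"sec": 1, "min": 60, "hour": 3600}


def parse_rate(spec: str) -> tuple[int, int]:
    """Split a spec like "5/min" into (max_calls, window_seconds)."""
    count, unit = spec.split("/")
    return int(count), _UNIT_SECONDS[unit]


class SlidingWindowLimiter:
    """Allow at most max_calls within any trailing window of window seconds."""

    def __init__(self, spec: str, clock=time.monotonic):
        self.max_calls, self.window = parse_rate(spec)
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

Unlike a fixed window that resets on the minute, this approach counts calls in the trailing 60 seconds, so a burst at the end of one minute still counts against the start of the next.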
Caching
__cache_ttl__ = 300 # cache results for 5 minutes

def get_exchange_rate(currency: str) -> float:
    """Get current exchange rate (cached 5min)."""
    ...
Chart-level cache settings:
caching:
  enabled: true # default
  maxSize: 1000 # max entries per tool
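The combination of a per-tool TTL and a maximum entry count can be sketched as a small decorator. This is only an illustration of the concept: `ttl_cache` is a hypothetical helper, and evicting the oldest entry when the cache is full is an assumption, since the eviction policy is not described here.

```python
import time
from collections import OrderedDict
from functools import wraps


def ttl_cache(ttl_seconds: float, max_size: int = 1000, clock=time.monotonic):
    """Cache results per argument tuple for ttl_seconds, keeping at most max_size entries."""

    def decorator(func):
        entries = OrderedDict()  # args -> (expires_at, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the real call
            value = func(*args)
            entries[args] = (now + ttl_seconds, value)
            entries.move_to_end(args)
            if len(entries) > max_size:
                entries.popitem(last=False)  # assumed policy: evict the oldest entry
            return value

        return wrapper

    return decorator
```

Applied as `@ttl_cache(ttl_seconds=300)`, repeated calls with the same arguments within five minutes return the stored result, which is why caching is only appropriate for idempotent tools.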
Tool Sandboxing
__max_memory_mb__ = 256
__max_output_size_kb__ = 100

def process_data(data: str) -> str:
    """Process data with resource limits."""
    ...
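As an illustration of what an output-size limit means in practice, the sketch below checks a tool result against a limit in kilobytes. It rests on several assumptions that this document does not confirm: that the size is measured on the UTF-8 encoded result, that a limit of 0 (the chart default) means unlimited, and that an oversized result is rejected rather than truncated. `OutputTooLargeError` and `enforce_output_limit` are hypothetical names; the memory limit is not covered here.

```python
class OutputTooLargeError(Exception):
    """Raised when a tool result exceeds the configured output limit."""


def enforce_output_limit(result: str, max_output_size_kb: int) -> str:
    """Return result unchanged if it fits within max_output_size_kb."""
    if max_output_size_kb <= 0:
        return result  # assumption: 0 disables the limit
    size_bytes = len(result.encode("utf-8"))
    limit_bytes = max_output_size_kb * 1024
    if size_bytes > limit_bytes:
        raise OutputTooLargeError(
            f"output is {size_bytes} bytes, limit is {limit_bytes} bytes"
        )
    return result
```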
Autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
pdb:
  enabled: true
  minAvailable: 1
Key Values
| Key | Default | Description |
|---|---|---|
| image.repository | docker.io/helmforge/fastmcp-server | Container image |
| image.tag | 1.0.0 | Image tag |
| server.name | fastmcp-server | Server name in MCP responses |
| server.port | 8000 | HTTP port |
| server.path | /mcp | MCP endpoint path |
| server.logFormat | text | Log format: text or json |
| server.strictLoading | false | Fail on boot if component errors |
| ui.enabled | true | Enable Web UI at /ui |
| metrics.enabled | false | Enable Prometheus metrics at /metrics |
| auth.type | none | Authentication: none, bearer, jwt |
| rateLimiting.default | "" | Default rate limit (e.g., 100/min) |
| caching.enabled | true | Enable result caching |
| sandboxing.maxMemoryMb | 0 | Default max memory per tool (MB) |
| sandboxing.maxOutputSizeKb | 0 | Default max output per tool (KB) |
| sources.inline.tools | {} | Inline Python tool files |
| sources.inline.knowledge | {} | Inline knowledge base files |
| sources.s3.enabled | false | Enable S3 source |
| sources.s3.bucket | "" | S3 bucket name |
| sources.git.enabled | false | Enable Git source |
| sources.git.repository | "" | Git repository HTTPS URL |
| extraPipPackages | [] | Extra pip packages to install at startup |
| initSync.enabled | false | Run source sync as init container |
| persistence.enabled | false | Enable persistent workspace volume |
| autoscaling.enabled | false | Enable HPA |
| pdb.enabled | false | Enable PodDisruptionBudget |
| ingress.enabled | false | Enable ingress |
Operational Notes
- Merge precedence is Inline > S3 > Git: if a tool with the same filename exists in multiple sources, the highest-precedence version wins
- Knowledge base files are served as MCP resources at knowledge://{filename} URIs
- Tools are Python files with top-level functions; resources need a RESOURCE_URI or RESOURCES module-level variable
- Tools support optional metadata: __tags__ (set), __timeout__ (float), __annotations_mcp__ (dict)
- Resource URIs can use {param} placeholders for dynamic templates
- The extraPipPackages list installs before tools load; use it when tools import external libraries
- The Web UI auto-refreshes every 15 seconds and requires no external dependencies
- Init container pattern (initSync.enabled) separates source syncing from server startup for better Kubernetes readiness semantics