FastMCP Server

Helm chart for deploying FastMCP Server on Kubernetes. FastMCP Server is an MCP (Model Context Protocol) server that dynamically loads tools, resources, prompts, and knowledge bases from multiple sources.

Key Features

  • Multi-source loading — tools, resources, prompts, and knowledge from inline ConfigMaps, S3-compatible storage (AWS S3, MinIO, R2), or Git repositories
  • Merge precedence — Inline (highest) > S3 > Git (lowest) — override S3 tools locally without touching the bucket
  • Bearer and JWT authentication — built-in via FastMCP’s StaticTokenVerifier and JWTVerifier
  • Knowledge base support — serve Markdown files as MCP resources for RAG and context injection
  • Extra pip packages — install additional Python packages at startup before loading tools
  • Tool metadata — optional __tags__, __timeout__, __annotations_mcp__ module variables for tool categorization and behavior hints
  • Resource templates — parameterized URIs like users://{user_id}/profile for dynamic resources
  • Multiple resources per file: RESOURCES dict maps multiple URIs to handler functions
  • Error masking — hide internal error details from clients via MCP_MASK_ERROR_DETAILS
  • Duplicate handling — control behavior when tools share names via MCP_ON_DUPLICATE_TOOLS
  • Built-in Web UI — dashboard at /ui with tools/resources/prompts explorer (Alpine.js + Tailwind CDN)
  • Prometheus metrics — tool call counts, durations, errors, source sync status at /metrics
  • Structured JSON logging: LOG_FORMAT=json for Loki, ELK, CloudWatch, Datadog
  • Dedicated health endpoints: /healthz (liveness), /readyz (readiness), /startupz (startup)
  • Diagnostic endpoint: GET /debug/info with full server introspection
  • Init container pattern — pre-sync sources before server starts via initSync.enabled
  • Strict loading: MCP_STRICT_LOADING=true fails on boot if any tool/resource has errors
  • Hot reload — automatic tool/resource reload on filesystem changes via MCP_HOT_RELOAD=true
  • Periodic sync — poll S3/Git sources for changes at configurable intervals
  • Webhook reload: POST /reload endpoint for CI/CD-triggered reloads
  • OCI artifact source — pull tool bundles from OCI registries via ORAS
  • Selective sync — include/exclude glob patterns for source filtering
  • Gateway mode — compose multiple MCP servers via MCP_MODE=gateway and MCP_MOUNT_SERVERS
  • Tag visibility — enable/disable tools by tags with MCP_ENABLE_TAGS and MCP_DISABLE_TAGS
  • Multi-auth — combine bearer + JWT providers via MCP_AUTH_PROVIDERS
  • Tool-level scopes: __required_scopes__ module variable for authorization
  • Context integration — tools can use ctx: Context for progress, logging, sampling, elicitation, and session state
  • Rate limiting: __rate_limit__ module variable or MCP_RATE_LIMIT_DEFAULT env var (sliding window)
  • Caching: __cache_ttl__ module variable for idempotent tool result caching
  • Tool sandboxing: __max_memory_mb__ and __max_output_size_kb__ resource limits per tool
  • PodDisruptionBudget: pdb.enabled for zero-downtime rolling updates
  • HorizontalPodAutoscaler: autoscaling.enabled for auto-scaling based on CPU/memory
  • Security — Trivy vulnerability scan, CycloneDX SBOM, Cosign keyless signing, SLSA provenance
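
The resource features in the list above can be pictured with a small resource file. This is a sketch rather than the chart's confirmed loader contract: the file name, the handler names, and the assumption that a `{user_id}` placeholder reaches the handler as an argument of the same name are all illustrative. Only the RESOURCES dict and the users://{user_id}/profile URI shape come from the feature list.

```python
# resources/users.py (hypothetical file name)
# Assumption: the loader reads a module-level RESOURCES dict that maps URIs
# to handler functions and passes {placeholders} as same-named arguments.
import json


def user_profile(user_id: str) -> str:
    """Return a profile document for one user (stub data)."""
    return json.dumps({"id": user_id, "name": f"user-{user_id}"})


def service_status() -> str:
    """Return a static status document."""
    return json.dumps({"status": "ok"})


RESOURCES = {
    "users://{user_id}/profile": user_profile,  # templated URI
    "status://service": service_status,         # fixed URI
}
```

Keeping both handlers in one file is what the "multiple resources per file" feature refers to; a file with a single resource would presumably use RESOURCE_URI instead.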

Installation

HTTPS Repository

helm repo add helmforge https://repo.helmforge.dev
helm repo update
helm install fastmcp-server helmforge/fastmcp-server

OCI Registry

helm install fastmcp-server oci://ghcr.io/helmforgedev/helm/fastmcp-server

Basic Example — Inline Tools

# values.yaml
sources:
  inline:
    tools:
      greet.py: |
        def greet(name: str) -> str:
            """Greet someone by name."""
            return f"Hello, {name}!"
      math_ops.py: |
        def add(a: float, b: float) -> float:
            """Add two numbers."""
            return a + b
        def multiply(a: float, b: float) -> float:
            """Multiply two numbers."""
            return a * b
    knowledge:
      overview.md: |
        # Product Overview
        This document provides context for the AI assistant.

S3 Source (MinIO, AWS S3, Cloudflare R2)

sources:
  s3:
    enabled: true
    endpoint: 'https://minio.example.com'
    bucket: mcp-tools
    region: us-east-1
    prefix: production
    accessKey: minioadmin
    secretKey: minioadmin

Git Source

sources:
  git:
    enabled: true
    repository: 'https://github.com/your-org/mcp-tools.git'
    branch: main
    path: '' # optional subdirectory
    token: ghp_xxx # for private repos

Authentication

Bearer Token

auth:
  type: bearer
  bearer:
    token: my-secret-token
    # or use an existing Kubernetes secret:
    # existingSecret: my-auth-secret
    # existingSecretKey: token
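
On the client side, a bearer token is conventionally sent in the Authorization header. The following sketch only builds such a request with the Python standard library; it does not send it. The in-cluster hostname is hypothetical, and the port and path are taken from the chart defaults (8000 and /mcp).

```python
import urllib.request

# Assumptions: the server is reachable at this hypothetical in-cluster URL,
# on the chart's default port (8000) and path (/mcp).
MCP_URL = "http://fastmcp-server:8000/mcp"


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a request carrying the token in the Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_authorized_request(MCP_URL, "my-secret-token")
print(request.get_header("Authorization"))  # Bearer my-secret-token
```

In practice the token would be read from the environment or a mounted secret rather than written into client code.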

JWT

auth:
  type: jwt
  jwt:
    issuer: 'https://auth.example.com'
    audience: 'mcp-server'
    jwksUri: 'https://auth.example.com/.well-known/jwks.json'

Observability

Prometheus Metrics

metrics:
  enabled: true
  serviceMonitor:
    enabled: true # requires Prometheus Operator
    interval: 30s

Metrics exposed at /metrics: tool call counts, durations, errors, source sync status, auth attempts.

Structured Logging

server:
  logFormat: json # JSON output for log aggregation

Health Endpoints

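| Endpoint  | Type      | When 200                           |
|-----------|-----------|------------------------------------|
| /healthz  | Liveness  | Always (process running)           |
| /readyz   | Readiness | Sources synced + components loaded |
| /startupz | Startup   | Full initialization complete       |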
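
For checking the three endpoints from outside Kubernetes (for example in a smoke test), a small probe can be sketched as follows. The function name `probe` and the injectable `opener` parameter are illustrative choices, and the sketch assumes any non-200 answer or connection failure should count as "not OK".

```python
import urllib.error
import urllib.request

ENDPOINTS = ("/healthz", "/readyz", "/startupz")


def probe(base_url: str, opener=urllib.request.urlopen) -> dict:
    """Return {endpoint: bool}, where True means the endpoint answered 200."""
    results = {}
    for path in ENDPOINTS:
        try:
            with opener(base_url + path, timeout=2) as response:
                results[path] = response.status == 200
        except (urllib.error.URLError, OSError):
            # Covers HTTP error statuses (HTTPError) and connection failures.
            results[path] = False
    return results
```

Calling `probe("http://localhost:8000")` against a port-forwarded pod would report which stage of startup the server has reached.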

Diagnostics

GET /debug/info returns server version, FastMCP version, uptime, registered components, source status, auth type, and configuration.

Rate Limiting

rateLimiting:
  default: '100/min' # global default
  perTool:
    DEPLOY: '5/min' # per-tool override
    DELETE_DATA: '2/min'

Or via module-level variable:

__rate_limit__ = "5/min"

def deploy(service: str, version: str) -> str:
    """Deploy a service (rate limited)."""
    return f"Deployed {service}@{version}"
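
To make the "sliding window" behavior concrete, here is a minimal limiter that parses the same "N/min" notation. It is a sketch of the general technique, not the server's implementation: the class and function names are hypothetical, and only the "min" unit appears in this document (the other units are assumed).

```python
import time
from collections import deque

_UNIT_SECONDS = {"sec": 1, "min": 60, "hour": 3600}  # "sec"/"hour" are assumed


def parse_rate(spec: str) -> tuple[int, int]:
    """Parse a spec such as "5/min" into (max_calls, window_seconds)."""
    count, unit = spec.split("/")
    return int(count), _UNIT_SECONDS[unit]


class SlidingWindowLimiter:
    """Allow at most max_calls within any trailing window of window seconds."""

    def __init__(self, spec: str, clock=time.monotonic):
        self.max_calls, self.window = parse_rate(spec)
        self._clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

Unlike a fixed window that resets on the minute boundary, this approach counts calls over the trailing 60 seconds, so a burst straddling a boundary is still limited.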

Caching

__cache_ttl__ = 300  # cache results for 5 minutes

def get_exchange_rate(currency: str) -> float:
    """Get current exchange rate (cached 5min)."""
    ...

Chart-level cache settings:

caching:
  enabled: true # default
  maxSize: 1000 # max entries per tool
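
The combination of a per-tool TTL and a maximum entry count can be sketched as a decorator. This illustrates the general idea only; the name `ttl_cache`, the `clock` parameter, and the oldest-first eviction policy are assumptions, not the server's documented behavior.

```python
import time
from collections import OrderedDict
from functools import wraps


def ttl_cache(ttl_seconds: float, max_size: int = 1000, clock=time.monotonic):
    """Cache results per positional-argument tuple for ttl_seconds."""

    def decorator(func):
        entries = OrderedDict()  # args -> (expires_at, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh
            value = func(*args)
            entries[args] = (now + ttl_seconds, value)
            entries.move_to_end(args)
            if len(entries) > max_size:
                entries.popitem(last=False)  # evict the oldest entry
            return value

        return wrapper

    return decorator


@ttl_cache(ttl_seconds=300)
def get_exchange_rate(currency: str) -> float:
    """Stand-in for an expensive lookup."""
    return 1.08
```

Caching like this is only safe for idempotent tools, which is why the TTL is opted into per module.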

Tool Sandboxing

__max_memory_mb__ = 256
__max_output_size_kb__ = 100

def process_data(data: str) -> str:
    """Process data with resource limits."""
    ...
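
As a rough picture of what an output-size limit involves, the check below measures a result in bytes and compares it against a KB limit. It is a sketch under two loudly stated assumptions: that oversized output is rejected rather than truncated, and that a limit of 0 (the chart default) means "no limit". The function name is hypothetical.

```python
def enforce_output_limit(output: str, max_output_size_kb: int) -> str:
    """Return output unchanged, or raise if it exceeds the limit.

    Assumption: 0 means "no limit", mirroring the chart default of 0.
    """
    if max_output_size_kb <= 0:
        return output
    size_bytes = len(output.encode("utf-8"))
    limit_bytes = max_output_size_kb * 1024
    if size_bytes > limit_bytes:
        raise ValueError(
            f"tool output is {size_bytes} bytes, limit is {limit_bytes}"
        )
    return output
```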

Autoscaling

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

pdb:
  enabled: true
  minAvailable: 1

Key Values

| Key                        | Default                            | Description                              |
|----------------------------|------------------------------------|------------------------------------------|
| image.repository           | docker.io/helmforge/fastmcp-server | Container image                          |
| image.tag                  | 1.0.0                              | Image tag                                |
| server.name                | fastmcp-server                     | Server name in MCP responses             |
| server.port                | 8000                               | HTTP port                                |
| server.path                | /mcp                               | MCP endpoint path                        |
| server.logFormat           | text                               | Log format: text or json                 |
| server.strictLoading       | false                              | Fail on boot if component errors         |
| ui.enabled                 | true                               | Enable Web UI at /ui                     |
| metrics.enabled            | false                              | Enable Prometheus metrics at /metrics    |
| auth.type                  | none                               | Authentication: none, bearer, jwt        |
| rateLimiting.default       | ""                                 | Default rate limit (e.g., 100/min)       |
| caching.enabled            | true                               | Enable result caching                    |
| sandboxing.maxMemoryMb     | 0                                  | Default max memory per tool (MB)         |
| sandboxing.maxOutputSizeKb | 0                                  | Default max output per tool (KB)         |
| sources.inline.tools       | {}                                 | Inline Python tool files                 |
| sources.inline.knowledge   | {}                                 | Inline knowledge base files              |
| sources.s3.enabled         | false                              | Enable S3 source                         |
| sources.s3.bucket          | ""                                 | S3 bucket name                           |
| sources.git.enabled        | false                              | Enable Git source                        |
| sources.git.repository     | ""                                 | Git repository HTTPS URL                 |
| extraPipPackages           | []                                 | Extra pip packages to install at startup |
| initSync.enabled           | false                              | Run source sync as init container        |
| persistence.enabled        | false                              | Enable persistent workspace volume       |
| autoscaling.enabled        | false                              | Enable HPA                               |
| pdb.enabled                | false                              | Enable PodDisruptionBudget               |
| ingress.enabled            | false                              | Enable ingress                           |

Operational Notes

  • Merge precedence is Inline > S3 > Git — if a tool with the same filename exists in multiple sources, the highest-precedence version wins
  • Knowledge base files are served as MCP resources at knowledge://{filename} URIs
  • Tools are Python files with top-level functions; resources need a RESOURCE_URI or RESOURCES module-level variable
  • Tools support optional metadata: __tags__ (set), __timeout__ (float), __annotations_mcp__ (dict)
  • Resource URIs can use {param} placeholders for dynamic templates
  • The extraPipPackages list installs before tools load — use it when tools import external libraries
  • The Web UI auto-refreshes every 15 seconds and requires no external dependencies
  • Init container pattern (initSync.enabled) separates source syncing from server startup for better Kubernetes readiness semantics
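
The merge-precedence and knowledge-URI notes above can be expressed in a few lines. This is an illustration of the stated rules, not the server's code; both function names are hypothetical, and each source is modeled as a plain {filename: content} dict.

```python
def merge_sources(inline: dict, s3: dict, git: dict) -> dict:
    """Merge {filename: content} maps so that Inline > S3 > Git."""
    merged = {}
    # Apply the lowest precedence first; later updates overwrite earlier ones.
    for source in (git, s3, inline):
        merged.update(source)
    return merged


def knowledge_uri(filename: str) -> str:
    """Build the resource URI under which a knowledge file is served."""
    return f"knowledge://{filename}"
```

With this model, an inline greet.py replaces an S3 greet.py of the same filename, while files that exist in only one source pass through unchanged.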

More Information