
ZooKeeper

Apache ZooKeeper provides distributed coordination for cloud applications. The HelmForge chart deploys the official ZooKeeper image as a stable StatefulSet ensemble with quorum-safe defaults.

Key Features

  • Official docker.io/library/zookeeper image pinned to 3.9.5
  • Three-node replicated ensemble by default
  • Validation against accidental even replica counts
  • Client, headless, secure client, and metrics Services
  • Optional SASL/Digest client authentication
  • Optional secure client port using existing JKS keystore and truststore Secrets
  • Prometheus metrics provider, ServiceMonitor, PrometheusRule, NetworkPolicy, PDB, External Secrets, and dual-stack Services

Installation

From the Helm repository:

helm repo add helmforge https://repo.helmforge.dev
helm repo update
helm install zookeeper helmforge/zookeeper --namespace zookeeper --create-namespace

Or directly from the OCI registry:

helm install zookeeper oci://ghcr.io/helmforgedev/helm/zookeeper --namespace zookeeper --create-namespace

Examples

Standalone local install:

replicaCount: 1
persistence:
  enabled: false

Production ensemble:

replicaCount: 3
persistence:
  enabled: true
  size: 20Gi
podDisruptionBudget:
  enabled: true
  maxUnavailable: 1
metrics:
  enabled: true
  serviceMonitor:
    enabled: true

Operations

Keep production replica counts odd. Use allowEvenReplicas=true only for a deliberate platform-specific reason. Enable NetworkPolicy and explicitly allow client, quorum, DNS, and metrics flows.

Architecture

ZooKeeper is deployed as a StatefulSet with stable pod DNS, a client Service, a headless Service for quorum traffic, and optional metrics exposure. Production ensembles should use an odd replica count so that a majority of members (the quorum) remains available after a member failure.

Ports and roles:

  • client port (zookeeper.clientPort, 2181 by default) for application connections
  • follower (quorum) and leader-election ports for traffic between pods
  • optional secure client port when TLS is enabled
  • optional metrics port for Prometheus scraping
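Stable pod DNS is what lets each member find its peers. As a sketch of how that fits together, the Python below renders ZooKeeper `server.N` entries in the `server.<id>=<host>:<quorumPort>:<electionPort>;<clientPort>` form for each StatefulSet pod. The headless Service name (`zookeeper-headless`), the `myid = ordinal + 1` mapping, and ports 2888/3888 are assumptions drawn from common ZooKeeper conventions, not confirmed chart values; the chart renders its own configuration, so treat this as a troubleshooting aid.

```python
def server_entries(
    statefulset: str,
    namespace: str,
    replicas: int,
    headless_service: str,  # hypothetical name: check the chart's headless Service
    client_port: int = 2181,
    quorum_port: int = 2888,  # assumed follower/quorum port
    election_port: int = 3888,  # assumed leader-election port
    cluster_domain: str = "cluster.local",
) -> list:
    """Build one server.N line per StatefulSet pod.

    Pod <statefulset>-<ordinal> is reachable through the headless Service at
    <pod>.<service>.<namespace>.svc.<domain>; this sketch assumes myid = ordinal + 1.
    """
    lines = []
    for ordinal in range(replicas):
        host = f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        lines.append(
            f"server.{ordinal + 1}={host}:{quorum_port}:{election_port};{client_port}"
        )
    return lines


if __name__ == "__main__":
    for line in server_entries("zookeeper", "zookeeper", 3, "zookeeper-headless"):
        print(line)
```

For a three-member StatefulSet named zookeeper, the first line reads `server.1=zookeeper-0.zookeeper-headless.zookeeper.svc.cluster.local:2888:3888;2181`. If members cannot resolve these names, quorum never forms.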

The chart blocks accidental even replica counts by default, because an even count adds a member without adding failure tolerance. Set allowEvenReplicas=true only when an operator has a clear reason and accepts that tradeoff.
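The arithmetic behind the odd-count rule fits in a few lines of Python. This is an illustrative sketch of ZooKeeper's strict-majority quorum; the function names are not part of the chart.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of an ensemble with `members` voting servers."""
    if members < 1:
        raise ValueError("an ensemble needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still up."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} members: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows 3 and 4 members both tolerating one failure, and 5 and 6 both tolerating two.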

Production Values

Use three replicas, persistent data, a data log volume, PDB, metrics, topology spread, and NetworkPolicy:

replicaCount: 3

persistence:
  enabled: true
  size: 20Gi
  dataLogDir:
    enabled: true
    size: 10Gi

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    memory: 2Gi

podDisruptionBudget:
  enabled: true
  maxUnavailable: 1

networkPolicy:
  enabled: true

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
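Before relying on the PDB during node drains, it helps to confirm that the number of pods it lets Kubernetes evict still leaves a majority running. This Python sketch assumes maxUnavailable is an absolute integer; Kubernetes also accepts percentages, which it does not handle.

```python
def pdb_preserves_quorum(replicas: int, max_unavailable: int) -> bool:
    """True if evicting `max_unavailable` pods still leaves a ZooKeeper majority."""
    if replicas < 1 or max_unavailable < 0:
        raise ValueError("replicas must be >= 1 and max_unavailable >= 0")
    quorum = replicas // 2 + 1  # strict majority of voting members
    remaining = replicas - max_unavailable
    return remaining >= quorum


if __name__ == "__main__":
    for replicas, max_unavailable in [(3, 1), (3, 2), (5, 2), (1, 1)]:
        ok = pdb_preserves_quorum(replicas, max_unavailable)
        status = "ok" if ok else "quorum at risk"
        print(f"replicas={replicas} maxUnavailable={max_unavailable}: {status}")
```

A PDB only limits voluntary disruptions; an unplanned pod failure during a drain still counts against the remaining majority.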

For local or CI smoke tests, standalone mode is intentionally simple:

replicaCount: 1

persistence:
  enabled: false

Authentication

Client SASL/Digest authentication is optional:

auth:
  client:
    enabled: true
    existingSecret: zookeeper-client-auth
    usernameKey: username
    passwordKey: password
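The existingSecret has to exist before the release is installed or upgraded. A minimal Python sketch of one way to produce it: build a v1 Secret whose `data` values are base64-encoded, as the Kubernetes Secret API requires, and print it as JSON. The namespace and credential values are placeholders; in practice the password should come from a secret manager, not source code.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, data: dict) -> dict:
    """Return a v1 Opaque Secret with base64-encoded `data` values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {key: base64.b64encode(value).decode("ascii") for key, value in data.items()},
    }


if __name__ == "__main__":
    manifest = secret_manifest(
        "zookeeper-client-auth",  # matches auth.client.existingSecret
        "zookeeper",  # placeholder namespace
        {
            "username": b"app-user",  # key named by auth.client.usernameKey
            "password": b"change-me",  # placeholder: load from a secret manager instead
        },
    )
    print(json.dumps(manifest, indent=2))
```

Saved as, for example, make_auth_secret.py (a hypothetical name), the output can be applied with `python make_auth_secret.py | kubectl apply -f -`, since kubectl accepts JSON manifests.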

Before enforcing authentication, update the connection settings and client JAAS configuration of every application that connects to ZooKeeper.
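For Java clients, Digest authentication is configured through a JAAS `Client` section that uses ZooKeeper's `DigestLoginModule`, usually loaded with the `-Djava.security.auth.login.config=<file>` JVM option. The Python sketch below renders that section; where the file lives and how the JVM option is set depend on each client (Kafka, Solr, and others wire it differently), and the credentials should match those stored in the auth Secret.

```python
def jaas_client_section(username: str, password: str) -> str:
    """Render a JAAS `Client` section for ZooKeeper SASL/Digest authentication."""
    for value in (username, password):
        # Keep the sketch simple: refuse characters that would need escaping
        # inside a JAAS quoted string.
        if any(ch in value for ch in '"\\\n'):
            raise ValueError("credentials contain characters this sketch does not escape")
    return (
        "Client {\n"
        "    org.apache.zookeeper.server.auth.DigestLoginModule required\n"
        f'    username="{username}"\n'
        f'    password="{password}";\n'
        "};\n"
    )


if __name__ == "__main__":
    print(jaas_client_section("app-user", "change-me"), end="")
```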

TLS

The secure client port expects existing Secrets: one holding the JKS keystore and truststore, and one holding their passwords:

tls:
  client:
    enabled: true
    existingSecret: zookeeper-client-tls
    keystoreKey: keystore.jks
    truststoreKey: truststore.jks
    existingPasswordsSecret: zookeeper-client-tls-passwords

TLS changes affect both server startup and client compatibility. Validate the exact client libraries used by Kafka, Solr, or other ZooKeeper consumers before rollout.
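Since Java 9, `keytool` creates PKCS#12 stores by default, so a file named keystore.jks is not necessarily JKS. A quick pre-flight check, sketched in Python, reads only the file header: JKS files start with the magic bytes `FEEDFEED` and JCEKS files with `CECECECE`. It does not verify passwords or certificate contents; `keytool -list -keystore <file>` does that.

```python
import sys

JKS_MAGIC = bytes.fromhex("feedfeed")
JCEKS_MAGIC = bytes.fromhex("cececece")


def keystore_format(header: bytes) -> str:
    """Guess a keystore format from its first bytes."""
    if header.startswith(JKS_MAGIC):
        return "jks"
    if header.startswith(JCEKS_MAGIC):
        return "jceks"
    if header.startswith(b"-----BEGIN"):
        return "pem"
    if header[:1] == b"\x30":
        # DER SEQUENCE tag: most likely PKCS#12 (or another DER structure).
        return "pkcs12-or-der"
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_keystore.py keystore.jks truststore.jks
    failed = False
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            fmt = keystore_format(handle.read(16))
        print(f"{path}: {fmt}")
        failed = failed or fmt != "jks"
    if failed:
        sys.exit(1)
```

Run it against the local files before creating the zookeeper-client-tls Secret; a non-JKS result points at the "TLS startup fails" row under Common Issues.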

External Secrets

External Secrets Operator can reconcile the auth and TLS material when the operator is already installed in the cluster:

externalSecrets:
  enabled: true
  secretStoreRef:
    name: cluster-secrets
    kind: ClusterSecretStore
  data:
    - secretKey: password
      remoteRef:
        key: zookeeper/client
        property: password

The chart renders ExternalSecret resources only when explicitly enabled; it does not install External Secrets Operator or create a SecretStore.

Networking

NetworkPolicy must allow:

  • client traffic from approved application namespaces
  • quorum traffic between ZooKeeper pods
  • DNS egress
  • metrics scraping from the monitoring namespace when metrics are enabled
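If you need to write or review these rules yourself, the Python sketch below builds a NetworkPolicy covering the four flows and prints it as JSON for `kubectl apply -f -`. The ports (2181 client, 2888/3888 quorum and election, 7000 metrics), the namespace names apps and monitoring, and the policy name are assumptions; 7000 is the default port of ZooKeeper's Prometheus metrics provider, but check the chart's rendered Services for the real values. The secure client port is omitted. The `kubernetes.io/metadata.name` label is set automatically on every namespace. When networkPolicy.enabled is set, compare the chart's own policy with this sketch rather than applying both blindly.

```python
import json
from typing import Optional

# Pod label taken from the Validation commands (app.kubernetes.io/name=zookeeper).
ZK_LABELS = {"app.kubernetes.io/name": "zookeeper"}


def from_namespace(name: str) -> dict:
    """Peer selecting every pod in one namespace via its automatic name label."""
    return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": name}}}


def tcp(*ports: int) -> list:
    return [{"protocol": "TCP", "port": port} for port in ports]


def zookeeper_policy(namespace: str, client_namespaces: list,
                     monitoring_namespace: Optional[str]) -> dict:
    if not client_namespaces:
        # An empty "from" list would allow traffic from every source.
        raise ValueError("list at least one approved client namespace")
    zk_pods = {"podSelector": {"matchLabels": ZK_LABELS}}
    quorum_ports = tcp(2888, 3888)  # assumed follower and leader-election ports
    ingress = [
        {"from": [from_namespace(ns) for ns in client_namespaces], "ports": tcp(2181)},
        {"from": [zk_pods], "ports": quorum_ports},
    ]
    if monitoring_namespace:
        # Assumed metrics port (Prometheus provider default).
        ingress.append({"from": [from_namespace(monitoring_namespace)], "ports": tcp(7000)})
    egress = [
        {"to": [zk_pods], "ports": quorum_ports},
        # No "to" field: DNS is allowed to any destination on port 53.
        {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "zookeeper-sketch", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": ZK_LABELS},
            "policyTypes": ["Ingress", "Egress"],
            "ingress": ingress,
            "egress": egress,
        },
    }


if __name__ == "__main__":
    print(json.dumps(zookeeper_policy("zookeeper", ["apps"], "monitoring"), indent=2))
```

`helm test` pods run in the release namespace, so add that namespace to the client list if the test connects on the client port.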

Dual-stack Service fields are available:

service:
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv4
    - IPv6

Observability

Enable metrics, ServiceMonitor, and PrometheusRule together when Prometheus Operator is available:

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: prometheus
  prometheusRule:
    enabled: true

Watch quorum health, outstanding requests, latency, watches, open file descriptors, leader changes, and pod restarts.
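For ad hoc checks without a full Prometheus stack, a member's metrics endpoint can be read directly. This Python sketch parses the Prometheus text exposition format and assumes a port-forward to port 7000, the metrics provider's default; the chart's metrics port may differ.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{.*\})?\s+(?P<value>\S+)(?:\s+\S+)?$"
)


def parse_metrics(text: str) -> dict:
    """Map 'name{labels}' to its float value for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            key = match.group("name") + (match.group("labels") or "")
            samples[key] = float(match.group("value"))  # float() accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = "http://127.0.0.1:7000/metrics") -> dict:
    """Fetch and parse one member's metrics endpoint (assumes a port-forward)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, after `kubectl port-forward -n zookeeper pod/zookeeper-0 7000:7000`, `fetch_metrics().get("outstanding_requests")` returns that member's outstanding request count, assuming the metric carries that name in your ZooKeeper version; print the keys first to confirm.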

Validation

After deployment:

helm test zookeeper -n zookeeper
kubectl get pods -n zookeeper -l app.kubernetes.io/name=zookeeper
kubectl logs -n zookeeper statefulset/zookeeper --since=10m
kubectl get events -n zookeeper --sort-by=.lastTimestamp

For production, validate a real client connection, quorum after a pod restart, and behavior during voluntary disruption with the PDB enabled.
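To confirm that the ensemble has elected exactly one leader, each member can be asked for its role with the `srvr` four-letter command. This Python sketch assumes `srvr` is permitted by `4lw.commands.whitelist` (ZooKeeper's default whitelist contains only `srvr`, but the chart may override it) and that each pod's client port is reachable, for example through `kubectl port-forward`.

```python
import socket
from collections import Counter


def srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter command and return the raw response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(response: str) -> str:
    """Extract the 'Mode:' value (leader, follower, observer, standalone)."""
    for line in response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line in srvr response")


def ensemble_healthy(modes: list) -> bool:
    """A single member should be standalone; an ensemble needs exactly one leader."""
    counts = Counter(modes)
    if len(modes) == 1:
        return counts["standalone"] == 1
    known = counts["leader"] + counts["follower"] + counts["observer"]
    return counts["leader"] == 1 and known == len(modes)
```

For a three-member release, forward each pod to its own local port (for example `kubectl port-forward -n zookeeper pod/zookeeper-0 21810:2181`, then 21811 and 21812 for the other pods) and check `ensemble_healthy([parse_mode(srvr("127.0.0.1", p)) for p in (21810, 21811, 21812)])` before and after restarting a pod.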

Common Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Render blocks even replicas | Quorum safety validation | Use an odd replica count or deliberately set allowEvenReplicas=true. |
| Ensemble never forms quorum | Pod DNS, NetworkPolicy, or quorum ports blocked | Check headless Service DNS and intra-ensemble policy. |
| Clients fail after enabling auth | Client JAAS/config not updated | Roll client configuration before enforcing auth. |
| TLS startup fails | JKS Secret keys or passwords mismatch | Verify Secret keys and password Secret values. |

Values

| Parameter | Default | Description |
| --- | --- | --- |
| replicaCount | 3 | ZooKeeper ensemble size. |
| allowEvenReplicas | false | Allow even replica counts. |
| image.repository | docker.io/library/zookeeper | Official ZooKeeper image. |
| zookeeper.clientPort | 2181 | Plain client port. |
| auth.client.enabled | false | Enable SASL/Digest client authentication. |
| tls.client.enabled | false | Enable secure client port with existing JKS material. |
| persistence.enabled | true | Persist ZooKeeper data. |
| metrics.enabled | false | Enable Prometheus metrics provider. |
| podDisruptionBudget.enabled | true | Render PDB for ensemble availability. |
| externalSecrets.enabled | false | Render ExternalSecret resources. |