
Troubleshooting

Symptom-based guide to diagnosing and resolving common issues with HelmForge charts.


Pod stays in CrashLoopBackOff

Symptoms: The pod restarts repeatedly and kubectl get pods shows CrashLoopBackOff status.

Diagnosis:

# Check pod logs
kubectl logs <pod-name> --previous

# Check pod events
kubectl describe pod <pod-name>

Common causes:

| Cause | Fix |
| --- | --- |
| Missing or wrong database credentials | Check auth.existingSecret references and secret key names |
| Insufficient memory (OOMKilled) | Increase resources.limits.memory |
| Wrong image tag or missing image | Verify image.tag matches a valid published version |
| Config file syntax error | Check mounted ConfigMaps for YAML/JSON syntax |
| Dependency not ready | Ensure dependent services (database, Redis) are running first |

If the pod log shows exec format error, you may be running an AMD64 image on an ARM node (or vice versa). Check that the image supports your node architecture.
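
The OOMKilled and architecture checks above can be wrapped in small helpers. This is a minimal sketch, not part of HelmForge: the function names are made up for illustration, and it assumes kubectl already points at the namespace that holds the pod.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print "<container> <reason> <exit code>" for each container's last termination.
# A reason of OOMKilled points at resources.limits.memory.
last_termination() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# List every node with its CPU architecture (for example amd64 or arm64).
node_architectures() {
  kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
}

# Usage:
#   last_termination <pod-name>
#   node_architectures
```

If the architecture column does not match any platform the image was published for, the exec format error is expected.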


PVC stuck in Pending

Symptoms: kubectl get pvc shows Pending status, pods cannot start.

Diagnosis:

kubectl describe pvc <pvc-name>
kubectl get storageclass

Common causes:

| Cause | Fix |
| --- | --- |
| No default StorageClass | Set a default: kubectl patch sc <name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' |
| StorageClass doesn’t exist | Create the StorageClass or change persistence.storageClass in values |
| Insufficient cluster storage | Free disk space or add nodes |
| WaitForFirstConsumer binding | The PVC binds only once a pod that uses it is scheduled, so check for pod scheduling issues |
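
To see at a glance which StorageClass is the default and which binding mode each one uses, something like the following may help. It is a sketch under assumptions: the function names are hypothetical, and the events helper assumes the PVC lives in the current namespace.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print "<name> <is-default-class annotation> <volumeBindingMode>" per StorageClass.
# The middle column is empty when the annotation is not set.
storageclass_summary() {
  kubectl get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\t"}{.volumeBindingMode}{"\n"}{end}'
}

# Show the events recorded against one PVC, oldest first.
pvc_events() {
  kubectl get events --field-selector "involvedObject.kind=PersistentVolumeClaim,involvedObject.name=$1" --sort-by=.metadata.creationTimestamp
}

# Usage:
#   storageclass_summary
#   pvc_events <pvc-name>
```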

Database connection refused

Symptoms: Application pods log connection refused or could not connect to server when trying to reach a database.

Diagnosis:

# Check if the database pod is running
kubectl get pods -l app.kubernetes.io/name=<chart-name>

# Check the service exists
kubectl get svc -l app.kubernetes.io/name=<chart-name>

# Test connectivity from within the cluster
kubectl run debug --rm -it --image=busybox -- sh
# then: nc -zv <service-name> <port>

Common causes:

| Cause | Fix |
| --- | --- |
| Database pod not ready | Wait for readiness probe to pass, check logs for startup errors |
| Wrong service name | Use the full service name: <release>-<chart>.<namespace>.svc.cluster.local |
| Wrong port | Check service.port in the chart’s values |
| Network policy blocking traffic | Check NetworkPolicies in the namespace |
| Auth mismatch | Verify the application uses the same credentials as the database chart |
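
The service name and connectivity checks can be combined into a one-shot test. This sketch assumes the chart names its Service <release>-<chart>, as in the table above (confirm with kubectl get svc). The function names are hypothetical.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Build the in-cluster DNS name from release, chart, and namespace.
# Assumes the Service is named <release>-<chart>.
service_fqdn() {
  printf '%s-%s.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

# Start a throwaway busybox pod that attempts one TCP connection.
# --rm deletes the pod again once the command exits.
check_port() {
  kubectl run debug --rm -i --restart=Never --image=busybox -- nc -zv "$1" "$2"
}

# Usage:
#   check_port "$(service_fqdn <release> <chart> <namespace>)" <port>
```

Running the check from a pod in the application's namespace matters, because NetworkPolicies may allow traffic from one namespace and block it from another.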

Ingress returns 404 or 503

Symptoms: Ingress resource exists but the application returns 404 or 503 errors.

Diagnosis:

# Check ingress resource
kubectl describe ingress <ingress-name>

# Check ingress controller logs
kubectl logs -n <ingress-namespace> -l app.kubernetes.io/name=<controller>

# Verify backend service
kubectl get endpoints <service-name>

Common causes:

| Cause | Fix |
| --- | --- |
| Wrong ingressClassName | Match the class to your installed controller (traefik, nginx, etc.) |
| No ingress controller installed | Install one: helm install traefik traefik/traefik |
| Service has no endpoints | Check if pods are running and passing readiness probes |
| Path mismatch | Verify pathType (Prefix vs Exact) matches your app’s routing |
| TLS secret missing | Create the TLS secret or configure cert-manager |
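
Two of the causes above, a class mismatch and a Service without endpoints, can be checked mechanically. The helpers below are a sketch with hypothetical names. They assume the Ingress and Service are in the current namespace.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Succeed only if the Service has at least one ready endpoint address.
# (.subsets[*].addresses lists ready pods only.)
has_endpoints() {
  ips="$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$ips" ]
}

# Print the class an Ingress asks for.
requested_class() {
  kubectl get ingress "$1" -o jsonpath='{.spec.ingressClassName}{"\n"}'
}

# List the IngressClass objects that exist in the cluster.
available_classes() {
  kubectl get ingressclass -o name
}

# Usage:
#   has_endpoints <service-name> || echo "no ready endpoints"
#   requested_class <ingress-name>; available_classes
```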

Backup CronJob never runs

Symptoms: Backup is enabled but no backup jobs appear.

Diagnosis:

# Check CronJob exists
kubectl get cronjob -l app.kubernetes.io/name=<chart-name>

# Check CronJob schedule
kubectl describe cronjob <cronjob-name>

# Check for failed jobs
kubectl get jobs -l app.kubernetes.io/name=<chart-name>

Common causes:

| Cause | Fix |
| --- | --- |
| backup.enabled not set to true | Set backup.enabled: true in values |
| Invalid cron schedule | Validate schedule syntax (5 fields, no seconds) |
| S3 credentials wrong | Test S3 connectivity manually with aws s3 ls --endpoint-url <endpoint-url> |
| Job deadline exceeded | Increase backup.activeDeadlineSeconds |
| Suspended CronJob | Check the spec.suspend field and set it to false |

Backup jobs use the same ServiceAccount as the main pod. If your cluster enforces restrictive Pod Security Standards (or PodSecurityPolicies, on Kubernetes versions before 1.25), ensure the backup container is allowed to run.
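
Rather than waiting for the next scheduled run, the checks above can be exercised directly. This is a sketch with hypothetical function names. The schedule check only counts fields, so it does not prove each field is valid, and it rejects macros such as @daily.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Succeed if the schedule has exactly five whitespace-separated fields.
has_five_fields() {
  [ "$(printf '%s\n' "$1" | awk '{ print NF; exit }')" -eq 5 ]
}

# Run the backup once, right now, from the CronJob's job template.
# The job name gets a timestamp suffix so repeated runs do not collide.
run_backup_now() {
  kubectl create job --from="cronjob/$1" "$1-manual-$(date +%s)"
}

# Resume a suspended CronJob.
resume_cronjob() {
  kubectl patch cronjob "$1" -p '{"spec":{"suspend":false}}'
}

# Usage:
#   has_five_fields "0 2 * * *" && echo "field count ok"
#   run_backup_now <cronjob-name>
```

A manually created job surfaces credential or deadline errors immediately in kubectl logs job/<job-name>.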


Helm upgrade fails with conflict

Symptoms: helm upgrade fails with cannot patch or field is immutable errors.

Common causes:

| Cause | Fix |
| --- | --- |
| Immutable field changed (e.g., StatefulSet volumeClaimTemplates) | Delete the StatefulSet with --cascade=orphan and re-run the upgrade |
| Resource owned by another release | Check the meta.helm.sh/release-name annotation |
| CRD version conflict | Manually update CRDs before upgrading |

# For immutable StatefulSet fields:
kubectl delete statefulset <name> --cascade=orphan
helm upgrade <release-name> helmforge/<chart-name> -f values.yaml

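Using --cascade=orphan deletes the StatefulSet object but leaves its pods running. The upgrade then recreates the StatefulSet, which adopts the existing pods as long as its selector still matches their labels. Existing PVCs are not modified by the new volumeClaimTemplates.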
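
To find out which release Helm believes owns a conflicting resource, and to confirm that pods survived an orphaning delete, helpers along these lines may be useful. The function names are hypothetical. The pod listing assumes the chart sets the app.kubernetes.io/instance label used elsewhere in this guide.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the release name and namespace Helm recorded on a resource.
# Both columns are empty if Helm does not manage the resource.
helm_owner() {
  kubectl get "$1" "$2" -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}{"\t"}{.metadata.annotations.meta\.helm\.sh/release-namespace}{"\n"}'
}

# List the pods that belong to a release.
release_pods() {
  kubectl get pods -l "app.kubernetes.io/instance=$1"
}

# Usage:
#   helm_owner statefulset <name>
#   release_pods <release-name>   # run before and after the orphaning delete
```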


Helm install times out

Symptoms: helm install --wait times out before pods are ready.

Diagnosis:

kubectl get pods -l app.kubernetes.io/instance=<release>
kubectl describe pod <pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp

Common causes:

| Cause | Fix |
| --- | --- |
| Image pull error | Check image name, tag, and pull secrets |
| Resource quota exceeded | Check namespace ResourceQuotas |
| Node scheduling issues | Check node taints, tolerations, and available resources |
| Slow startup (large DB init) | Raise the timeout: helm install --wait --timeout 10m |
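
When the event list is long, narrowing it to warnings and to the waiting reason of each container usually points at the cause faster. A sketch with hypothetical function names, assuming the release's pods are in the current namespace:

```shell
# Hypothetical helpers; the function names are illustrative only.

# Show only Warning events, oldest first.
warning_events() {
  kubectl get events --field-selector type=Warning --sort-by=.metadata.creationTimestamp
}

# Print "<pod> <phase> <waiting reasons>" for every pod of a release.
# Reasons such as ImagePullBackOff or ErrImagePull indicate an image pull error.
waiting_reasons() {
  kubectl get pods -l "app.kubernetes.io/instance=$1" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Usage:
#   warning_events
#   waiting_reasons <release>
```

A pod that stays in the Pending phase with no waiting reason has usually not been scheduled yet, which points at quotas, taints, or insufficient node resources.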

General debugging commands

# Overview of release status
helm status <release-name>

# See what values are in use
helm get values <release-name>

# See rendered templates
helm template <release-name> helmforge/<chart-name> -f values.yaml

# Diff before upgrading (requires helm-diff plugin)
helm diff upgrade <release-name> helmforge/<chart-name> -f values.yaml

# List the common workload resources for a release
# (kubectl get all omits Secrets, ConfigMaps, PVCs, and Ingresses)
kubectl get all -l app.kubernetes.io/instance=<release-name>
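
Two further commands round out the list: one prints the manifests as Helm actually deployed them, the other lists resource kinds that kubectl get all does not cover. This is a sketch with hypothetical function names. The label selector assumes the chart applies app.kubernetes.io/instance to these resources as well.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the manifests exactly as Helm deployed them for this release.
deployed_manifest() {
  helm get manifest "$1"
}

# List PVCs, Secrets, ConfigMaps, and Ingresses that carry the release label.
release_extras() {
  kubectl get pvc,secret,configmap,ingress -l "app.kubernetes.io/instance=$1"
}

# Usage:
#   deployed_manifest <release-name>
#   release_extras <release-name>
```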

Still stuck? Open an issue on GitHub with your chart version, Kubernetes version, and the output of kubectl describe pod and helm get values.