# Troubleshooting

Symptom-based guide to diagnosing and resolving common issues with HelmForge charts.
## Pod stays in CrashLoopBackOff

**Symptoms:** the pod restarts repeatedly, and `kubectl get pods` shows `CrashLoopBackOff` status.

**Diagnosis:**

```shell
# Check logs from the previous (crashed) container
kubectl logs <pod-name> --previous

# Check pod events
kubectl describe pod <pod-name>
```
**Common causes:**
| Cause | Fix |
|---|---|
| Missing or wrong database credentials | Check `auth.existingSecret` references and secret key names |
| Insufficient memory (OOMKilled) | Increase `resources.limits.memory` |
| Wrong image tag or missing image | Verify `image.tag` matches a valid published version |
| Config file syntax error | Check mounted ConfigMaps for YAML/JSON syntax errors |
| Dependency not ready | Ensure dependent services (database, Redis) are running first |
If the pod log shows `exec format error`, you are likely running an AMD64 image on an ARM node (or vice versa). Check that the image supports your node's architecture.
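For the OOMKilled case, the fix is raising `resources.limits.memory` in your values file. A minimal sketch (the sizes shown are illustrative, not recommendations):

```yaml
# values.yaml — raise memory for a pod that is OOMKilled
# 512Mi / 1Gi are illustrative values; size them to your workload
resources:
  requests:
    memory: 512Mi
  limits:
    memory: 1Gi
```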
## PVC stuck in Pending

**Symptoms:** `kubectl get pvc` shows `Pending` status and pods cannot start.

**Diagnosis:**

```shell
# Inspect PVC events and list available StorageClasses
kubectl describe pvc <pvc-name>
kubectl get storageclass
```
**Common causes:**
| Cause | Fix |
|---|---|
| No default StorageClass | Set a default: `kubectl patch sc <name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'` |
| StorageClass doesn't exist | Create the StorageClass or change `persistence.storageClass` in values |
| Insufficient cluster storage | Free disk space or add nodes |
| WaitForFirstConsumer binding | The PVC binds when a pod is scheduled; check pod scheduling issues |
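When the chart names a StorageClass that does not exist on your cluster, pointing `persistence.storageClass` at one that does clears the Pending state. A sketch, assuming a class name taken from `kubectl get storageclass`:

```yaml
# values.yaml — use an existing StorageClass
# "local-path" is an example name; substitute one from `kubectl get storageclass`
persistence:
  storageClass: local-path
```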
## Database connection refused

**Symptoms:** application pods log `connection refused` or `could not connect to server` when trying to reach a database.

**Diagnosis:**

```shell
# Check whether the database pod is running
kubectl get pods -l app.kubernetes.io/name=<chart-name>

# Check that the service exists
kubectl get svc -l app.kubernetes.io/name=<chart-name>

# Test connectivity from within the cluster
kubectl run debug --rm -it --image=busybox -- sh
# then, inside the debug shell:
nc -zv <service-name> <port>
```
**Common causes:**
| Cause | Fix |
|---|---|
| Database pod not ready | Wait for the readiness probe to pass; check logs for startup errors |
| Wrong service name | Use the full service name: `<release>-<chart>.<namespace>.svc.cluster.local` |
| Wrong port | Check `service.port` in the chart's values |
| Network policy blocking traffic | Check NetworkPolicies in the namespace |
| Auth mismatch | Verify the application uses the same credentials as the database chart |
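A wrong service name is easiest to catch by assembling the full DNS name explicitly and then probing it. A minimal sketch, using illustrative placeholder values (`my-release`, `postgresql`, `default` are assumptions, not chart defaults):

```shell
# Assemble the fully qualified in-cluster service name from its parts.
# my-release / postgresql / default are illustrative placeholders.
release="my-release"
chart="postgresql"
namespace="default"
host="${release}-${chart}.${namespace}.svc.cluster.local"
echo "$host"
```

From the busybox debug pod above, `nc -zv "$host" <port>` then confirms whether the port is reachable at that name.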
## Ingress returns 404 or 503

**Symptoms:** the Ingress resource exists but the application returns 404 or 503 errors.

**Diagnosis:**

```shell
# Check the ingress resource
kubectl describe ingress <ingress-name>

# Check ingress controller logs
kubectl logs -n <ingress-namespace> -l app.kubernetes.io/name=<controller>

# Verify the backend service has endpoints
kubectl get endpoints <service-name>
```
**Common causes:**
| Cause | Fix |
|---|---|
| Wrong ingressClassName | Match the class to your installed controller (`traefik`, `nginx`, etc.) |
| No ingress controller installed | Install one: `helm install traefik traefik/traefik` |
| Service has no endpoints | Check whether pods are running and passing readiness probes |
| Path mismatch | Verify `pathType` (`Prefix` vs `Exact`) matches your app's routing |
| TLS secret missing | Create the TLS secret or configure cert-manager |
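Most of these causes trace back to a handful of values. A hedged sketch of the relevant keys (the exact key names and the hostname are assumptions; verify against the chart's own values.yaml):

```yaml
# values.yaml — ingress settings that commonly cause 404/503
# key names follow common chart conventions; app.example.com is a placeholder
ingress:
  enabled: true
  className: traefik        # must match an installed IngressClass
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix  # Prefix vs Exact must match your app's routing
  tls:
    - secretName: app-example-tls  # secret must exist, or be issued by cert-manager
      hosts:
        - app.example.com
```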
## Backup CronJob never runs

**Symptoms:** backup is enabled but no backup jobs appear.

**Diagnosis:**

```shell
# Check that the CronJob exists
kubectl get cronjob -l app.kubernetes.io/name=<chart-name>

# Check the CronJob schedule
kubectl describe cronjob <cronjob-name>

# Check for failed jobs
kubectl get jobs -l app.kubernetes.io/name=<chart-name>
```
**Common causes:**
| Cause | Fix |
|---|---|
| `backup.enabled` not set to true | Set `backup.enabled: true` in values |
| Invalid cron schedule | Validate schedule syntax (5 fields, no seconds) |
| S3 credentials wrong | Test S3 connectivity manually with `aws s3 ls --endpoint-url` |
| Job deadline exceeded | Increase `backup.activeDeadlineSeconds` |
| Suspended CronJob | Check the `spec.suspend` field and set it to `false` |
Backup jobs use the same ServiceAccount as the main pod. If you have restrictive PodSecurityPolicies or Pod Security Standards, ensure the backup container is allowed to run.
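Kubernetes cron schedules use five fields (minute, hour, day of month, month, day of week) with no seconds field, so a six-field schedule copied from another scheduler is invalid. A quick local sanity check on the field count:

```shell
# Count the fields in a cron schedule; Kubernetes expects exactly 5.
# "0 2 * * *" (daily at 02:00) is an example schedule.
schedule="0 2 * * *"
fields=$(echo "$schedule" | wc -w)
if [ "$fields" -eq 5 ]; then
  echo "schedule has 5 fields: ok"
else
  echo "invalid: $fields fields (expected 5)"
fi
```

This only checks the field count, not the range of each field; `kubectl describe cronjob` surfaces schedule errors the API server reports.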
## Helm upgrade fails with conflict

**Symptoms:** `helm upgrade` fails with `cannot patch` or `field is immutable` errors.
**Common causes:**
| Cause | Fix |
|---|---|
| Immutable field changed (e.g., StatefulSet `volumeClaimTemplates`) | Delete the StatefulSet with `--cascade=orphan` and re-run the upgrade |
| Resource owned by another release | Check the `meta.helm.sh/release-name` annotation |
| CRD version conflict | Manually update CRDs before upgrading |
```shell
# For immutable StatefulSet fields:
kubectl delete statefulset <name> --cascade=orphan
helm upgrade my-release helmforge/<chart-name> -f values.yaml
```
Using `--cascade=orphan` deletes the StatefulSet controller while leaving its pods running. The upgrade then recreates the StatefulSet, which adopts the existing pods.
## Helm install times out

**Symptoms:** `helm install --wait` times out before pods are ready.

**Diagnosis:**

```shell
# Inspect release pods and recent cluster events
kubectl get pods -l app.kubernetes.io/instance=<release>
kubectl describe pod <pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp
```
**Common causes:**
| Cause | Fix |
|---|---|
| Image pull error | Check the image name, tag, and pull secrets |
| Resource quota exceeded | Check namespace ResourceQuotas |
| Node scheduling issues | Check node taints, tolerations, and available resources |
| Slow startup (large DB init) | Increase the timeout: `helm install --wait --timeout 10m` |
## General debugging commands

```shell
# Overview of release status
helm status <release-name>

# See what values are in use
helm get values <release-name>

# See rendered templates
helm template <release-name> helmforge/<chart-name> -f values.yaml

# Diff before upgrading (requires the helm-diff plugin)
helm diff upgrade <release-name> helmforge/<chart-name> -f values.yaml

# Check all resources for a release
kubectl get all -l app.kubernetes.io/instance=<release-name>
```
Still stuck? Open an issue on GitHub with your chart version, Kubernetes version, and the output of `kubectl describe pod` and `helm get values`.