Troubleshooting

Symptom-based guide to diagnosing and resolving common issues with HelmForge charts.

Pod stays in CrashLoopBackOff

Symptoms: The pod restarts repeatedly and `kubectl get pods` shows `CrashLoopBackOff` status.

Diagnosis:

# Check logs from the previous (crashed) container instance
kubectl logs <pod-name> --previous

# Check pod events
kubectl describe pod <pod-name>

Common causes:

Cause	Fix
Missing or wrong database credentials	Check `auth.existingSecret` references and secret key names
Insufficient memory (OOMKilled)	Increase `resources.limits.memory`
Wrong image tag or missing image	Verify `image.tag` matches a valid published version
Config file syntax error	Check mounted ConfigMaps for YAML/JSON syntax
Dependency not ready	Ensure dependent services (database, Redis) are running first

If the pod log shows `exec format error`, the image was probably built for a different CPU architecture than the node, for example an AMD64 image on an ARM node or vice versa. Check that the image supports your node architecture.
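Two of the causes above, an OOM kill and an architecture mismatch, can be confirmed directly from the API. The following is a minimal sketch rather than anything shipped with the charts: the function names are hypothetical, and it assumes `kubectl` already points at the right cluster and namespace.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print why each container in the pod last terminated (for example OOMKilled or Error).
last_termination_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Print each node with its CPU architecture (amd64, arm64, ...).
node_architectures() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
}
```

Calling `last_termination_reason <pod-name>` and seeing `OOMKilled` points to the memory limit. Comparing `node_architectures` with the architectures the image was published for helps rule an `exec format error` in or out.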

PVC stuck in Pending

Symptoms: `kubectl get pvc` shows `Pending` status and pods cannot start.

Diagnosis:

kubectl describe pvc <pvc-name>
kubectl get storageclass

Common causes:

Cause	Fix
No default StorageClass	Set a default: `kubectl patch sc <name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'`
StorageClass doesn’t exist	Create the StorageClass or change `persistence.storageClass` in values
Insufficient cluster storage	Free disk space or add nodes
WaitForFirstConsumer binding	Expected behavior: the PVC binds only once a pod that uses it is scheduled, so check for pod scheduling issues
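To tell the first and last causes apart quickly, it can help to print which StorageClass is the default and which binding mode a class uses. This is a sketch with hypothetical function names. It relies on `kubectl get storageclass` marking the default class with `(default)` after its name in the table output.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the names of StorageClasses marked "(default)" in kubectl's table output.
default_storageclasses() {
  kubectl get storageclass --no-headers | awk '$2 == "(default)" { print $1 }'
}

# Print a StorageClass's binding mode (Immediate or WaitForFirstConsumer).
binding_mode() {
  kubectl get storageclass "$1" -o jsonpath='{.volumeBindingMode}'
}
```

Empty output from `default_storageclasses` suggests no default is set. If `binding_mode <name>` prints `WaitForFirstConsumer`, a `Pending` PVC is normal until a pod that uses it is scheduled.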

Database connection refused

Symptoms: Application pods log `connection refused` or `could not connect to server` when trying to reach a database.

Diagnosis:

# Check if the database pod is running
kubectl get pods -l app.kubernetes.io/name=<chart-name>

# Check the service exists
kubectl get svc -l app.kubernetes.io/name=<chart-name>

# Test connectivity from within the cluster
kubectl run debug --rm -it --image=busybox -- sh
# then: nc -zv <service-name> <port>

Common causes:

Cause	Fix
Database pod not ready	Wait for readiness probe to pass, check logs for startup errors
Wrong service name	Use the full service name: `<release>-<chart>.<namespace>.svc.cluster.local`
Wrong port	Check `service.port` in the chart’s values
Network policy blocking traffic	Check NetworkPolicies in the namespace
Auth mismatch	Verify the application uses the same credentials as the database chart
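The service-name and port checks can be combined into a single non-interactive probe. The sketch below uses hypothetical function names. It assumes the Service follows the `<release>-<chart>` naming pattern from the table, which a name override in your values could change.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Build the in-cluster DNS name from release, chart, and namespace.
service_fqdn() {
  printf '%s-%s.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

# Run nc once from a throwaway busybox pod, then remove the pod.
check_port() {
  kubectl run debug --rm -i --restart=Never --image=busybox -- \
    nc -zv "$1" "$2"
}
```

For example, `check_port "$(service_fqdn my-release postgresql default)" 5432` probes a PostgreSQL service. The release, chart, namespace, and port here are placeholders to replace with your own.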

Ingress returns 404 or 503

Symptoms: Ingress resource exists but the application returns 404 or 503 errors.

Diagnosis:

# Check ingress resource
kubectl describe ingress <ingress-name>

# Check ingress controller logs
kubectl logs -n <ingress-namespace> -l app.kubernetes.io/name=<controller>

# Verify backend service
kubectl get endpoints <service-name>

Common causes:

Cause	Fix
Wrong `ingressClassName`	Match the class to your installed controller (`traefik`, `nginx`, etc.)
No ingress controller installed	Install one, for example `helm install traefik traefik/traefik` (after adding the Traefik chart repository)
Service has no endpoints	Check if pods are running and passing readiness probes
Path mismatch	Verify `pathType` (`Prefix` vs `Exact`) matches your app’s routing
TLS secret missing	Create the TLS secret or configure cert-manager
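A 503 usually means the Service has no ready endpoints, while a 404 more often points to a class or path mismatch. The sketch below scripts the endpoint check and lists the available IngressClasses. The function names are hypothetical.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the pod IPs behind a Service; empty output means no ready endpoints.
service_endpoint_ips() {
  kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Succeed only if the Service has at least one ready endpoint.
service_has_endpoints() {
  [ -n "$(service_endpoint_ips "$1")" ]
}

# List the IngressClasses the cluster knows about.
list_ingress_classes() {
  kubectl get ingressclass
}
```

A typical use is `service_has_endpoints <service-name> || echo "no ready pods behind the service"`. The names printed by `list_ingress_classes` are the valid values for `ingressClassName`.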

Backup CronJob never runs

Symptoms: Backup is enabled but no backup jobs appear.

Diagnosis:

# Check CronJob exists
kubectl get cronjob -l app.kubernetes.io/name=<chart-name>

# Check CronJob schedule
kubectl describe cronjob <cronjob-name>

# Check for failed jobs
kubectl get jobs -l app.kubernetes.io/name=<chart-name>

Common causes:

Cause	Fix
`backup.enabled` not set to `true`	Set `backup.enabled: true` in values
Invalid cron schedule	Validate schedule syntax (5 fields, no seconds)
S3 credentials wrong	Test S3 connectivity manually with `aws s3 ls --endpoint-url <endpoint-url>`
Job deadline exceeded	Increase `backup.activeDeadlineSeconds`
Suspended CronJob	Check the `spec.suspend` field and set it to `false`

Backup jobs use the same ServiceAccount as the main pod. If your cluster enforces restrictive Pod Security Standards (or PodSecurityPolicies on Kubernetes versions before 1.25, where they were removed), ensure the backup container is allowed to run.
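Several of these causes can be checked or worked around from the command line. The following sketch uses hypothetical function names. The schedule check only counts fields and does not validate their values, and it would reject macros such as `@daily`, which Kubernetes also accepts.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Succeed if the schedule has exactly five whitespace-separated fields.
# This counts fields only; it does not validate their values.
has_five_cron_fields() {
  printf '%s\n' "$1" | awk 'NF == 5 { ok = 1 } END { exit !ok }'
}

# Resume a suspended CronJob.
resume_cronjob() {
  kubectl patch cronjob "$1" -p '{"spec":{"suspend":false}}'
}

# Start a one-off Job from the CronJob's template instead of waiting for the schedule.
# Usage: run_backup_now <cronjob-name> <new-job-name>
run_backup_now() {
  kubectl create job "$2" --from="cronjob/$1"
}
```

Running `run_backup_now <cronjob-name> backup-manual-1` is a quick way to see credential or deadline failures in the job's logs without waiting for the next scheduled run.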

Helm upgrade fails with conflict

Symptoms: `helm upgrade` fails with `cannot patch` or `field is immutable` errors.

Common causes:

Cause	Fix
Immutable field changed (e.g., StatefulSet `volumeClaimTemplates`)	Delete the StatefulSet with `--cascade=orphan` and re-run upgrade
Resource owned by another release	Check the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations on the conflicting resource
CRD version conflict	Manually update CRDs before upgrading

# For immutable StatefulSet fields:
kubectl delete statefulset <name> --cascade=orphan
helm upgrade my-release helmforge/<chart-name> -f values.yaml

Using `--cascade=orphan` deletes the StatefulSet object but leaves its pods running. The upgrade recreates the StatefulSet, which then adopts the existing pods that match its selector.
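Before deleting anything, it is worth confirming which release owns the resource. After the orphaning delete, it is worth confirming the pods are still present. This is a sketch with hypothetical function names. The label selector assumes the chart sets `app.kubernetes.io/instance` to the release name.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the Helm release that owns a resource, e.g. helm_owner statefulset my-db.
helm_owner() {
  kubectl get "$1" "$2" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}'
}

# After an orphaning delete, confirm the release's pods are still there.
release_pods() {
  kubectl get pods -l "app.kubernetes.io/instance=$1"
}
```

If `helm_owner` prints a release name other than the one being upgraded, the conflict is an ownership problem rather than an immutable field.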

Helm install times out

Symptoms: `helm install --wait` times out before pods are ready.

Diagnosis:

kubectl get pods -l app.kubernetes.io/instance=<release>
kubectl describe pod <pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp

Common causes:

Cause	Fix
Image pull error	Check image name, tag, and pull secrets
Resource quota exceeded	Check namespace ResourceQuotas
Node scheduling issues	Check node taints, tolerations, and available resources
Slow startup (large DB init)	Raise the timeout above the 5m default, for example `helm install --wait --timeout 10m`
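The causes in this table tend to show up either as a container waiting reason or as a Warning event. The sketch below narrows the diagnosis commands to those signals. The function names are hypothetical, and the commands run against the current namespace.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print each pod of a release with the reason its containers are waiting
# (for example ImagePullBackOff or ContainerCreating).
waiting_reasons() {
  kubectl get pods -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Show only Warning events, oldest first.
warning_events() {
  kubectl get events --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# Show quota usage against limits in the current namespace.
quota_usage() {
  kubectl describe resourcequota
}
```

A pod that never appears in `waiting_reasons` output may not have been created at all, which makes `quota_usage` and `warning_events` the next places to look.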

General debugging commands

# Overview of release status
helm status <release-name>

# See what values are in use
helm get values <release-name>

# See rendered templates
helm template <release-name> helmforge/<chart-name> -f values.yaml

# Diff before upgrading (requires helm-diff plugin)
helm diff upgrade <release-name> helmforge/<chart-name> -f values.yaml

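# Check resources for a release ("all" skips ConfigMaps, Secrets, PVCs, and Ingresses, so list them explicitly)
kubectl get all,configmap,secret,pvc,ingress -l app.kubernetes.io/instance=<release-name>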

Still stuck? Open an issue on GitHub with your chart version, Kubernetes version, and the output of `kubectl describe pod` and `helm get values`.
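Gathering that information can be scripted. The following is a minimal sketch, not an official HelmForge tool: the function name, file names, and default output directory are all hypothetical.

```shell
# Hypothetical helper; names and file layout are illustrative only.
# Collect the information requested above into a directory.
# Usage: collect_diagnostics <release-name> <pod-name> [output-dir]
collect_diagnostics() {
  release=$1
  pod=$2
  out=${3:-helmforge-diagnostics}
  mkdir -p "$out" || return 1

  # The CHART column of helm list includes the chart version.
  helm list --filter "^${release}\$" > "$out/chart-version.txt" 2>&1
  kubectl version > "$out/kubernetes-version.txt" 2>&1
  helm get values "$release" > "$out/helm-values.txt" 2>&1
  kubectl describe pod "$pod" > "$out/describe-pod.txt" 2>&1

  echo "Wrote diagnostics to $out"
}
```

Review `helm-values.txt` before attaching it to a public issue, since user-supplied values can contain passwords or access keys.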