ArchiveBox
Deploy ArchiveBox on Kubernetes — a self-hosted web archiving platform that captures websites in multiple formats (HTML, PDF, PNG, WARC, media) using Chromium headless rendering.
Key Features
- Multi-format archiving — HTML, PDF, screenshot, WARC, media extraction, git clone
- Chromium headless — full browser rendering with /dev/shm memory-backed tmpfs
- SQLite database — embedded storage, no external database needed
- Persistent storage — PVC for archived content and database
- Admin credentials — managed via Kubernetes Secret with auto-generation
- Ingress support — TLS via cert-manager with configurable ingress class
Installation
HTTPS repository:
helm repo add helmforge https://repo.helmforge.dev
helm repo update
helm install archivebox helmforge/archivebox -f values.yaml
OCI registry:
helm install archivebox oci://ghcr.io/helmforgedev/helm/archivebox -f values.yaml
Basic Example
archivebox:
  adminUsername: admin
  adminPassword: 'my-secure-password'
persistence:
  enabled: true
  size: 100Gi
ingress:
  enabled: true
  ingressClassName: traefik
  hosts:
    - host: archive.example.com
      paths:
        - path: /
          pathType: Prefix
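The feature list mentions TLS via cert-manager. A minimal sketch of how that might look in values.yaml follows. It assumes the chart follows the common Helm convention of `ingress.annotations` and `ingress.tls` keys (neither appears in the Key Values table, so check the chart's full values reference) and that a ClusterIssuer named `letsencrypt-prod` already exists in the cluster; both the issuer name and the Secret name are hypothetical.

```yaml
ingress:
  enabled: true
  ingressClassName: traefik
  # Assumed key: annotations passed through to the Ingress object.
  annotations:
    # cert-manager's cluster-issuer annotation; the issuer name is hypothetical.
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: archive.example.com
      paths:
        - path: /
          pathType: Prefix
  # Assumed key: mirrors the standard Ingress TLS block.
  tls:
    - secretName: archivebox-tls   # hypothetical Secret name
      hosts:
        - archive.example.com
```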
Key Values
| Key | Default | Description |
|---|---|---|
| archivebox.port | 8000 | Application port |
| archivebox.adminUsername | admin | Admin username |
| archivebox.adminPassword | "" | Auto-generated if empty |
| archivebox.searchBackendEngine | ripgrep | Search engine (ripgrep, sqlite, sonic) |
| archivebox.mediaMaxSize | 750m | Max media download size |
| archivebox.timeout | 60 | URL archiving timeout in seconds |
| persistence.enabled | true | Enable /data persistence |
| persistence.size | 50Gi | PVC size for archives |
| ingress.enabled | false | Enable ingress |
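As an illustration of how these keys combine, the sketch below overrides a few defaults. It uses only keys listed in the table; the specific values are arbitrary examples, not recommendations.

```yaml
archivebox:
  adminUsername: admin
  adminPassword: ""              # left empty so the chart auto-generates a password
  searchBackendEngine: sqlite    # one of: ripgrep, sqlite, sonic
  mediaMaxSize: 500m             # lower than the 750m default
  timeout: 120                   # seconds; the default is 60
persistence:
  enabled: true
  size: 50Gi                     # same as the default, shown for completeness
```

Saved as values.yaml, this can be passed to either install command from the Installation section with `-f values.yaml`.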
Operational Notes
- single instance only — SQLite is single-writer, no horizontal scaling
- storage-heavy deployment — plan 50–100GB+ PVC depending on archive volume
- requires a minimum of 2Gi RAM for Chromium headless rendering
- /dev/shm tmpfs is mounted automatically for Chromium stability
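One way to reserve the memory noted above is through container resource settings. The sketch below assumes the chart exposes a top-level `resources` block, as many Helm charts do; that key is not listed in the Key Values table, so confirm it in the full values reference. The limit shown is illustrative headroom, not a documented figure.

```yaml
# Assumed key: `resources` is a common Helm convention, not confirmed for this chart.
resources:
  requests:
    memory: 2Gi   # matches the minimum stated in the notes above
  limits:
    memory: 4Gi   # illustrative headroom; tune to your workload
```

If the /dev/shm mount is a memory-backed emptyDir, data written there generally counts toward the container's memory limit in Kubernetes, so a limit set too close to the 2Gi minimum may leave Chromium short.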
More Information
See the source code and full values reference on GitHub.