ArchiveBox

Deploy ArchiveBox on Kubernetes — a self-hosted web archiving platform that captures websites in multiple formats (HTML, PDF, PNG, WARC, media) using Chromium headless rendering.

Key Features

  • Multi-format archiving — HTML, PDF, screenshot, WARC, media extraction, git clone
  • Chromium headless — full browser rendering with /dev/shm memory-backed tmpfs
  • SQLite database — embedded storage, no external database needed
  • Persistent storage — PVC for archived content and database
  • Admin credentials — managed via Kubernetes Secret with auto-generation
  • Ingress support — TLS via cert-manager with configurable ingress class

Installation

HTTPS repository:

helm repo add helmforge https://repo.helmforge.dev
helm repo update
helm install archivebox helmforge/archivebox -f values.yaml

OCI registry:

helm install archivebox oci://ghcr.io/helmforgedev/helm/archivebox -f values.yaml

Basic Example

archivebox:
  adminUsername: admin
  # Leave adminPassword empty ("") to have the chart auto-generate one
  adminPassword: 'my-secure-password'

persistence:
  enabled: true
  size: 100Gi

ingress:
  enabled: true
  ingressClassName: traefik
  hosts:
    - host: archive.example.com
      paths:
        - path: /
          pathType: Prefix
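
The feature list mentions TLS via cert-manager, which the basic example does not show. The fragment below is a minimal sketch of how that might look. It assumes the chart follows the common Helm convention of `ingress.annotations` and `ingress.tls` keys, which this page does not confirm. The issuer name `letsencrypt-prod` and the secret name `archivebox-tls` are hypothetical placeholders.

```yaml
ingress:
  enabled: true
  ingressClassName: traefik
  # ASSUMPTION: `annotations` and `tls` are not documented on this page;
  # check the chart's full values reference before relying on them.
  annotations:
    # cert-manager watches this annotation and issues a certificate.
    # `letsencrypt-prod` is a hypothetical ClusterIssuer name.
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: archive.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    # `archivebox-tls` is a hypothetical Secret name for the issued certificate.
    - secretName: archivebox-tls
      hosts:
        - archive.example.com
```

If the chart names these keys differently, only the two marked blocks need to change. The host and path settings match the basic example.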

Key Values

Key                             Default   Description
archivebox.port                 8000      Application port
archivebox.adminUsername        admin     Admin username
archivebox.adminPassword        ""        Auto-generated if empty
archivebox.searchBackendEngine  ripgrep   Search engine (ripgrep, sqlite, sonic)
archivebox.mediaMaxSize         750m      Max media download size
archivebox.timeout              60        URL archiving timeout in seconds
persistence.enabled             true      Enable /data persistence
persistence.size                50Gi      PVC size for archives
ingress.enabled                 false     Enable ingress
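
As an illustration of how these keys combine, the values file below overrides a few defaults. It uses only keys listed above. The specific numbers are arbitrary examples, not recommendations.

```yaml
archivebox:
  adminUsername: admin
  adminPassword: ""            # empty, so the chart auto-generates a password
  searchBackendEngine: sqlite  # listed options: ripgrep (default), sqlite, sonic
  mediaMaxSize: 500m           # lowered from the 750m default (example value)
  timeout: 120                 # seconds per URL, raised from the default of 60

persistence:
  enabled: true
  size: 50Gi                   # the chart default, shown here for clarity
```

A file like this can be passed with `-f values.yaml` to the `helm install` commands in the Installation section.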

Operational Notes

  • Single instance only: SQLite allows one writer at a time, so the deployment cannot be scaled horizontally
  • Storage-heavy deployment: plan for a PVC of 50–100 GB or more, depending on archive volume
  • Requires at least 2Gi of RAM for Chromium headless rendering
  • /dev/shm tmpfs is mounted automatically for Chromium stability
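
The sizing notes above could translate into values along these lines. This is a sketch only: the `persistence` keys come from this page, but the `resources` block is an assumption based on common Helm chart conventions and should be checked against the chart's values reference. The 4Gi limit is an illustrative figure, not a documented default.

```yaml
persistence:
  enabled: true
  size: 100Gi        # upper end of the 50-100 GB planning range

# ASSUMPTION: a top-level `resources` key is not documented on this page.
resources:
  requests:
    memory: 2Gi      # the stated minimum for Chromium headless rendering
  limits:
    memory: 4Gi      # illustrative headroom only
```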

More Information

See the source code and full values reference on GitHub.