Skip to content

Neo4jBackup

This document provides a reference for the Neo4jBackup Custom Resource Definition (CRD). This resource is used for creating and managing backups of Neo4j databases running under either Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone.

For a comprehensive guide on using backups, see the Backup and Restore Guide.

API Version

  • Group: neo4j.neo4j.com
  • Version: v1beta1
  • Kind: Neo4jBackup

How it works

The operator creates a Kubernetes Job that runs neo4j-admin database backup inside a container using the same Neo4j enterprise image as the target cluster. No separate backup image is needed or configured.

Key implementation details:

  • The operator automatically sets server.backup.listen_address=0.0.0.0:6362 in neo4j.conf on the target StatefulSet.
  • The --from flag is automatically populated with the FQDNs of all server pods at port 6362.
  • For cloud storage, --to-path uses native cloud URIs: s3://, gs://, azb://.
  • For PVC storage, --to-path uses the local path within the mounted PVC.
  • RBAC: Only a neo4j-backup-sa ServiceAccount is created. No Role or RoleBinding is created because the backup Job requires no Kubernetes API access.
  • Cloud retention: The operator logs a notice to configure bucket lifecycle rules on the cloud provider side. PVC retention uses find + rm in a cleanup Job.

Spec

The Neo4jBackupSpec defines the desired state of a Neo4j backup configuration.

Field Type Required Description
target BackupTarget What to back up
storage StorageLocation Where to store the backup
schedule string Cron expression for automated backups (e.g., "0 2 * * *")
cloud *CloudBlock Top-level cloud provider configuration (used for workload identity)
retention *RetentionPolicy Backup retention policy
options *BackupOptions Backup-specific options
suspend bool Suspend the backup schedule without deleting the resource

Type Definitions

BackupTarget

Defines what to back up. The kind field controls how name and clusterRef are interpreted.

Field Type Required Description
kind string Type of resource to back up: "Cluster" or "Database"
name string When kind=Cluster: name of the Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone. When kind=Database: name of the Neo4j database (e.g., "neo4j", "mydb")
clusterRef string ✅ when kind=Database Name of the Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone that owns the database. Unused when kind=Cluster.
namespace string Namespace of the target resource (defaults to the backup namespace)

Important: In earlier releases, when kind=Database the name field was incorrectly used for cluster lookup. This has been corrected: name is always the database name and clusterRef is the cluster name. Both are required when kind=Database.

Examples:

# Back up an entire cluster (all databases)
target:
  kind: Cluster
  name: production-cluster

# Back up a single database
target:
  kind: Database
  name: mydb
  clusterRef: production-cluster
  namespace: neo4j

StorageLocation

Defines where to store backups.

Field Type Required Description
type string Storage type: "s3", "gcs", "azure", "pvc"
bucket string Bucket or container name (required for cloud storage types)
path string Path within the bucket or PVC
pvc *PVCSpec PVC configuration (required when type=pvc)
cloud *CloudBlock Cloud provider configuration including optional credentials secret

CloudBlock

Cloud provider configuration. This type appears both on StorageLocation (for per-storage credentials) and as a top-level spec.cloud field (for workload identity setup).

Field Type Required Description
provider string Cloud provider: "aws", "gcp", "azure"
credentialsSecretRef string Name of a Kubernetes Secret containing cloud provider credentials as environment variables. When absent, ambient workload identity (IRSA / GKE WI / Azure WI) is used instead.
identity *CloudIdentity Cloud identity configuration (for workload identity ServiceAccount annotations)
endpointURL string Override the S3 API endpoint. Use for S3-compatible stores such as MinIO, Ceph RGW, or Cloudflare R2 (e.g. "http://minio.minio.svc:9000"). Only applies when provider: aws.
forcePathStyle bool Force S3 path-style addressing (endpoint/bucket/key instead of bucket.endpoint/key). Required for MinIO and most self-hosted S3-compatible stores. Only effective when endpointURL is set.

Secret key requirements by provider (when credentialsSecretRef is set):

Provider Required secret keys Notes
AWS AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION Standard AWS SDK env vars
MinIO / S3-compatible AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION Same keys as AWS; set endpointURL and forcePathStyle: true on CloudBlock
GCS GOOGLE_APPLICATION_CREDENTIALS_JSON Full service-account key JSON as a string value — not a filename path
Azure AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY Storage account credentials

Example — creating cloud credential secrets:

# AWS
kubectl create secret generic aws-backup-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
  --from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
  --from-literal=AWS_REGION=us-east-1

# MinIO (uses the same keys; region value is arbitrary — MinIO ignores it)
kubectl create secret generic minio-backup-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=minioadmin \
  --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin \
  --from-literal=AWS_REGION=us-east-1

# GCS — pass the JSON content directly as a string value
kubectl create secret generic gcs-backup-credentials \
  --from-literal=GOOGLE_APPLICATION_CREDENTIALS_JSON="$(cat service-account.json)"

# Azure
kubectl create secret generic azure-backup-credentials \
  --from-literal=AZURE_STORAGE_ACCOUNT=myaccount \
  --from-literal=AZURE_STORAGE_KEY=base64key==

MinIO / S3-compatible example:

storage:
  type: s3
  bucket: neo4j-backups
  path: cluster/full
  cloud:
    provider: aws
    credentialsSecretRef: minio-backup-credentials
    endpointURL: http://minio.minio.svc:9000   # in-cluster MinIO service
    forcePathStyle: true                        # required for MinIO

How it works: endpointURL is injected as AWS_ENDPOINT_URL_S3 (AWS SDK v2 standard). forcePathStyle: true injects -Daws.s3.forcePathStyle=true via JAVA_TOOL_OPTIONS, which the neo4j-admin JVM process reads at startup.

CloudIdentity

Cloud identity configuration for workload identity scenarios (no static credentials).

Field Type Required Description
provider string Identity provider: "aws", "gcp", "azure"
serviceAccount string Name of an existing ServiceAccount to use. When absent and autoCreate.enabled=true, the operator creates neo4j-backup-sa.
autoCreate *AutoCreateSpec Auto-create ServiceAccount with workload-identity annotations

AutoCreateSpec

Controls automatic ServiceAccount creation with workload-identity annotations.

Field Type Required Description
enabled bool Enable auto-creation of the neo4j-backup-sa ServiceAccount (default: true)
annotations map[string]string Annotations applied to the neo4j-backup-sa ServiceAccount on every reconcile. Use this to attach workload-identity annotations.

The annotations in autoCreate.annotations are applied to the neo4j-backup-sa ServiceAccount on every reconcile, so they stay in sync with the desired state.

Workload identity annotation examples:

# AWS IRSA
autoCreate:
  enabled: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/neo4j-backup-role

# GKE Workload Identity
autoCreate:
  enabled: true
  annotations:
    iam.gke.io/gcp-service-account: neo4j-backup@my-project.iam.gserviceaccount.com

# Azure Workload Identity
autoCreate:
  enabled: true
  annotations:
    azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000

PVCSpec

PVC configuration for local storage.

Field Type Required Description
storageClassName string Storage class name for dynamic provisioning
name string Name of an existing PVC to use
size string Size for a new PVC (e.g., "100Gi")

RetentionPolicy

Backup retention configuration.

Field Type Required Description
maxAge string Maximum age of backups to retain (e.g., "30d", "4w")
maxCount int32 Maximum number of backups to retain
deletePolicy string Action for expired backups: "Delete" (default) or "Archive"

Cloud storage retention: For cloud storage targets the operator logs a notice to configure bucket lifecycle rules on the cloud provider side. Automated deletion of cloud objects is not performed by the operator.

BackupOptions

Fine-grained backup execution options.

Field Type Required Description
compress bool Compress the backup (default: true)
backupType string Backup type: "FULL", "DIFF", "AUTO" (default)
preferDiffAsParent bool Use the latest differential backup as the parent when creating a new differential backup (default: false). Maps to --prefer-diff-as-parent. Requires CalVer 2025.04+ — an error is returned at runtime if the target version does not support this flag.
tempPath string Local directory path for temporary files during backup. When tempStorage is configured, this is set automatically. Only set manually if you are mounting your own volume. Maps to --temp-path.
tempStorage *TempStorageSpec Provisions a PVC for temporary staging files during cloud backups. The operator mounts this PVC and passes --temp-path automatically. Recommended for large databases to avoid filling ephemeral disk.
pageCache string Page cache size hint (e.g., "4G"). Must match pattern ^[0-9]+[KMG]?$
encryption *EncryptionSpec Backup encryption configuration
verify bool Verify backup integrity after creation
parallelDownload bool Enable parallel download for remote backups
remoteAddressResolution bool Resolve remote addresses during backup
skipRecovery bool Skip the recovery step after backup
includeMetadata string Controls which metadata is included in the backup. Values: "all" (default), "none", "users", "roles". Requires Neo4j 5.26+.
parallelRecovery bool Enable multi-threaded transaction application during backup
keepFailed bool Preserve failed backup artifacts for debugging instead of deleting them
additionalArgs []string Additional arguments passed verbatim to neo4j-admin database backup

preferDiffAsParent version requirement: This flag was introduced in Neo4j CalVer 2025.04. Using it against Neo4j 5.26.x or CalVer 2025.01–2025.03 will cause the backup Job to fail with an unsupported argument error. The operator validates this at runtime and returns an error before creating the Job.

TempStorageSpec

Provisions temporary staging storage for cloud backup/restore. The operator creates a PVC, mounts it at /tmp/neo4j-staging in the Job pod, and passes --temp-path=/tmp/neo4j-staging to neo4j-admin. The PVC is owned by the Job and garbage-collected when the Job's TTL expires.

Field Type Required Description
size string Size of the temporary PVC (e.g., "50Gi"). Should be at least as large as the expected backup artifact. Must match pattern ^[0-9]+(Ki\|Mi\|Gi\|Ti)?$
storageClassName string StorageClass for the temporary PVC. If empty, uses the cluster default.

EncryptionSpec

Backup encryption configuration.

Field Type Required Description
enabled bool Enable backup encryption
keySecret string Name of a Kubernetes Secret containing the encryption key
keySecretKey string Key within the Secret containing the encryption key (default: "key")
algorithm string Encryption algorithm: "AES256" (default) or "ChaCha20Poly1305"

Status

The Neo4jBackupStatus represents the observed state of the backup.

Field Type Description
conditions []metav1.Condition Current backup conditions
phase string Current backup phase
message string Human-readable message about the current state
lastRunTime *metav1.Time When the last backup Job started
lastSuccessTime *metav1.Time When the last successful backup completed
nextRunTime *metav1.Time When the next scheduled backup will run
stats *BackupStats Statistics from the most recent backup run
history []BackupRun History of recent backup runs

BackupStats

Field Type Description
size string Total backup size (e.g., "2.5GB")
duration string Backup operation duration (e.g., "5m30s")
throughput string Backup throughput rate (e.g., "8.3MB/s")
fileCount int32 Number of files in the backup

BackupRun

Represents a single backup Job execution.

Field Type Description
startTime metav1.Time When the backup run started
completionTime *metav1.Time When the run completed (nil if still running)
status string Run status: "Running", "Succeeded", "Failed"
error string Error message if the backup failed
stats *BackupStats Backup statistics for this run

Examples

Scheduled S3 Backup (Cluster) with IRSA

Uses AWS IRSA workload identity — no static credentials needed.

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-cluster-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster
  storage:
    type: s3
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: aws
  cloud:
    provider: aws
    identity:
      provider: aws
      autoCreate:
        enabled: true
        annotations:
          eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/neo4j-backup-role
  schedule: "0 2 * * *"   # Daily at 2 AM UTC
  retention:
    maxAge: "30d"
    maxCount: 30
  options:
    compress: true
    backupType: FULL
    tempStorage:\
      size: "50Gi"
    encryption:
      enabled: true
      keySecret: backup-encryption-key

Scheduled S3 Backup with Static Credentials

Uses an explicit Kubernetes Secret for AWS credentials instead of IRSA.

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-cluster-backup-static-creds
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster
  storage:
    type: s3
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials   # Secret with AWS_ACCESS_KEY_ID etc.
  schedule: "0 2 * * *"
  retention:
    maxAge: "30d"
    maxCount: 30
  options:
    compress: true
    backupType: FULL
    tempStorage:\
      size: "50Gi"

Single-Database Backup to S3

Backs up only one database. Both name (database) and clusterRef (cluster) are required.

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: mydb-daily-backup
  namespace: neo4j
spec:
  target:
    kind: Database
    name: mydb            # The Neo4j database name
    clusterRef: production-cluster   # The cluster that hosts the database
    namespace: neo4j
  storage:
    type: s3
    bucket: neo4j-backups
    path: mydb/daily/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials
  schedule: "0 3 * * *"
  options:
    compress: true
    backupType: AUTO
    tempStorage:\
      size: "50Gi"

Differential Backup with preferDiffAsParent (CalVer 2025.04+)

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: hourly-diff-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster-2025
  storage:
    type: s3
    bucket: neo4j-backups
    path: hourly-diff/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials
  schedule: "0 * * * *"   # Every hour
  options:
    backupType: DIFF
    preferDiffAsParent: true   # Requires Neo4j CalVer 2025.04+
    tempStorage:\
      size: "50Gi"
    compress: true

On-Demand PVC Backup

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: manual-pvc-backup
  namespace: neo4j
spec:
  target:
    kind: Database
    name: mydb
    clusterRef: staging-cluster
    namespace: neo4j
  storage:
    type: pvc
    pvc:
      name: backup-storage
    path: backups/manual/
  options:
    compress: true
    verify: true
    backupType: DIFF

GCS Backup with GKE Workload Identity

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: weekly-gcs-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: analytics-cluster
  storage:
    type: gcs
    bucket: neo4j-analytics-backups
    path: weekly/
    cloud:
      provider: gcp
  cloud:
    provider: gcp
    identity:
      provider: gcp
      autoCreate:
        enabled: true
        annotations:
          iam.gke.io/gcp-service-account: neo4j-backup@my-project.iam.gserviceaccount.com
  schedule: "0 3 * * 0"   # Weekly on Sunday at 3 AM
  retention:
    maxCount: 12
    deletePolicy: Archive
  options:
    backupType: AUTO
    pageCache: "8G"
    tempStorage:\
      size: "50Gi"

GCS Backup with Static Service Account Credentials

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: weekly-gcs-backup-static
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: analytics-cluster
  storage:
    type: gcs
    bucket: neo4j-analytics-backups
    path: weekly/
    cloud:
      provider: gcp
      credentialsSecretRef: gcs-backup-credentials   # Secret with GOOGLE_APPLICATION_CREDENTIALS_JSON
  schedule: "0 3 * * 0"
  retention:
    maxCount: 12
  options:
    backupType: AUTO
    pageCache: "8G"
    tempStorage:\
      size: "50Gi"

Azure Backup with Azure Workload Identity

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-azure-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: enterprise-cluster
  storage:
    type: azure
    bucket: neo4j-backups         # Azure storage container name
    path: daily/
    cloud:
      provider: azure
  cloud:
    provider: azure
    identity:
      provider: azure
      autoCreate:
        enabled: true
        annotations:
          azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
  schedule: "0 1 * * *"
  retention:
    maxAge: "14d"
  options:
    compress: true
    backupType: FULL
    tempStorage:\
      size: "50Gi"

Azure Backup with Static Credentials

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-azure-backup-static
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: enterprise-cluster
  storage:
    type: azure
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: azure
      credentialsSecretRef: azure-backup-credentials   # Secret with AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY
  schedule: "0 1 * * *"
  retention:
    maxAge: "14d"
  options:
    compress: true
    backupType: FULL
    tempStorage:\
      size: "50Gi"

Monitoring

# List all backup resources
kubectl get neo4jbackup -n neo4j

# View backup status and last run time
kubectl get neo4jbackup daily-cluster-backup -o wide

# Describe a backup for detailed status and events
kubectl describe neo4jbackup daily-cluster-backup

# Watch backup status changes
kubectl get neo4jbackup daily-cluster-backup -w

# Check logs from the most recent backup Job
kubectl logs -n neo4j -l neo4j.com/backup=daily-cluster-backup --tail=100

# Check backup phase
kubectl get neo4jbackup daily-cluster-backup -o jsonpath='{.status.phase}'

# Check last success time
kubectl get neo4jbackup daily-cluster-backup -o jsonpath='{.status.lastSuccessTime}'

For more information on backup operations, see the Backup and Restore Guide.