Neo4jBackup
This document provides a reference for the Neo4jBackup Custom Resource Definition (CRD). This resource is used for creating and managing backups of Neo4j databases running under either Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone.
For a comprehensive guide on using backups, see the Backup and Restore Guide.
API Version
- Group: `neo4j.neo4j.com`
- Version: `v1beta1`
- Kind: `Neo4jBackup`
How it works
The operator creates a Kubernetes Job that runs neo4j-admin database backup inside a container using the same Neo4j enterprise image as the target cluster. No separate backup image is needed or configured.
Key implementation details:
- The operator automatically sets `server.backup.listen_address=0.0.0.0:6362` in `neo4j.conf` on the target StatefulSet.
- The `--from` flag is automatically populated with the FQDNs of all server pods at port `6362`.
- For cloud storage, `--to-path` uses native cloud URIs: `s3://`, `gs://`, `azb://`.
- For PVC storage, `--to-path` uses the local path within the mounted PVC.
- RBAC: Only a `neo4j-backup-sa` ServiceAccount is created. No Role or RoleBinding is created because the backup Job requires no Kubernetes API access.
- Cloud retention: The operator logs a notice to configure bucket lifecycle rules on the cloud provider side. PVC retention uses `find` + `rm` in a cleanup Job.
Spec
The Neo4jBackupSpec defines the desired state of a Neo4j backup configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| `target` | `BackupTarget` | ✅ | What to back up |
| `storage` | `StorageLocation` | ✅ | Where to store the backup |
| `schedule` | `string` | ❌ | Cron expression for automated backups (e.g., "0 2 * * *") |
| `cloud` | `*CloudBlock` | ❌ | Top-level cloud provider configuration (used for workload identity) |
| `retention` | `*RetentionPolicy` | ❌ | Backup retention policy |
| `options` | `*BackupOptions` | ❌ | Backup-specific options |
| `suspend` | `bool` | ❌ | Suspend the backup schedule without deleting the resource |
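As a small illustration of the optional top-level fields, the fragment below pauses a scheduled backup with `suspend` while leaving the rest of the spec in place. It is a sketch, not a complete manifest: the cluster and PVC names are placeholders borrowed from the examples in this reference.

```yaml
spec:
  target:
    kind: Cluster
    name: production-cluster   # placeholder cluster name
  storage:
    type: pvc
    pvc:
      name: backup-storage     # placeholder: an existing PVC
  schedule: "0 2 * * *"
  suspend: true                # pauses the schedule; the resource itself is kept
```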
Type Definitions
BackupTarget
Defines what to back up. The kind field controls how name and clusterRef are interpreted.
| Field | Type | Required | Description |
|---|---|---|---|
| `kind` | `string` | ✅ | Type of resource to back up: "Cluster" or "Database" |
| `name` | `string` | ✅ | When kind=Cluster: name of the Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone. When kind=Database: name of the Neo4j database (e.g., "neo4j", "mydb") |
| `clusterRef` | `string` | ✅ when kind=Database | Name of the Neo4jEnterpriseCluster or Neo4jEnterpriseStandalone that owns the database. Unused when kind=Cluster. |
| `namespace` | `string` | ❌ | Namespace of the target resource (defaults to the backup namespace) |
Important: In earlier releases, when `kind=Database` the `name` field was incorrectly used for cluster lookup. This has been corrected: with `kind=Database`, `name` is always the database name and `clusterRef` is the cluster name, and both are required.
Examples:
# Back up an entire cluster (all databases)
target:
  kind: Cluster
  name: production-cluster

# Back up a single database
target:
  kind: Database
  name: mydb
  clusterRef: production-cluster
  namespace: neo4j
StorageLocation
Defines where to store backups.
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | ✅ | Storage type: "s3", "gcs", "azure", "pvc" |
| `bucket` | `string` | ❌ | Bucket or container name (required for cloud storage types) |
| `path` | `string` | ❌ | Path within the bucket or PVC |
| `pvc` | `*PVCSpec` | ❌ | PVC configuration (required when type=pvc) |
| `cloud` | `*CloudBlock` | ❌ | Cloud provider configuration including optional credentials secret |
CloudBlock
Cloud provider configuration. This type appears both on StorageLocation (for per-storage credentials) and as a top-level spec.cloud field (for workload identity setup).
| Field | Type | Required | Description |
|---|---|---|---|
| `provider` | `string` | ❌ | Cloud provider: "aws", "gcp", "azure" |
| `credentialsSecretRef` | `string` | ❌ | Name of a Kubernetes Secret containing cloud provider credentials as environment variables. When absent, ambient workload identity (IRSA / GKE WI / Azure WI) is used instead. |
| `identity` | `*CloudIdentity` | ❌ | Cloud identity configuration (for workload identity ServiceAccount annotations) |
| `endpointURL` | `string` | ❌ | Override the S3 API endpoint. Use for S3-compatible stores such as MinIO, Ceph RGW, or Cloudflare R2 (e.g. "http://minio.minio.svc:9000"). Only applies when provider: aws. |
| `forcePathStyle` | `bool` | ❌ | Force S3 path-style addressing (endpoint/bucket/key instead of bucket.endpoint/key). Required for MinIO and most self-hosted S3-compatible stores. Only effective when endpointURL is set. |
Secret key requirements by provider (when credentialsSecretRef is set):
| Provider | Required secret keys | Notes |
|---|---|---|
| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` | Standard AWS SDK env vars |
| MinIO / S3-compatible | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` | Same keys as AWS; set endpointURL and forcePathStyle: true on CloudBlock |
| GCS | `GOOGLE_APPLICATION_CREDENTIALS_JSON` | Full service-account key JSON as a string value, not a file path |
| Azure | `AZURE_STORAGE_ACCOUNT`, `AZURE_STORAGE_KEY` | Storage account credentials |
Example — creating cloud credential secrets:
# AWS
kubectl create secret generic aws-backup-credentials \
--from-literal=AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
--from-literal=AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--from-literal=AWS_REGION=us-east-1
# MinIO (uses the same keys; region value is arbitrary — MinIO ignores it)
kubectl create secret generic minio-backup-credentials \
--from-literal=AWS_ACCESS_KEY_ID=minioadmin \
--from-literal=AWS_SECRET_ACCESS_KEY=minioadmin \
--from-literal=AWS_REGION=us-east-1
# GCS — pass the JSON content directly as a string value
kubectl create secret generic gcs-backup-credentials \
--from-literal=GOOGLE_APPLICATION_CREDENTIALS_JSON="$(cat service-account.json)"
# Azure
kubectl create secret generic azure-backup-credentials \
--from-literal=AZURE_STORAGE_ACCOUNT=myaccount \
--from-literal=AZURE_STORAGE_KEY=base64key==
MinIO / S3-compatible example:
storage:
  type: s3
  bucket: neo4j-backups
  path: cluster/full
  cloud:
    provider: aws
    credentialsSecretRef: minio-backup-credentials
    endpointURL: http://minio.minio.svc:9000 # in-cluster MinIO service
    forcePathStyle: true # required for MinIO
How it works: `endpointURL` is injected as `AWS_ENDPOINT_URL_S3` (AWS SDK v2 standard). `forcePathStyle: true` injects `-Daws.s3.forcePathStyle=true` via `JAVA_TOOL_OPTIONS`, which the neo4j-admin JVM process reads at startup.
CloudIdentity
Cloud identity configuration for workload identity scenarios (no static credentials).
| Field | Type | Required | Description |
|---|---|---|---|
| `provider` | `string` | ✅ | Identity provider: "aws", "gcp", "azure" |
| `serviceAccount` | `string` | ❌ | Name of an existing ServiceAccount to use. When absent and autoCreate.enabled=true, the operator creates neo4j-backup-sa. |
| `autoCreate` | `*AutoCreateSpec` | ❌ | Auto-create ServiceAccount with workload-identity annotations |
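For clusters where the ServiceAccount is managed outside the operator, the `serviceAccount` field can point at it instead of relying on `autoCreate`. The fragment below is a sketch under that assumption: `my-backup-sa` is a hypothetical name for a ServiceAccount that already exists and already carries the workload-identity annotation. This reference does not state whether `autoCreate.enabled` must also be set to `false` in this case, so verify the behavior against your operator version.

```yaml
cloud:
  provider: aws
  identity:
    provider: aws
    serviceAccount: my-backup-sa   # hypothetical: pre-existing, already annotated ServiceAccount
```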
AutoCreateSpec
Controls automatic ServiceAccount creation with workload-identity annotations.
| Field | Type | Required | Description |
|---|---|---|---|
| `enabled` | `bool` | ❌ | Enable auto-creation of the neo4j-backup-sa ServiceAccount (default: true) |
| `annotations` | `map[string]string` | ❌ | Annotations applied to the neo4j-backup-sa ServiceAccount on every reconcile. Use this to attach workload-identity annotations. |
The annotations in autoCreate.annotations are applied to the neo4j-backup-sa ServiceAccount on every reconcile, so they stay in sync with the desired state.
Workload identity annotation examples:
# AWS IRSA
autoCreate:
  enabled: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/neo4j-backup-role

# GKE Workload Identity
autoCreate:
  enabled: true
  annotations:
    iam.gke.io/gcp-service-account: neo4j-backup@my-project.iam.gserviceaccount.com

# Azure Workload Identity
autoCreate:
  enabled: true
  annotations:
    azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
PVCSpec
PVC configuration for local storage.
| Field | Type | Required | Description |
|---|---|---|---|
| `storageClassName` | `string` | ❌ | Storage class name for dynamic provisioning |
| `name` | `string` | ❌ | Name of an existing PVC to use |
| `size` | `string` | ❌ | Size for a new PVC (e.g., "100Gi") |
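The two usual shapes of a PVC-backed `storage` block can be sketched as follows: one reuses an existing claim by name, the other describes a new claim through `storageClassName` and `size`. The StorageClass name `standard` is a hypothetical placeholder, and this reference does not say how a newly provisioned PVC is named or whether `name` may be combined with `size`, so treat the second variant as an assumption to verify.

```yaml
# Variant A: reuse an existing PVC
storage:
  type: pvc
  path: backups/
  pvc:
    name: backup-storage
---
# Variant B: describe a new PVC (assumed to be provisioned dynamically)
storage:
  type: pvc
  path: backups/
  pvc:
    storageClassName: standard   # hypothetical StorageClass name
    size: 100Gi
```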
RetentionPolicy
Backup retention configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| `maxAge` | `string` | ❌ | Maximum age of backups to retain (e.g., "30d", "4w") |
| `maxCount` | `int32` | ❌ | Maximum number of backups to retain |
| `deletePolicy` | `string` | ❌ | Action for expired backups: "Delete" (default) or "Archive" |
Cloud storage retention: For cloud storage targets the operator logs a notice to configure bucket lifecycle rules on the cloud provider side. Automated deletion of cloud objects is not performed by the operator.
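Because the operator only enforces retention itself on PVC targets, a retention block is most meaningful next to `type: pvc`. The fragment below is a sketch of that pairing; the PVC name is a placeholder, and how `maxAge` and `maxCount` interact when both are set is not specified in this reference.

```yaml
storage:
  type: pvc
  pvc:
    name: backup-storage   # placeholder: an existing PVC
  path: backups/daily/
retention:
  maxAge: "4w"             # same unit style as the "30d" / "4w" examples above
  maxCount: 30
  deletePolicy: Delete     # the documented default; "Archive" is the alternative
```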
BackupOptions
Fine-grained backup execution options.
| Field | Type | Required | Description |
|---|---|---|---|
| `compress` | `bool` | ❌ | Compress the backup (default: true) |
| `backupType` | `string` | ❌ | Backup type: "FULL", "DIFF", "AUTO" (default) |
| `preferDiffAsParent` | `bool` | ❌ | Use the latest differential backup as the parent when creating a new differential backup (default: false). Maps to --prefer-diff-as-parent. Requires CalVer 2025.04+; an error is returned at runtime if the target version does not support this flag. |
| `tempPath` | `string` | ❌ | Local directory path for temporary files during backup. When tempStorage is configured, this is set automatically. Only set manually if you are mounting your own volume. Maps to --temp-path. |
| `tempStorage` | `*TempStorageSpec` | ❌ | Provisions a PVC for temporary staging files during cloud backups. The operator mounts this PVC and passes --temp-path automatically. Recommended for large databases to avoid filling ephemeral disk. |
| `pageCache` | `string` | ❌ | Page cache size hint (e.g., "4G"). Must match pattern ^[0-9]+[KMG]?$ |
| `encryption` | `*EncryptionSpec` | ❌ | Backup encryption configuration |
| `verify` | `bool` | ❌ | Verify backup integrity after creation |
| `parallelDownload` | `bool` | ❌ | Enable parallel download for remote backups |
| `remoteAddressResolution` | `bool` | ❌ | Resolve remote addresses during backup |
| `skipRecovery` | `bool` | ❌ | Skip the recovery step after backup |
| `includeMetadata` | `string` | ❌ | Controls which metadata is included in the backup. Values: "all" (default), "none", "users", "roles". Requires Neo4j 5.26+. |
| `parallelRecovery` | `bool` | ❌ | Enable multi-threaded transaction application during backup |
| `keepFailed` | `bool` | ❌ | Preserve failed backup artifacts for debugging instead of deleting them |
| `additionalArgs` | `[]string` | ❌ | Additional arguments passed verbatim to neo4j-admin database backup |
`preferDiffAsParent` version requirement: This flag was introduced in Neo4j CalVer 2025.04. Neo4j 5.26.x and CalVer 2025.01–2025.03 reject it as an unsupported argument, so the operator checks the target version at runtime and returns an error before creating the Job.
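To show how several of these options combine, here is a sketch of an `options` block that uses only fields from the table above. The specific values are illustrative choices, and `--verbose` is simply one neo4j-admin flag picked to demonstrate `additionalArgs`; nothing here is a recommended configuration.

```yaml
options:
  backupType: AUTO
  compress: true
  verify: true               # verify backup integrity after creation
  includeMetadata: "users"   # requires Neo4j 5.26+
  keepFailed: true           # keep failed artifacts for debugging
  pageCache: "4G"            # must match ^[0-9]+[KMG]?$
  additionalArgs:
    - "--verbose"            # passed verbatim to neo4j-admin database backup
```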
TempStorageSpec
Provisions temporary staging storage for cloud backup/restore. The operator creates a PVC, mounts it at /tmp/neo4j-staging in the Job pod, and passes --temp-path=/tmp/neo4j-staging to neo4j-admin. The PVC is owned by the Job and garbage-collected when the Job's TTL expires.
| Field | Type | Required | Description |
|---|---|---|---|
| `size` | `string` | ✅ | Size of the temporary PVC (e.g., "50Gi"). Should be at least as large as the expected backup artifact. Must match pattern ^[0-9]+(Ki\|Mi\|Gi\|Ti)?$ |
| `storageClassName` | `string` | ❌ | StorageClass for the temporary PVC. If empty, uses the cluster default. |
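A staging volume with an explicit StorageClass might be declared as in the sketch below. The class name `fast-ssd` is hypothetical; leaving `storageClassName` out falls back to the cluster default, and `tempPath` is intentionally not set because the operator fills it in when `tempStorage` is configured.

```yaml
options:
  tempStorage:
    size: "100Gi"                # at least as large as the expected backup artifact
    storageClassName: fast-ssd   # hypothetical StorageClass; omit to use the cluster default
```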
EncryptionSpec
Backup encryption configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| `enabled` | `bool` | ❌ | Enable backup encryption |
| `keySecret` | `string` | ❌ | Name of a Kubernetes Secret containing the encryption key |
| `keySecretKey` | `string` | ❌ | Key within the Secret containing the encryption key (default: "key") |
| `algorithm` | `string` | ❌ | Encryption algorithm: "AES256" (default) or "ChaCha20Poly1305" |
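Spelled out with every field, an encryption block could look like the sketch below. The Secret name matches the one used in the examples later in this reference; `keySecretKey` and `algorithm` are set to their documented defaults purely to make them visible. The expected format of the key material inside the Secret is not described here, so that part remains an open assumption.

```yaml
options:
  encryption:
    enabled: true
    keySecret: backup-encryption-key   # name of the Secret holding the key
    keySecretKey: key                  # documented default, shown explicitly
    algorithm: AES256                  # documented default; "ChaCha20Poly1305" is the alternative
```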
Status
The Neo4jBackupStatus represents the observed state of the backup.
| Field | Type | Description |
|---|---|---|
| `conditions` | `[]metav1.Condition` | Current backup conditions |
| `phase` | `string` | Current backup phase |
| `message` | `string` | Human-readable message about the current state |
| `lastRunTime` | `*metav1.Time` | When the last backup Job started |
| `lastSuccessTime` | `*metav1.Time` | When the last successful backup completed |
| `nextRunTime` | `*metav1.Time` | When the next scheduled backup will run |
| `stats` | `*BackupStats` | Statistics from the most recent backup run |
| `history` | `[]BackupRun` | History of recent backup runs |
BackupStats
| Field | Type | Description |
|---|---|---|
| `size` | `string` | Total backup size (e.g., "2.5GB") |
| `duration` | `string` | Backup operation duration (e.g., "5m30s") |
| `throughput` | `string` | Backup throughput rate (e.g., "8.3MB/s") |
| `fileCount` | `int32` | Number of files in the backup |
BackupRun
Represents a single backup Job execution.
| Field | Type | Description |
|---|---|---|
| `startTime` | `metav1.Time` | When the backup run started |
| `completionTime` | `*metav1.Time` | When the run completed (nil if still running) |
| `status` | `string` | Run status: "Running", "Succeeded", "Failed" |
| `error` | `string` | Error message if the backup failed |
| `stats` | `*BackupStats` | Backup statistics for this run |
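To make the status types concrete, the following sketch shows roughly how these fields might appear in `kubectl get neo4jbackup <name> -o yaml` after one successful scheduled run. Every value is invented for illustration (timestamps, sizes, file count), and `phase` and `conditions` are left out because their possible values are not listed in this reference.

```yaml
status:
  lastRunTime: "2025-01-15T02:00:00Z"
  lastSuccessTime: "2025-01-15T02:05:00Z"
  nextRunTime: "2025-01-16T02:00:00Z"
  stats:                      # most recent run
    size: "2.5GB"
    duration: "5m0s"
    throughput: "8.3MB/s"
    fileCount: 42
  history:
    - startTime: "2025-01-15T02:00:00Z"
      completionTime: "2025-01-15T02:05:00Z"
      status: Succeeded       # one of Running, Succeeded, Failed
      stats:
        size: "2.5GB"
        duration: "5m0s"
        throughput: "8.3MB/s"
        fileCount: 42
```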
Examples
Scheduled S3 Backup (Cluster) with IRSA
Uses AWS IRSA workload identity — no static credentials needed.
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-cluster-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster
  storage:
    type: s3
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: aws
  cloud:
    provider: aws
    identity:
      provider: aws
      autoCreate:
        enabled: true
        annotations:
          eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/neo4j-backup-role
  schedule: "0 2 * * *" # Daily at 2 AM UTC
  retention:
    maxAge: "30d"
    maxCount: 30
  options:
    compress: true
    backupType: FULL
    tempStorage:
      size: "50Gi"
    encryption:
      enabled: true
      keySecret: backup-encryption-key
Scheduled S3 Backup with Static Credentials
Uses an explicit Kubernetes Secret for AWS credentials instead of IRSA.
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-cluster-backup-static-creds
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster
  storage:
    type: s3
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials # Secret with AWS_ACCESS_KEY_ID etc.
  schedule: "0 2 * * *"
  retention:
    maxAge: "30d"
    maxCount: 30
  options:
    compress: true
    backupType: FULL
    tempStorage:
      size: "50Gi"
Single-Database Backup to S3
Backs up only one database. Both name (database) and clusterRef (cluster) are required.
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: mydb-daily-backup
  namespace: neo4j
spec:
  target:
    kind: Database
    name: mydb # The Neo4j database name
    clusterRef: production-cluster # The cluster that hosts the database
    namespace: neo4j
  storage:
    type: s3
    bucket: neo4j-backups
    path: mydb/daily/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials
  schedule: "0 3 * * *"
  options:
    compress: true
    backupType: AUTO
    tempStorage:
      size: "50Gi"
Differential Backup with preferDiffAsParent (CalVer 2025.04+)
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: hourly-diff-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: production-cluster-2025
  storage:
    type: s3
    bucket: neo4j-backups
    path: hourly-diff/
    cloud:
      provider: aws
      credentialsSecretRef: aws-backup-credentials
  schedule: "0 * * * *" # Every hour
  options:
    backupType: DIFF
    preferDiffAsParent: true # Requires Neo4j CalVer 2025.04+
    compress: true
    tempStorage:
      size: "50Gi"
On-Demand PVC Backup
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: manual-pvc-backup
  namespace: neo4j
spec:
  target:
    kind: Database
    name: mydb
    clusterRef: staging-cluster
    namespace: neo4j
  storage:
    type: pvc
    pvc:
      name: backup-storage
    path: backups/manual/
  options:
    compress: true
    verify: true
    backupType: DIFF
GCS Backup with GKE Workload Identity
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: weekly-gcs-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: analytics-cluster
  storage:
    type: gcs
    bucket: neo4j-analytics-backups
    path: weekly/
    cloud:
      provider: gcp
  cloud:
    provider: gcp
    identity:
      provider: gcp
      autoCreate:
        enabled: true
        annotations:
          iam.gke.io/gcp-service-account: neo4j-backup@my-project.iam.gserviceaccount.com
  schedule: "0 3 * * 0" # Weekly on Sunday at 3 AM
  retention:
    maxCount: 12
    deletePolicy: Archive
  options:
    backupType: AUTO
    pageCache: "8G"
    tempStorage:
      size: "50Gi"
GCS Backup with Static Service Account Credentials
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: weekly-gcs-backup-static
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: analytics-cluster
  storage:
    type: gcs
    bucket: neo4j-analytics-backups
    path: weekly/
    cloud:
      provider: gcp
      credentialsSecretRef: gcs-backup-credentials # Secret with GOOGLE_APPLICATION_CREDENTIALS_JSON
  schedule: "0 3 * * 0"
  retention:
    maxCount: 12
  options:
    backupType: AUTO
    pageCache: "8G"
    tempStorage:
      size: "50Gi"
Azure Backup with Azure Workload Identity
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-azure-backup
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: enterprise-cluster
  storage:
    type: azure
    bucket: neo4j-backups # Azure storage container name
    path: daily/
    cloud:
      provider: azure
  cloud:
    provider: azure
    identity:
      provider: azure
      autoCreate:
        enabled: true
        annotations:
          azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
  schedule: "0 1 * * *"
  retention:
    maxAge: "14d"
  options:
    compress: true
    backupType: FULL
    tempStorage:
      size: "50Gi"
Azure Backup with Static Credentials
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-azure-backup-static
  namespace: neo4j
spec:
  target:
    kind: Cluster
    name: enterprise-cluster
  storage:
    type: azure
    bucket: neo4j-backups
    path: daily/
    cloud:
      provider: azure
      credentialsSecretRef: azure-backup-credentials # Secret with AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY
  schedule: "0 1 * * *"
  retention:
    maxAge: "14d"
  options:
    compress: true
    backupType: FULL
    tempStorage:
      size: "50Gi"
Monitoring
# List all backup resources
kubectl get neo4jbackup -n neo4j
# View backup status and last run time
kubectl get neo4jbackup daily-cluster-backup -n neo4j -o wide
# Describe a backup for detailed status and events
kubectl describe neo4jbackup daily-cluster-backup -n neo4j
# Watch backup status changes
kubectl get neo4jbackup daily-cluster-backup -n neo4j -w
# Check logs from the most recent backup Job
kubectl logs -n neo4j -l neo4j.com/backup=daily-cluster-backup --tail=100
# Check backup phase
kubectl get neo4jbackup daily-cluster-backup -n neo4j -o jsonpath='{.status.phase}'
# Check last success time
kubectl get neo4jbackup daily-cluster-backup -n neo4j -o jsonpath='{.status.lastSuccessTime}'
For more information on backup operations, see the Backup and Restore Guide.