Backup Disk Space Management Guide¶

This guide covers disk space management for Neo4j backups in Kubernetes environments.

Overview¶

Neo4j backups can consume significant disk space, especially in production environments with:
- Large databases
- Frequent backup schedules
- Multiple backup types (FULL, DIFF, AUTO)
- Long retention policies

Note: Commands referencing backup-sidecar apply to standalone deployments. For clusters, use the centralized {cluster}-backup-0 pod (container backup) and the /backups mount.

Automatic Cleanup¶

Backup Sidecar Retention¶

Standalone deployments use a backup sidecar that manages disk space with configurable retention policies. Cluster backups run in the centralized {cluster}-backup-0 pod and keep the most recent backups by default.

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jEnterpriseStandalone
metadata:
  name: production-standalone
spec:
  # ... other configuration ...
  env:
    - name: BACKUP_RETENTION_DAYS
      value: "14"  # Keep backups for 14 days
    - name: BACKUP_RETENTION_COUNT
      value: "20"  # Keep maximum 20 backups

Default retention settings:
- BACKUP_RETENTION_DAYS: 7 days
- BACKUP_RETENTION_COUNT: 10 backups

The sidecar automatically:
1. Removes backups older than the retention days
2. Keeps only the most recent N backups
3. Runs cleanup before and after each backup
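The two retention rules can be sketched as a small shell function that works on a local directory. This is an illustration only, not the sidecar's actual implementation: `prune_backups` is a hypothetical helper name, and the sketch assumes GNU `find` and `xargs` and one directory per backup.

```shell
#!/usr/bin/env bash
# Sketch of age-based and count-based retention on a local directory.
# prune_backups is a hypothetical helper, NOT the sidecar's real code.
# Usage: prune_backups <backup_dir> <retention_days> <retention_count>
prune_backups() {
  local dir="$1" days="$2" count="$3"

  # Rule 1: remove backup directories older than <days> days.
  # -mindepth 1 keeps find from ever matching <dir> itself.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +

  # Rule 2: keep only the <count> most recently modified entries.
  # Assumes backup names contain no newlines.
  ( cd "$dir" && ls -1t | tail -n +"$((count + 1))" | xargs -r -d '\n' rm -rf -- )
}
```

Calling `prune_backups /data/backups 7 10` would mirror the default settings listed above, assuming the backups live in `/data/backups`.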

Manual Cleanup¶

Using the Cleanup Script¶

For test environments or emergency cleanup:

# Run the cleanup script
./hack/cleanup-test-resources.sh

# What it does:
# - Removes completed jobs older than 1 hour
# - Deletes failed and evicted pods
# - Identifies orphaned PVCs
# - Shows disk usage by namespace
# - Cleans Docker system (for Kind clusters)

Manual Commands¶

Check disk usage:

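# List PV capacity and bound claims (provisioned size, not actual usage)
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,CLAIM:.spec.claimRef.name

# Check node resource allocation (requests and limits, including ephemeral-storage)
kubectl describe nodes | grep -A8 "Allocated resources:"

# Check actual usage of the volume mounted at /data
kubectl exec <neo4j-pod> -- df -h /data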

Clean up old backups manually:

# Delete backups older than 7 days
# (-mindepth 1 prevents find from matching /data/backups itself)
kubectl exec <neo4j-pod> -c backup-sidecar -- \
  find /data/backups -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;

# Keep only the 5 most recent backups (assumes names without whitespace)
kubectl exec <neo4j-pod> -c backup-sidecar -- bash -c \
  'cd /data/backups && ls -t | tail -n +6 | xargs -r rm -rf'

Best Practices¶

1. Storage Sizing¶

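Calculate required storage (this estimate treats every retained backup as a full backup, so it is an upper bound when DIFF backups are used):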

Required Storage = Database Size × Backup Compression Ratio × Number of Retained Backups × Safety Factor

Example:
- Database Size: 100GB
- Compression Ratio: 0.3 (70% compression)
- Retained Backups: 10
- Safety Factor: 1.5
- Required: 100GB × 0.3 × 10 × 1.5 = 450GB
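
The formula can be wrapped in a small helper for quick what-if sizing. This is a minimal sketch: `required_storage_gb` is a hypothetical function name, not part of the operator, and it rounds the result to the nearest GB.

```shell
#!/usr/bin/env bash
# required_storage_gb is a hypothetical helper, not part of the operator.
# Usage: required_storage_gb <db_size_gb> <compression_ratio> <retained_backups> <safety_factor>
required_storage_gb() {
  awk -v size="$1" -v ratio="$2" -v count="$3" -v safety="$4" \
    'BEGIN { printf "%.0f\n", size * ratio * count * safety }'
}

# The worked example above: prints 450
required_storage_gb 100 0.3 10 1.5
```

Changing a single argument, for example the retained backup count, shows how quickly the retention policy drives the PVC size.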

2. Backup Strategy¶

Optimize backup types:

# Daily full backups with short retention
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-full
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  options:
    backupType: FULL
    compress: true
  retention:
    maxAge: "3d"
    maxCount: 3

---
# Hourly differential backups
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: hourly-diff
spec:
  schedule: "0 * * * *"  # Every hour
  options:
    backupType: DIFF
    compress: true
  retention:
    maxAge: "1d"
    maxCount: 24

3. Monitoring¶

Set up alerts for disk usage:

# Prometheus alert example
groups:
- name: neo4j-backups
  rules:
  - alert: BackupDiskSpaceHigh
    expr: |
      (1 - (node_filesystem_avail_bytes{mountpoint="/data"} /
      node_filesystem_size_bytes{mountpoint="/data"})) > 0.8
    for: 10m
    annotations:
      summary: "Backup disk usage above 80%"

4. External Storage¶

For production, consider external storage:

S3 Storage¶

apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: s3-backup
spec:
  storage:
    type: s3
    bucket: my-neo4j-backups
    path: production/cluster-1
  retention:
    maxAge: "30d"  # S3 lifecycle policies handle cleanup

PVC with StorageClass¶

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd  # Use appropriate storage class
  resources:
    requests:
      storage: 500Gi

Troubleshooting¶

Disk Full Errors¶

Symptoms:

java.io.IOException: No space left on device

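Quick fixes:
1. Run the cleanup script: ./hack/cleanup-test-resources.sh
2. Delete old backups (cluster): kubectl exec <cluster>-backup-0 -c backup -- sh -c 'rm -rf /backups/old-*'
3. Delete old backups (standalone): kubectl exec <standalone-pod> -c backup-sidecar -- sh -c 'rm -rf /data/backups/old-*'
4. Increase the PVC size (if the storage class supports expansion)

The sh -c wrapper is needed because kubectl exec does not start a shell: without it the old-* glob is never expanded inside the container, and rm -f silently deletes nothing.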
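
For the PVC expansion fix, one possible approach is to check the StorageClass first and patch the claim only if expansion is allowed. The sketch below uses standard `kubectl get` and `kubectl patch` calls; the helper names `pvc_resize_patch` and `expand_pvc`, and the namespace, PVC name, and size in the example, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; namespace, PVC name, and size are placeholders.

# Build the JSON merge patch that requests a new PVC size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$1"
}

# Expand a PVC only if its StorageClass allows volume expansion.
# Usage: expand_pvc <namespace> <pvc_name> <new_size>
expand_pvc() {
  local ns="$1" pvc="$2" size="$3" sc allowed
  sc="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')" || return 1
  allowed="$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')" || return 1
  if [ "$allowed" != "true" ]; then
    echo "StorageClass '$sc' does not allow volume expansion" >&2
    return 1
  fi
  kubectl patch pvc "$pvc" -n "$ns" --type merge -p "$(pvc_resize_patch "$size")"
}

# Example (not run here): expand_pvc default backup-pvc 750Gi
```

PVCs can only grow, so the requested size must be larger than the current one; depending on the storage driver, the new capacity may not appear until the pod is restarted.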

Prevention¶

Set appropriate retention policies

env:
- name: BACKUP_RETENTION_DAYS
  value: "3"  # Shorter for test environments
- name: BACKUP_RETENTION_COUNT
  value: "5"  # Fewer backups for test

Use compressed backups

options:
  compress: true  # Reduces backup size by 60-80%

Monitor disk usage proactively

# Add to monitoring scripts
kubectl exec <pod> -- df -h /data | awk '$5+0 > 80 {print "WARNING: " $0}'
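
The one-liner above can be turned into a reusable check whose exit status a cron job or CI step can act on. This is a sketch under stated assumptions: `usage_warnings` is a hypothetical helper name, and it expects POSIX-format `df -P` output on stdin so that each filesystem stays on a single line.

```shell
#!/usr/bin/env bash
# usage_warnings is a hypothetical helper: it reads `df -P` output on stdin
# and prints a WARNING line for every filesystem above the threshold.
# Exit status is 1 if any warning was printed, 0 otherwise.
# Usage: df -P <path> | usage_warnings [threshold_percent]
usage_warnings() {
  awk -v limit="${1:-80}" '
    NR > 1 && $5 + 0 > limit { print "WARNING: " $6 " at " $5; found = 1 }
    END { exit found }
  '
}

# Example (not run here; <pod> is a placeholder):
#   kubectl exec <pod> -- df -P /data | usage_warnings 80
```

Skipping the header row explicitly (`NR > 1`) and returning a non-zero status makes the check easier to wire into alerting than grepping for the WARNING text.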

Summary¶

Effective disk space management requires:
- Automatic cleanup via sidecar retention policies
- Regular monitoring of disk usage
- Appropriate backup strategies (FULL vs DIFF)
- External storage for production environments
- Proactive cleanup in test environments

The backup sidecar's built-in cleanup functionality handles most scenarios automatically, but manual intervention may be needed for test environments or exceptional situations.