Backup Disk Space Management Guide
This guide covers disk space management for Neo4j backups in Kubernetes environments.
Overview
Neo4j backups can consume significant disk space, especially in production environments with:

- Large databases
- Frequent backup schedules
- Multiple backup types (FULL, DIFF, AUTO)
- Long retention policies
Note: Commands referencing backup-sidecar apply to standalone deployments. For clusters, use the centralized {cluster}-backup-0 pod (container backup) and the /backups mount.
Automatic Cleanup
Backup Sidecar Retention
Standalone deployments use a backup sidecar that manages disk space with configurable retention policies. Cluster backups run in the centralized {cluster}-backup-0 pod, which keeps the most recent backups by default.
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jEnterpriseStandalone
metadata:
  name: production-standalone
spec:
  # ... other configuration ...
  env:
    - name: BACKUP_RETENTION_DAYS
      value: "14" # Keep backups for 14 days
    - name: BACKUP_RETENTION_COUNT
      value: "20" # Keep maximum 20 backups
Default retention settings:
- BACKUP_RETENTION_DAYS: 7 days
- BACKUP_RETENTION_COUNT: 10 backups
The sidecar automatically:

1. Removes backups older than the retention period
2. Keeps only the most recent N backups
3. Runs cleanup before and after each backup
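The retention logic is conceptually equivalent to the following sketch. This is illustrative only: the function name and directory layout are assumptions, and the actual sidecar implementation may differ.

```python
import shutil
import time
from pathlib import Path

def prune_backups(backup_dir: str, retention_days: int = 7,
                  retention_count: int = 10) -> list:
    """Remove backup directories older than retention_days, then keep
    only the retention_count most recently modified ones.
    Returns the names of the directories removed."""
    root = Path(backup_dir)
    removed = []
    cutoff = time.time() - retention_days * 86400

    # Age-based pruning: drop directories last modified before the cutoff
    for d in [p for p in root.iterdir() if p.is_dir()]:
        if d.stat().st_mtime < cutoff:
            shutil.rmtree(d)
            removed.append(d.name)

    # Count-based pruning: keep only the N most recent directories
    remaining = sorted((p for p in root.iterdir() if p.is_dir()),
                       key=lambda p: p.stat().st_mtime, reverse=True)
    for d in remaining[retention_count:]:
        shutil.rmtree(d)
        removed.append(d.name)
    return removed
```

Running both passes means a backup is deleted as soon as it violates either limit, whichever triggers first.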
Manual Cleanup
Using the Cleanup Script
For test environments or emergency cleanup:
# Run the cleanup script
./hack/cleanup-test-resources.sh
# What it does:
# - Removes completed jobs older than 1 hour
# - Deletes failed and evicted pods
# - Identifies orphaned PVCs
# - Shows disk usage by namespace
# - Cleans Docker system (for Kind clusters)
Manual Commands
Check disk usage:
# Check PV usage
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,CLAIM:.spec.claimRef.name
# Check node disk usage
kubectl describe nodes | grep -A5 "Allocated resources:"
# Check specific PVC usage
kubectl exec <neo4j-pod> -- df -h /data
Clean up old backups manually:
# Delete backups older than 7 days (-mindepth 1 protects /data/backups itself)
kubectl exec <neo4j-pod> -c backup-sidecar -- \
  find /data/backups -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;
# Keep only 5 most recent backups
kubectl exec <neo4j-pod> -c backup-sidecar -- bash -c \
'cd /data/backups && ls -t | tail -n +6 | xargs -r rm -rf'
Best Practices
1. Storage Sizing
Calculate required storage:
Required Storage = Database Size × Backup Compression Ratio × Number of Retained Backups × Safety Factor
Example:
- Database Size: 100GB
- Compression Ratio: 0.3 (70% compression)
- Retained Backups: 10
- Safety Factor: 1.5
- Required: 100GB × 0.3 × 10 × 1.5 = 450GB
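The formula can be expressed directly in code (the function name is illustrative):

```python
def required_backup_storage_gb(db_size_gb: float,
                               compression_ratio: float,
                               retained_backups: int,
                               safety_factor: float = 1.5) -> float:
    """Estimate storage needed for retained backups.
    compression_ratio is compressed size / original size
    (0.3 means 70% compression)."""
    return db_size_gb * compression_ratio * retained_backups * safety_factor

# The worked example above: 100 GB database, 0.3 ratio, 10 backups, 1.5 safety
estimate = required_backup_storage_gb(100, 0.3, 10)
print(f"{estimate:.0f} GB")  # 450 GB
```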
2. Backup Strategy
Optimize backup types:
# Daily full backups with short retention
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: daily-full
spec:
  schedule: "0 2 * * *" # 2 AM daily
  options:
    backupType: FULL
    compress: true
  retention:
    maxAge: "3d"
    maxCount: 3
---
# Hourly differential backups
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: hourly-diff
spec:
  schedule: "0 * * * *" # Every hour
  options:
    backupType: DIFF
    compress: true
  retention:
    maxAge: "1d"
    maxCount: 24
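A rough way to sanity-check the disk footprint of such a combined schedule is to add the retained full backups to the retained differentials. The function name and the 10% differential-size figure below are assumptions for illustration; actual DIFF size depends entirely on write volume between backups.

```python
def strategy_storage_gb(full_size_gb: float, full_retained: int,
                        diff_fraction: float, diff_retained: int) -> float:
    """Rough worst-case footprint of a FULL + DIFF schedule.
    diff_fraction: assumed average differential size relative to a
    full backup (workload-dependent)."""
    return (full_size_gb * full_retained
            + full_size_gb * diff_fraction * diff_retained)

# 30 GB compressed full backups with 3 retained, diffs ~10% of a full
# backup with 24 retained (matching the manifests above)
estimate = strategy_storage_gb(30, 3, 0.10, 24)
print(f"{estimate:.0f} GB")  # 162 GB
```

Feed this estimate into the safety-factor formula from the previous section when sizing the backup volume.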
3. Monitoring
Set up alerts for disk usage:
# Prometheus alert example
groups:
  - name: neo4j-backups
    rules:
      - alert: BackupDiskSpaceHigh
        expr: |
          (1 - (node_filesystem_avail_bytes{mountpoint="/data"} /
          node_filesystem_size_bytes{mountpoint="/data"})) > 0.8
        for: 10m
        annotations:
          summary: "Backup disk usage above 80%"
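The PromQL expression computes the used fraction of the /data filesystem from the available and total bytes. The same check, written out in code (function name is illustrative):

```python
def backup_disk_alert(avail_bytes: float, size_bytes: float,
                      threshold: float = 0.8) -> bool:
    """Mirror of the alert expression: fire when the used fraction
    of the filesystem exceeds the threshold."""
    used_fraction = 1 - (avail_bytes / size_bytes)
    return used_fraction > threshold

print(backup_disk_alert(avail_bytes=15e9, size_bytes=100e9))  # True: 85% used
print(backup_disk_alert(avail_bytes=50e9, size_bytes=100e9))  # False: 50% used
```

The `for: 10m` clause in the rule additionally requires the condition to hold continuously for ten minutes before the alert fires, which filters out short spikes during a backup run.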
4. External Storage
For production, consider external storage:
S3 Storage
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
  name: s3-backup
spec:
  storage:
    type: s3
    bucket: my-neo4j-backups
    path: production/cluster-1
  retention:
    maxAge: "30d" # S3 lifecycle policies handle cleanup
PVC with StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd # Use appropriate storage class
  resources:
    requests:
      storage: 500Gi
Troubleshooting
Disk Full Errors
Symptoms typically include:

- Backup jobs failing with errors such as "No space left on device"
- Pods evicted due to node disk pressure

Quick fixes:
1. Run cleanup script: ./hack/cleanup-test-resources.sh
2. Delete old backups (cluster): kubectl exec <cluster>-backup-0 -c backup -- rm -rf /backups/old-*
3. Delete old backups (standalone): kubectl exec <standalone-pod> -c backup-sidecar -- rm -rf /data/backups/old-*
4. Increase PVC size (if storage class supports expansion)
Prevention
- Set appropriate retention policies
- Use compressed backups
- Monitor disk usage proactively
Summary
Effective disk space management requires:

- Automatic cleanup via sidecar retention policies
- Regular monitoring of disk usage
- Appropriate backup strategies (FULL vs DIFF)
- External storage for production environments
- Proactive cleanup in test environments
The backup sidecar's built-in cleanup functionality handles most scenarios automatically, but manual intervention may be needed for test environments or exceptional situations.