Property Sharding with Neo4j Kubernetes Operator¶
Property Sharding is an advanced Neo4j feature that separates graph structure (nodes and relationships) from properties, enabling better scalability for property-heavy workloads by distributing properties across multiple databases.
Overview¶
Property Sharding decouples data into:
- Graph Shard: Single database containing nodes and relationships WITHOUT properties
- Property Shards: Multiple databases containing properties distributed via hash function
- Virtual Database: Logical database presenting a unified view of the graph and property shards
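For example, a sharded database named products with four property shards (as created in the Quick Start below) is backed by these physical databases, which also appear in the status examples later on this page:

products        # virtual database (what clients connect to)
products-g000   # graph shard (nodes and relationships)
products-p000   # property shard
products-p001   # property shard
products-p002   # property shard
products-p003   # property shard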
Prerequisites¶
System Requirements¶
Neo4j Version Requirements:
- Minimum: Neo4j 2025.12-enterprise (property sharding was introduced in 2025.12)
- Note: Property sharding is an enterprise-only feature and requires valid licensing
- Not available on Aura
Cluster Infrastructure Requirements:
- Minimum Servers: 2 (3+ recommended for HA graph shard primaries)
- Recommended Servers: 3-7 (odd numbers provide better consensus characteristics)
- Very Large Deployments: 20+ servers, at the cost of additional coordination overhead
Resource Requirements per Server:
| Component | Minimum (Basic) | Recommended (Production) | Notes |
|---|---|---|---|
| Memory | 4GB total | 8GB+ total | 4GB minimum, 8GB+ recommended |
| CPU | 1 core | 2+ cores | Cross-shard queries are CPU intensive |
| Storage | 10GB | 100GB+ | Depends on data volume and shard count |
| Network | 1Gbps | 10Gbps+ | Low latency critical for transaction log sync |
Additional Requirements:
- Authentication: Admin secret required (property sharding requires authenticated cluster access)
- Storage Class: Persistent storage class must be specified (e.g., standard, fast-ssd)
- Kubernetes Version: 1.24+ for full operator compatibility
- Network Policy: Allow inter-pod communication on discovery and bolt ports (see the sketch after this list)
- Cypher Version: Must use Cypher 25 for sharded database operations
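A minimal NetworkPolicy sketch that satisfies the inter-pod communication requirement by allowing all traffic between the cluster's pods. It assumes the pods carry the app.kubernetes.io/name label used elsewhere on this page; you can tighten it to the specific bolt and discovery ports your deployment uses:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: neo4j-intra-cluster
  namespace: default
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: property-sharding-cluster  # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: property-sharding-cluster  # allow traffic from peer servers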
Performance Considerations:
- Memory Overhead: 20-30% additional memory required for shard coordination
- CPU Overhead: 20-30% additional CPU required for cross-shard operations
- Storage Growth: Linear growth with the number of property shards
- Network Traffic: 2-3x increase in inter-server communication
Capacity Planning Guidelines:
Development/Testing:
topology:
servers: 3
resources:
requests:
memory: 4Gi # Absolute minimum for dev/test
cpu: 2000m # 2 cores for cross-shard queries
limits:
memory: 8Gi # Recommended for production
cpu: 2000m
Production:
topology:
servers: 3 # or 5+ for larger datasets
resources:
requests:
memory: 4Gi # Minimum for dev/test
cpu: 2000m # Cross-shard performance
limits:
memory: 8Gi # Recommended for production
cpu: 4000m # Handle peak loads
High-Performance Production:
topology:
servers: 7 # Better distribution
resources:
requests:
memory: 8Gi # Production recommendation
cpu: 4000m # Maximum throughput
limits:
memory: 20Gi # Peak load handling
cpu: 6000m # Burst capability
Quick Start¶
✅ Tested Configuration: The examples below have been tested with Neo4j 2025.12-enterprise on Kubernetes clusters with the specified resource requirements. Property sharding cluster creation typically completes in 2-3 minutes with proper resources.
0. Prerequisites¶
Before creating a property sharding cluster, create the required admin secret:
kubectl create secret generic neo4j-admin-secret \
--from-literal=username=neo4j \
--from-literal=password=your-secure-password
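Verify the secret exists before creating the cluster:

kubectl get secret neo4j-admin-secret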
1. Create Property Sharding Enabled Cluster¶
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jEnterpriseCluster
metadata:
name: property-sharding-cluster
namespace: default
spec:
image:
repo: neo4j
tag: 2025.12-enterprise # Property sharding requires 2025.12+
# Authentication required for property sharding
auth:
adminSecret: neo4j-admin-secret
topology:
servers: 3 # 3+ recommended for HA graph shard primaries
storage:
size: 10Gi
className: standard # Storage class must be specified
resources:
requests:
memory: 4Gi # Minimum 4GB for dev/test environments
cpu: 2000m # 2+ cores required for cross-shard queries
limits:
memory: 8Gi # Recommended 8GB for production
cpu: 4000m # Higher CPU for shard coordination overhead (20-30% increase)
# Enable property sharding
propertySharding:
enabled: true
config:
# Required configuration (applied automatically if not specified)
internal.dbms.sharded_property_database.enabled: "true"
db.query.default_language: "CYPHER_25"
internal.dbms.sharded_property_database.allow_external_shard_access: "false"
# Performance tuning (optional)
db.tx_log.rotation.retention_policy: "7 days"
internal.dbms.sharded_property_database.property_pull_interval: "10ms"
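Apply the manifest and watch the cluster come up (assuming it is saved as property-sharding-cluster.yaml; the exact status values reported depend on the operator version):

kubectl apply -f property-sharding-cluster.yaml

# Watch the cluster resource until it reports ready
kubectl get neo4jenterprisecluster property-sharding-cluster -w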
2. Create Sharded Database¶
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jShardedDatabase
metadata:
name: products-sharded-db
namespace: default
spec:
# Reference to property sharding enabled cluster
clusterRef: property-sharding-cluster
# Virtual database name (what users connect to)
name: products
# Cypher language version: "5" or "25". Property sharding requires Cypher 25,
# which is only available on Neo4j 2025.x or later.
defaultCypherLanguage: "25"
# Property sharding configuration
propertySharding:
propertyShards: 4 # Number of property shards
# Graph shard topology (stores nodes/relationships)
graphShard:
primaries: 3
secondaries: 2
# Property shard topology (stores properties)
propertyShardTopology:
replicas: 2
# Database creation options
wait: true # Wait for creation to complete
ifNotExists: true # Don't fail if database exists
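Apply the manifest and confirm the sharded database reaches a ready state (filename assumed; see the status fields described in the Monitoring section below):

kubectl apply -f products-sharded-db.yaml
kubectl get neo4jshardeddatabase products-sharded-db -o yaml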
Seeding from Backups (Optional)¶
Use seedURI for a single backup location that contains shard-suffixed artifacts
(for example, products-g000 and products-p000). Use seedURIs for per-shard
URIs when seeding from dumps or multiple locations.
spec:
seedURI: "s3://backups/products/"
seedConfig:
restoreUntil: "2025-06-01T10:30:00Z"
seedCredentials:
secretRef: "seed-credentials"
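The referenced seedCredentials secret must exist in the same namespace. A sketch for S3-style credentials, assuming conventional key names (the exact keys the operator expects may differ):

kubectl create secret generic seed-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=<access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<secret-access-key>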
Configuration Reference¶
PropertyShardingSpec (Neo4jEnterpriseCluster)¶
| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Enable property sharding support on cluster |
| config | map[string]string | No | Advanced property sharding configuration |
Required Configuration Settings¶
When propertySharding.enabled is true, these settings are automatically applied:
config:
internal.dbms.sharded_property_database.enabled: "true"
db.query.default_language: "CYPHER_25"
internal.dbms.sharded_property_database.allow_external_shard_access: "false"
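You can spot-check applied settings from cypher-shell; a minimal example using the SHOW SETTINGS command (internal.* settings may not be listed, but the Cypher default language is):

SHOW SETTINGS YIELD name, value
WHERE name = "db.query.default_language"
RETURN name, value;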
Optional Performance Tuning¶
config:
# Transaction log retention (critical for shard sync)
db.tx_log.rotation.retention_policy: "7 days"
# Property pull interval
internal.dbms.sharded_property_database.property_pull_interval: "10ms"
# Memory configuration
server.memory.heap.max_size: "8G"
server.memory.pagecache.size: "4G"
PropertyShardingConfiguration (Neo4jShardedDatabase)¶
| Field | Type | Required | Description |
|---|---|---|---|
| propertyShards | int32 | Yes | Number of property shards (1-1000) |
| graphShard | DatabaseTopology | Yes | Topology for graph shard |
| propertyShardTopology | PropertyShardTopology | Yes | Replica topology for property shards |
DatabaseTopology¶
| Field | Type | Description |
|---|---|---|
| primaries | int32 | Number of primary replicas |
| secondaries | int32 | Number of secondary replicas |
PropertyShardTopology¶
| Field | Type | Description |
|---|---|---|
| replicas | int32 | Number of replicas per property shard |
Best Practices¶
Cluster Sizing¶
Development/Testing:
- 2 servers minimum (3+ recommended for HA graph shard primaries)
- 4GB heap per server (absolute minimum for property sharding)
- 8GB+ total RAM per server (recommended for stable operation)
- Property shards: 2-4

Production:
- 3+ servers recommended (HA graph shard primaries)
- 4-8GB+ heap per server (4GB minimum, 8GB for production)
- 20GB+ total RAM per server (accounting for system overhead)
- 2+ CPU cores per server (cross-shard query requirements)
- Property shards: 4-16 (start conservatively)
Property Distribution Strategy¶
Property distribution across shards is automatic; the operator does not provide per-property controls.
Shard Count Recommendations¶
| Dataset Size | Property Shards | Reasoning |
|---|---|---|
| < 1M nodes | 2-4 | Minimal overhead |
| 1M-10M nodes | 4-8 | Balanced distribution |
| 10M-100M nodes | 8-16 | Better parallelization |
| 100M+ nodes | 16-32 | Maximum distribution |
Note: Resharding requires an offline neo4j-admin database copy and recreating the database; the operator does not automate resharding.
Monitoring and Observability¶
Cluster Status¶
Check whether the cluster reports property sharding as ready by inspecting its status:
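A minimal check, assuming the operator surfaces property sharding information in the cluster's status (the exact field names depend on the operator version):

kubectl get neo4jenterprisecluster property-sharding-cluster -o yaml
# Inspect .status for the property sharding and readiness fields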
Sharded Database Status¶
Check sharded database health:
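For example:

kubectl get neo4jshardeddatabase products-sharded-db -o yaml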
Key status fields:
status:
phase: Ready
shardingReady: true
graphShard:
name: products-g000
ready: true
state: online
propertyShards:
- name: products-p000
ready: true
state: online
virtualDatabase:
name: products
ready: true
Database Operations¶
Connect to the virtual database:
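For example, from one of the server pods (mirroring the cypher-shell usage in the Debugging Commands section; substitute your own credentials):

kubectl exec -it property-sharding-cluster-server-0 -- \
  cypher-shell -u neo4j -p your-secure-password -d products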
Query the virtual database (automatically routes to appropriate shards):
// Connect to virtual database
:use products
// Standard queries work transparently
MATCH (n:Product) RETURN n.name, n.description LIMIT 10;
// Create data (properties distributed automatically)
CREATE (p:Product {
id: "12345", // Stays in graph shard
name: "Widget", // Stays in graph shard
description: "...", // Goes to property shard
metadata: "{...}" // Goes to property shard
});
Check individual shards:
// Graph shard (nodes/relationships only)
:use system
SHOW DATABASES
WHERE name STARTS WITH "products-g"
// Property shards (properties only)
SHOW DATABASES
WHERE name STARTS WITH "products-p"
Troubleshooting¶
Common Issues¶
1. Version Mismatch
Solution: Upgrade to Neo4j 2025.12 or later.
2. Insufficient Memory
Error: property sharding requires minimum 4GB memory for basic operation, got XXXMB (recommended: 8GB+)
Solution: Increase the memory allocation:
resources:
requests:
memory: 4Gi # Minimum requirement
limits:
memory: 8Gi # Recommended for production
3. Insufficient CPU Resources
Solution: Increase CPU allocation for cross-shard query performance:
resources:
requests:
cpu: 2000m # Recommended minimum for cross-shard queries
limits:
cpu: 4000m # Allow burst capacity
4. Invalid Server Count
Solution: Set at least 2 servers, and consider 3+ for HA graph shard primaries:
topology:
servers: 3 # 3+ recommended for HA graph shard primaries
# Consider 7 servers for larger datasets
5. Cluster Not Configured for Property Sharding
Solution: Ensure propertySharding.enabled: true on the referenced cluster and that the cluster is Ready.
Debugging Commands¶
# Check cluster configuration
kubectl describe neo4jenterprisecluster property-sharding-cluster
# Check sharded database events
kubectl describe neo4jshardeddatabase products-sharded-db
# View operator logs
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager
# Check individual database status
kubectl exec property-sharding-cluster-server-0 -- \
cypher-shell -u neo4j -p password "SHOW DATABASES"
# Check virtual database
kubectl exec property-sharding-cluster-server-0 -- \
  cypher-shell -u neo4j -p password "SHOW DATABASE products"
Backup and Recovery¶
Note: The operator does not orchestrate backupConfig for Neo4jShardedDatabase. Use explicit Neo4jBackup resources to back up each shard.
A single Neo4jBackup resource can cover all shards by listing each shard database explicitly:
apiVersion: neo4j.neo4j.com/v1beta1
kind: Neo4jBackup
metadata:
name: sharded-backup
spec:
clusterRef: property-sharding-cluster
# Multi-database backup
databases:
- name: "products-g000" # Graph shard
type: "full+differential"
- name: "products-p000" # Property shards
type: "full"
- name: "products-p001"
type: "full"
- name: "products-p002"
type: "full"
- name: "products-p003"
type: "full"
schedule: "0 2 * * *"
consistency: "cross-database" # Ensure consistent backup point
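Apply and verify the backup resource (filename assumed):

kubectl apply -f sharded-backup.yaml
kubectl describe neo4jbackup sharded-backup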
Performance Considerations and Optimization¶
Query Performance Patterns¶
Fast Queries (Graph Shard Only):
// Structure queries - very fast (single shard)
MATCH (n)-[r]->(m) RETURN count(r)
// Index-based lookups on graph properties - fast
MATCH (p:Product) WHERE p.id = "12345" RETURN p.name, p.category
Moderate Queries (Mixed Access):
// Graph structure + property shard access - moderate speed
MATCH (p:Product)
WHERE p.category = "electronics" // Graph shard
RETURN p.name, p.description // Mixed: graph + property shard
Slower Queries (Heavy Property Access):
// Full property shard scans - slower
MATCH (p:Product)
WHERE p.description CONTAINS "high-performance" // Property shard scan
RETURN p.name, p.specifications // Property shard data
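Index-backed lookups stay in the fast category, so make sure frequently filtered graph-shard properties are indexed. A minimal sketch for the p.id lookup shown above:

// Supports the fast MATCH (p:Product) WHERE p.id = "12345" pattern
CREATE INDEX product_id IF NOT EXISTS FOR (p:Product) ON (p.id);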
Performance Optimization Strategies¶
1. Automatic Property Distribution:
Property distribution is automatic for sharded databases. Focus on shard count, replica topology, and resource sizing.
2. Resource Scaling Recommendations:
Development (Basic Testing):
- Memory: 6-8GB per server (basic functionality)
- CPU: 1-2 cores per server (acceptable for development)
- Network: Standard Kubernetes networking (1Gbps)
- Servers: 2 minimum (3+ recommended for HA)

Production (Recommended):
- Memory: 4-8GB per server (4GB minimum, 8GB recommended)
- CPU: 2-4 cores per server (cross-shard query performance)
- Network: High-speed networking (10Gbps+, low latency)
- Servers: 3-7 servers (better shard distribution)

High-Performance (Enterprise):
- Memory: 16-20GB+ per server (maximum throughput)
- CPU: 4-6+ cores per server (concurrent cross-shard queries)
- Network: Ultra-low latency networking (<1ms)
- Servers: 7+ servers (optimal shard placement)
3. Monitoring Key Metrics:
Resource Utilization:
# Monitor CPU and memory usage per server, including during peak queries
kubectl top pods -l app.kubernetes.io/name=your-cluster --containers
Query Performance:
// Check open transaction counts via JMX
CALL dbms.queryJmx("org.neo4j:instance=kernel#0,name=Transactions")
YIELD attributes
RETURN attributes.NumberOfOpenTransactions;
// Check shard database status
SHOW DATABASES YIELD name, currentStatus, requestedStatus
WHERE name STARTS WITH "your-db-"
RETURN name, currentStatus, requestedStatus;
Network and I/O:
# Check listening ports and socket state on a server pod
kubectl exec your-cluster-server-0 -- ss -tuln
# Check storage I/O patterns
kubectl exec your-cluster-server-0 -- iostat -x 1
Storage Scaling Considerations¶
Graph Shard Growth:
- Grows with the number of nodes and relationships
- Index storage for graph properties
- Transaction logs for consensus

Property Shard Growth:
- Grows with property volume per shard
- Distributed based on hash function
- Each shard has independent transaction logs
Total Storage Planning:
Total Storage = Graph_Shard + (Property_Shards × Avg_Property_Size) + Overhead
Example:
- Graph: 10GB (structure + graph properties)
- Property Shards: 4 × 25GB = 100GB (distributed properties)
- Overhead: 20GB (transaction logs, indexes, system)
- Total: ~130GB per server (with replication)
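Translated into the cluster spec, the example above suggests per-server storage along these lines (a sketch; fast-ssd is a placeholder class and the size adds headroom over the ~130GB estimate):

storage:
  size: 150Gi       # ~130GB estimate plus headroom
  className: fast-ssd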
Network Performance Requirements:
Transaction Log Synchronization:
- Latency: <10ms between servers (critical)
- Bandwidth: 100MB/s+ sustained (busy clusters)
- Consistency: All shards must stay within the transaction log window

Cross-Shard Query Traffic:
- Latency: <5ms for responsive queries
- Bandwidth: Proportional to result set size
- Concurrent Queries: Multiple cross-shard queries increase traffic
Performance Troubleshooting¶
Slow Query Performance:
1. Check property distribution strategy
2. Monitor cross-shard query patterns
3. Verify adequate CPU allocation
4. Check network latency between servers

High Memory Usage:
1. Monitor heap usage during peak loads
2. Check transaction log retention settings
3. Verify property shard distribution balance
4. Consider increasing memory limits

Network Saturation:
1. Monitor inter-pod network traffic
2. Check for transaction log lag
3. Verify network bandwidth capacity
4. Consider network policy optimizations
Migration from Standard Databases¶
Property sharding cannot be enabled on existing databases. Migration approaches:
- Create new sharded database and migrate data
- Export/import with property distribution
- Application-level migration with dual-write
See migration guide for detailed procedures.
Limitations¶
- No in-operator resharding: Resharding requires an offline neo4j-admin database copy and recreating the database
- Neo4j version: Requires 2025.12+ enterprise
- Cypher version: Must use Cypher 25
- No online resharding: Plan shard count carefully
- Increased complexity: More monitoring and operational overhead