Architecture Overview¶
This guide provides a comprehensive overview of the Neo4j Enterprise Operator's architecture, design principles, and current implementation status as of August 2025.
Core Design Principles¶
The Neo4j Enterprise Operator follows cloud-native best practices with a focus on:
- Production Stability: Optimized reconciliation frequency and efficient resource management
- Performance: Intelligent rate limiting and status update optimization
- Server-Based Architecture: Unified server deployments with self-organizing roles
- Resource Efficiency: Centralized backup system (70% resource reduction)
- Observability: Comprehensive monitoring and operational insights
- Validation: Proactive resource validation and recommendations
Current Architecture (August 2025)¶
Server-Based Architecture¶
The operator has evolved to use a unified server-based architecture where Neo4j servers self-organize into primary/secondary roles:
Key Changes from Legacy Architecture¶
- Before: Separate primary/secondary StatefulSets with complex orchestration
- After: Single `{cluster-name}-server` StatefulSet with self-organizing servers
- Benefit: Simplified resource management, improved scaling, reduced complexity
Current Implementation¶
```yaml
# Neo4jEnterpriseCluster topology
topology:
  servers: 3  # Creates: my-cluster-server StatefulSet (replicas: 3)
              # Pods: my-cluster-server-0, my-cluster-server-1, my-cluster-server-2
```

```yaml
# Neo4jEnterpriseStandalone deployment
# Creates: my-standalone StatefulSet (replicas: 1)
# Pod: my-standalone-0
```
Centralized Backup System¶
Major Efficiency Improvement: Replaced expensive per-pod backup sidecars with centralized backup architecture:
- Resource Efficiency: 100m CPU/256Mi memory per cluster vs N×200m CPU/512Mi per sidecar
- Resource Savings: ~70% reduction in backup-related resource usage
- Architecture: Single `{cluster-name}-backup-0` StatefulSet per cluster
- Connectivity: Connects to the cluster via the client service using the Bolt protocol
- Neo4j 5.26+ Support: Modern backup syntax with automated path creation
Custom Resource Definitions (CRDs)¶
The operator defines six core CRDs located in api/v1beta1/:
Core Deployment CRDs¶
Neo4jEnterpriseCluster (neo4jenterprisecluster_types.go)¶
- Purpose: High-availability clustered Neo4j Enterprise deployments
- Architecture: Server-based with a `{cluster-name}-server` StatefulSet
- Minimum Topology: 2+ servers (enforced by validation)
- Server Organization: Servers self-organize into primary/secondary roles for databases
- Scaling: Horizontal scaling supported with topology validation
- Discovery: LIST resolver with static pod FQDNs; V2_ONLY explicitly set for 5.26.x, implicit for 2025.x+
- Resource Pattern: Single StatefulSet replaces complex multi-StatefulSet architecture
Key Fields:
```go
type Neo4jEnterpriseClusterSpec struct {
    Image    ImageSpec             `json:"image"`
    Topology TopologyConfiguration `json:"topology"` // servers: N
    Storage  StorageSpec           `json:"storage"`

    // Outbound TLS trust + escape hatches (see "Security Architecture
    // > 6. Outbound trust" further down for the full lifecycle):
    TrustedCASecrets  []TrustedCASecret    `json:"trustedCASecrets,omitempty"`
    ExtraVolumes      []corev1.Volume      `json:"extraVolumes,omitempty"`
    ExtraVolumeMounts []corev1.VolumeMount `json:"extraVolumeMounts,omitempty"`

    // ... additional fields
}
```
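To make the spec concrete, here is a minimal manifest sketch. The `apiVersion` group, `image`/`storage` sub-fields, and the shape of a `TrustedCASecret` entry are assumptions for illustration; only the top-level field names come from the spec above.

```yaml
apiVersion: neo4j.io/v1beta1        # illustrative group/version
kind: Neo4jEnterpriseCluster
metadata:
  name: my-cluster
spec:
  image:
    repository: neo4j               # sub-fields assumed
    tag: "5.26.1-enterprise"
  topology:
    servers: 3                      # creates my-cluster-server (replicas: 3)
  storage:
    size: 100Gi                     # sub-field assumed
  trustedCASecrets:
    - name: corp-root-ca            # Secret name doubles as the keytool alias
```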
Neo4jEnterpriseStandalone (neo4jenterprisestandalone_types.go)¶
- Purpose: Single-node Neo4j Enterprise deployments
- Architecture: Uses clustering infrastructure but fixed at 1 replica
- Use Cases: Development, testing, simple production workloads
- StatefulSet: `{standalone-name}` (no "-server" suffix)
- Configuration: Modern clustering approach with a single member (Neo4j 5.26+)
- Restrictions: Cannot scale beyond 1 replica
- Shared API surface: Mirrors the cluster spec for `TrustedCASecrets`/`ExtraVolumes`/`ExtraVolumeMounts`. The wire-up differs slightly: cluster pods receive `-Djavax.net.ssl.trustStore=…` via the `NEO4J_server_jvm_additional` env var, while standalone pods receive the same flags as `server.jvm.additional=…` lines emitted into the ConfigMap-backed neo4j.conf.
Database Management CRDs¶
Neo4jDatabase (neo4jdatabase_types.go)¶
- Purpose: Manages database lifecycle within clusters and standalone deployments
- Dual Support: Works with both Neo4jEnterpriseCluster and Neo4jEnterpriseStandalone
- Enhanced Validation: DatabaseValidator supports automatic deployment type detection
- Neo4j 5.26+ Syntax: Uses the modern `TOPOLOGY` clause for database creation
- Standalone Fix: Added NEO4J_AUTH environment variable for automatic authentication
Key Features:
```go
type Neo4jDatabaseSpec struct {
    ClusterRef  string           `json:"clusterRef"`  // References cluster OR standalone
    Name        string           `json:"name"`        // Database name
    Topology    DatabaseTopology `json:"topology"`    // Primary/secondary counts
    IfNotExists bool             `json:"ifNotExists"` // CREATE IF NOT EXISTS
}
```
Neo4jPlugin (neo4jplugin_types.go)¶
- Purpose: Manages Neo4j plugin installation and configuration
- Dual Architecture Support: Enhanced for server-based cluster + standalone compatibility
- Deployment Detection: Automatic cluster vs standalone recognition
- Resource Naming: Handles `{cluster-name}-server` vs `{standalone-name}` patterns
- Plugin Sources: Official, community, custom registry, direct URL support
Backup & Restore CRDs¶
Neo4jBackup (neo4jbackup_types.go)¶
- Purpose: Manages backup operations for both clusters and standalone deployments
- Centralized Architecture: Uses single backup pod per cluster (not sidecars)
- Target Support: Can backup both cluster and standalone deployments
- Neo4j 5.26+ Support: Modern backup syntax with the `--to-path` parameter
Neo4jRestore (neo4jrestore_types.go)¶
- Purpose: Manages database restoration from backups
- Point-in-Time Recovery: Supports `--restore-until` for precise recovery
- Cross-Deployment Support: Can restore to different deployment types
Controllers Architecture¶
Core Controllers (internal/controller/)¶
Neo4jEnterpriseCluster Controller (neo4jenterprisecluster_controller.go)¶
Primary cluster management controller with server-based architecture:
Performance Optimizations:
- Efficient Reconciliation: Reduced from ~18,000 to ~34 reconciliations per minute
- Smart Status Updates: Only updates when cluster state changes
- ConfigMap Debouncing: 2-minute debounce prevents restart loops
- Resource Version Conflict Handling: Retry logic for concurrent updates
Server-Based Implementation:
- Single StatefulSet: Creates {cluster-name}-server instead of separate primary/secondary
- Self-Organizing Servers: Neo4j servers automatically assign database hosting roles
- Simplified Resource Management: Unified pod templates and configuration
- Certificate DNS: Includes all server pod names in TLS certificates
Split-Brain Detection:
- Location: internal/controller/splitbrain_detector.go
- Multi-Pod Analysis: Connects to each server to compare cluster views
- Automatic Repair: Restarts orphaned pods to rejoin main cluster
- Production Ready: Comprehensive logging and fallback mechanisms
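The multi-pod analysis can be sketched as a majority-vote over per-pod cluster views: each server reports the member set it believes in, and pods whose view disagrees with the majority are flagged for restart. Function and type names below are illustrative, not the operator's actual API, and the member slices are assumed to be pre-sorted.

```go
package main

import "fmt"

// findOrphans flags pods whose cluster view disagrees with the majority view.
// views maps pod name -> sorted list of members that pod reports.
func findOrphans(views map[string][]string) []string {
	counts := map[string]int{} // serialized view -> occurrence count
	keys := map[string]string{}
	for pod, members := range views {
		key := fmt.Sprint(members)
		counts[key]++
		keys[pod] = key
	}
	// The most common view is taken as the main cluster.
	majority := ""
	for key, n := range counts {
		if majority == "" || n > counts[majority] {
			majority = key
		}
	}
	var orphans []string
	for pod, key := range keys {
		if key != majority {
			orphans = append(orphans, pod)
		}
	}
	return orphans
}

func main() {
	views := map[string][]string{
		"my-cluster-server-0": {"server-0", "server-1"},
		"my-cluster-server-1": {"server-0", "server-1"},
		"my-cluster-server-2": {"server-2"}, // orphaned: disagrees with majority
	}
	fmt.Println(findOrphans(views))
}
```

The real detector additionally handles ties and unreachable pods; this sketch only shows the comparison step.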
Neo4jEnterpriseStandalone Controller (neo4jenterprisestandalone_controller.go)¶
Single-node deployment controller:
Key Features:
- Clustering Infrastructure: Uses the same infrastructure as clusters (Neo4j 5.26+ approach)
- Single Member Configuration: Sets up clustering with a single server
- Resource Management: Handles ConfigMap, Service, and StatefulSet
- Status Tracking: Comprehensive status updates for standalone instances
Database Controller (neo4jdatabase_controller.go)¶
Enhanced for dual deployment support:
- Automatic Detection: Tries cluster lookup first, then standalone fallback
- Neo4j Client Creation: NewClientForEnterprise() vs NewClientForEnterpriseStandalone()
- Authentication Handling: Manages NEO4J_AUTH for standalone deployments
- Syntax Support: Neo4j 5.26+ and 2025.x database creation syntax
Plugin Controller (plugin_controller.go)¶
Manages plugin lifecycle with architecture compatibility:
- DeploymentInfo Abstraction: Unified handling of cluster/standalone types
- Resource Naming: Correct StatefulSet names ({cluster-name}-server vs {standalone-name})
- Pod Labels: Applies appropriate labels for each deployment type
- Plugin Sources: Official, community, custom registries, direct URLs
Backup Controller (neo4jbackup_controller.go)¶
Centralized backup management:
- Architecture: Single backup StatefulSet per cluster
- Resource Efficiency: 70% reduction in backup resource usage
- Cross-Deployment Support: Backs up both clusters and standalone deployments
- Modern Syntax: Neo4j 5.26+ compatible backup commands
Restore Controller (neo4jrestore_controller.go)¶
Database restoration management:
- Point-in-Time Recovery: Supports precise timestamp restoration
- Flexible Targets: Can restore to different deployment types
- Validation: Ensures target deployment compatibility
Validation Framework (internal/validation/)¶
Comprehensive Validation Architecture¶
Core Validators:¶
- TopologyValidator (`topology_validator.go`): Cluster topology and server count validation
- ClusterValidator (`cluster_validator.go`): Cluster-specific configuration validation
- MemoryValidator (`memory_validator.go`): Neo4j memory settings vs container limits
- ResourceValidator (`resource_validator.go`): CPU, memory, and storage validation
- TLSValidator (`tls_validator.go`): TLS/SSL configuration validation
- TruststoreValidator (`truststore_validator.go`): Unique Secret names in `spec.trustedCASecrets` (the name doubles as the keytool alias), plus the reserved-mount-path collision check for `spec.extraVolumeMounts` (`/data`, `/logs`, `/conf`, `/ssl`, `/plugins`, `/truststore`, `/truststore-ca`, `/var/lib/neo4j/...`)
- DatabaseValidator (`database_validator.go`): Database creation and topology validation
- AuthRuleValidator (`authrule_validator.go`): Neo4jAuthRule name pattern plus a DDL-keyword guard on the condition expression (rejects CREATE / DROP / ALTER / GRANT / DENY / REVOKE / SHOW / RENAME and `;` injection)
- RoleValidator / UserValidator / RoleBindingValidator (`role_validator.go`, `user_validator.go`, `rolebinding_validator.go`): Privilege list, identifier rules, cross-CR overlap with `Neo4jUser`
Enhanced Validation Features:¶
- Dual CRD Validation: Separate validation rules for cluster vs standalone
- Server-Based Topology: Validates server counts instead of primary/secondary counts
- Resource Recommendations: Suggests optimal resource allocation
- Configuration Restrictions: Prevents clustering settings in standalone deployments
- Neo4j Version Compatibility: Validates settings against Neo4j 5.26+ and 2025.x
Database Validator Enhancements¶
- Automatic Deployment Detection: Tries cluster first, then standalone
- Appropriate Client Creation: Uses correct client type for deployment
- Clear Error Messages: Distinguishes between cluster and standalone validation failures
Neo4j Version Compatibility¶
Supported Versions¶
- Neo4j 5.26.x: Last semver LTS release (5.26.0, 5.26.1, etc.) — no 5.27+ semver versions exist
- Neo4j 2025.x+: Calver format (2025.01.0, 2025.02.0, etc.)
Version-Specific Configuration¶
Discovery Configuration (LIST resolver, injected by startup script):¶
| Setting | 5.26.x (SemVer) | 2025.x+ / 2026.x+ (CalVer) |
|---|---|---|
| `dbms.cluster.discovery.resolver_type` | `LIST` | `LIST` |
| `dbms.cluster.discovery.version` | `V2_ONLY` (explicit) | (omitted — V2 is the only protocol) |
| Endpoints key | `dbms.cluster.discovery.v2.endpoints` | `dbms.cluster.endpoints` |
| Endpoint port | 6000 (tcp-tx) | 6000 (tcp-tx) |
| Bootstrap hint | `internal.dbms.cluster.discovery.system_bootstrapping_strategy=me/other` | (not used) |
Port 5000 (tcp-discovery) is the deprecated V1 discovery port — never used by this operator.
CalVer detection: ParseVersion() → IsCalver (major >= 2025) covers 2026.x+ automatically.
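The CalVer rule is simple enough to sketch: parse the major component and treat anything ≥ 2025 as CalVer, which covers 2026.x+ with no code change. `parseMajor`/`isCalver` are illustrative names, not the operator's actual `ParseVersion()`/`IsCalver` implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajor extracts the leading numeric component of a version string.
func parseMajor(version string) (int, error) {
	parts := strings.SplitN(version, ".", 2)
	return strconv.Atoi(parts[0])
}

// isCalver applies the major >= 2025 rule described above.
func isCalver(version string) bool {
	major, err := parseMajor(version)
	return err == nil && major >= 2025
}

func main() {
	for _, v := range []string{"5.26.1", "2025.01.0", "2026.03.0"} {
		fmt.Printf("%-10s calver=%v\n", v, isCalver(v))
	}
}
```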
Modern Configuration Standards:¶
- Memory: `server.memory.*` (not the deprecated `dbms.memory.*`)
- TLS/SSL: `server.https.*` and `server.bolt.*` (not `dbms.connector.*`)
- Database Format: `db.format: "block"` (not deprecated formats)
- Discovery: Managed entirely by the operator startup script — do not set in `spec.config`
Database Creation Syntax¶
Neo4j 5.26+ (Cypher 5):¶
```cypher
CREATE DATABASE name [IF NOT EXISTS]
[TOPOLOGY n PRIMAR{Y|IES} [m SECONDAR{Y|IES}]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]
```
Neo4j 2025.x (Cypher 25):¶
```cypher
CREATE DATABASE name [IF NOT EXISTS]
[[SET] DEFAULT LANGUAGE CYPHER {5|25}]
[[SET] TOPOLOGY n PRIMARIES [m SECONDARIES]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]
```
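Concrete instances of the two grammars, with illustrative database name, counts, and wait durations:

```cypher
// Cypher 5 (Neo4j 5.26.x)
CREATE DATABASE sales IF NOT EXISTS
  TOPOLOGY 2 PRIMARIES 1 SECONDARY
  WAIT 30 SECONDS;

// Cypher 25 (Neo4j 2025.x)
CREATE DATABASE sales IF NOT EXISTS
  SET DEFAULT LANGUAGE CYPHER 25
  SET TOPOLOGY 2 PRIMARIES 1 SECONDARIES
  WAIT;
```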
Resource Management Architecture¶
Intelligent Resource Handling¶
Resource Builders (internal/resources/):¶
- ClusterBuilder (`cluster.go`): Server-based StatefulSet creation
- StandaloneBuilder (`standalone.go`): Single-node deployment resources
- ConfigMapBuilder: Unified configuration for both deployment types
- ServiceBuilder: Client and discovery services
- BackupBuilder: Centralized backup StatefulSet
- TruststoreBuilder (`cluster.go`: `BuildTrustStoreInitContainer` / `BuildTrustStoreVolumes` / `CollectTrustedCASecrets`): Emits the per-Secret volume mounts, the writable `/truststore` EmptyDir, and the `truststore-init` init container (seeds `/truststore/truststore.jks` from `$JAVA_HOME/lib/security/cacerts`, then runs `keytool -import` for each `spec.trustedCASecrets` entry using the Secret name as alias). Reused by both cluster (env var) and standalone (ConfigMap) wire-up paths via `CollectTrustedCASecrets`, which folds the legacy singular `spec.auth.trustStore` into the new plural list.
Server-Based Resource Patterns:¶
- StatefulSet Naming: `{cluster-name}-server` for clusters, `{standalone-name}` for standalone
- Pod Naming: `{cluster-name}-server-0`, `{cluster-name}-server-1`, etc.
- Service Names: `{cluster-name}-client`, `{cluster-name}-discovery`
- Backup Resources: `{cluster-name}-backup-0` (centralized)
- Truststore mount: `/truststore/truststore.jks` (read-only, populated by the `truststore-init` init container; the password is the JVM default `changeit`)
- User-supplied volumes: `spec.extraVolumes` are appended to the pod spec verbatim; `spec.extraVolumeMounts` are appended to the Neo4j container's mounts after operator-managed mounts but before any backup-sidecar mounts. Operator-managed paths (`/data`, `/logs`, `/conf`, `/ssl`, `/plugins`, `/truststore`, `/truststore-ca`, `/var/lib/neo4j/...`) are off-limits and rejected by the validator.
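The reserved-path rule above amounts to a prefix check after path cleaning: a user mount is rejected when it equals a reserved path or nests inside one. This is a sketch with illustrative names, not the validator's actual code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// reservedPaths mirrors the operator-managed mount paths listed above.
var reservedPaths = []string{
	"/data", "/logs", "/conf", "/ssl", "/plugins",
	"/truststore", "/truststore-ca", "/var/lib/neo4j",
}

// isReservedMountPath reports whether a user-supplied mountPath collides
// with an operator-managed path, including subdirectories.
func isReservedMountPath(mountPath string) bool {
	clean := path.Clean(mountPath)
	for _, r := range reservedPaths {
		if clean == r || strings.HasPrefix(clean, r+"/") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isReservedMountPath("/truststore"))         // true
	fmt.Println(isReservedMountPath("/var/lib/neo4j/data")) // true
	fmt.Println(isReservedMountPath("/cache"))              // false
}
```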
Performance Optimizations¶
Reconciliation Efficiency:¶
- Rate Limiting: Intelligent rate limiting prevents API server overload
- Status Update Efficiency: Only updates when state actually changes
- Event Filtering: Reduces unnecessary reconciliation triggers
- ConfigMap Hashing: Hash-based change detection prevents unnecessary updates
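Hash-based change detection can be sketched as a deterministic digest over the ConfigMap's data (keys sorted so map iteration order never flips the hash); the StatefulSet is only touched when the digest changes. Names are illustrative, not the operator's actual helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashConfigData produces a stable digest of ConfigMap data.
func hashConfigData(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order regardless of map iteration
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := hashConfigData(map[string]string{"neo4j.conf": "server.memory.heap.max_size=4g"})
	b := hashConfigData(map[string]string{"neo4j.conf": "server.memory.heap.max_size=8g"})
	fmt.Println(a != b) // true: a config change yields a new hash
}
```

In practice such a digest is stored as a pod-template annotation so a changed ConfigMap triggers exactly one rolling update.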
Startup Optimization:¶
- Parallel Pod Management: All server pods start simultaneously
- Minimum Primaries = 1: First pod forms cluster immediately
- PublishNotReadyAddresses: Discovery includes pending pods
- Resource Version Conflict Retry: Handles concurrent updates gracefully
Security Architecture¶
RBAC Configuration (config/rbac/)¶
Core RBAC Resources:¶
- Principle of Least Privilege: Minimal required permissions
- ClusterRole Design: Cross-namespace operations support
- Service Account Security: Dedicated accounts with specific roles
Discovery RBAC (Critical):¶
Each cluster gets automatic RBAC creation:
- ServiceAccount: {cluster-name}-discovery
- Role: Services and endpoints permissions
- RoleBinding: Links account to role
- Endpoints Permission: CRITICAL for cluster formation
TLS/SSL Support¶
The operator integrates with cert-manager for the full certificate lifecycle. The flow is the same for clusters and standalones, with one structural difference noted at the end.
1. Activation¶
TLS is opt-in via spec.tls:
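A minimal `spec.tls` sketch. The `mode` values and `issuerRef`/`trustedCASecret` fields are taken from this section; the exact nesting of `issuerRef` is an assumption based on cert-manager conventions.

```yaml
spec:
  tls:
    mode: cert-manager        # "disabled" (default) skips the TLS path entirely
    issuerRef:                # user-supplied cert-manager issuer (see step 2)
      name: my-issuer
      kind: ClusterIssuer     # nesting assumed
    # trustedCASecret: my-ca  # optional override for operator-side Bolt trust (see step 4)
```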
Validation (internal/validation/tls_validator.go) runs inline during
reconciliation — no admission webhooks are used (CLAUDE.md rule #26). When
mode: disabled, the entire TLS path is skipped and the deployment uses plain
bolt:// and http://.
2. Certificate creation¶
BuildCertificateForEnterprise (internal/resources/cluster.go) emits a
cert-manager.io/v1 Certificate whose dnsNames cover every endpoint clients
or peers may connect to:
- The headless discovery service (`{cluster}-discovery`)
- The client service (`{cluster}-client`)
- Each individual server pod FQDN (`{cluster}-server-0.{cluster}-discovery.{ns}.svc.cluster.local`, …)
- LoadBalancer hostnames where applicable
The Certificate references the user-supplied issuerRef and writes its
material into a Secret named {resource-name}-tls-secret (tls.crt, tls.key,
ca.crt). cert-manager owns rotation; the operator never touches expiry.
3. Mounting into Neo4j pods¶
The StatefulSet builder mounts the Secret read-only at /ssl/
(internal/resources/cluster.go:~1349). Neo4j is then pointed at this directory
via server.directories.certificates=/ssl along with three SSL policies that
share the same key/cert/CA bundle:
- `dbms.ssl.policy.bolt.*` — client traffic (`bolt+s://`)
- `dbms.ssl.policy.https.*` — Browser/HTTP API
- `dbms.ssl.policy.cluster.*` — RAFT and discovery between server pods
dbms.ssl.policy.cluster.trust_all=true is set to allow servers to trust each
other during cluster formation without per-pod CA pinning.
When TLS is enabled, server.bolt.tls_level=REQUIRED is also set — plain
bolt:// connections are rejected (regression checklist item #16).
4. Operator-side Bolt connection (outgoing)¶
The operator reconciles the cluster by issuing Cypher commands over Bolt. Two separate concerns layer here: which scheme the URI uses (routing vs direct) and which CA the client trusts.
URI scheme — routing vs direct. buildConnectionURIForEnterprise
(internal/neo4j/client.go) uses the routing scheme:
| Spec | Scheme |
|---|---|
| `spec.tls.mode: disabled` (or unset) | `neo4j://` |
| `spec.tls.mode: cert-manager` | `neo4j+s://` |
The routing scheme is mandatory. Cluster admin commands (CREATE/DROP USER,
GRANT/REVOKE, CREATE/ALTER/DROP DATABASE, AUTH RULE management, etc.) must
execute on the cluster leader; the Go driver routes write transactions to
the leader only under neo4j://. Under the direct bolt:// scheme,
AccessMode: AccessModeWrite is silently ignored and connections land
wherever K8s steered them via the {cluster}-client ClusterIP. The
operator's Bolt clients used to use bolt://, which produced
Neo.ClientError.Cluster.NotALeader on N-1 of every N reconciles and
visible Ready ↔ Failed status flicker on the role/user/auth-rule
controllers. See checklist item #62.
The single legitimate bolt:// consumer is
internal/controller/splitbrain_detector.go:createPodSpecificNeo4jClient,
which intentionally bypasses the routing layer to query each pod's RAFT
view individually — the whole point of split-brain detection is to compare
per-pod state, not to talk to the leader. Standalone deployments use the
routing scheme too, for symmetry; on a single-member topology
getRoutingTable reports the lone member as both reader and writer, so
behavior is equivalent to direct connection.
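The scheme-selection rule in the table above reduces to a two-way switch, sketched here with illustrative names (the real `buildConnectionURIForEnterprise` lives in `internal/neo4j/client.go`; 7687 is the standard Bolt port):

```go
package main

import "fmt"

// connectionURI picks the routing scheme: always neo4j:// (so write
// transactions are routed to the leader), upgraded to neo4j+s:// when
// spec.tls.mode is cert-manager.
func connectionURI(tlsMode, clientService string) string {
	scheme := "neo4j://"
	if tlsMode == "cert-manager" {
		scheme = "neo4j+s://"
	}
	return scheme + clientService + ":7687"
}

func main() {
	fmt.Println(connectionURI("disabled", "my-cluster-client"))     // neo4j://my-cluster-client:7687
	fmt.Println(connectionURI("cert-manager", "my-cluster-client")) // neo4j+s://my-cluster-client:7687
}
```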
Driver timeouts. NewClientForEnterprise /
NewClientForEnterpriseStandalone configure:
- ConnectionAcquisitionTimeout = 10s — full budget for getting a
connection (includes routing-table fetch retries under neo4j://)
- SocketConnectTimeout = 5s — TCP connect to a router member
- MaxTransactionRetryTime = 15s — retry budget for transient errors
These values are deliberately tight: an unreachable cluster fails fast instead of stalling the controller's reconcile queue behind hung Bolt calls. Healthy clusters complete the routing handshake in well under one second. See checklist item #63.
TLS. buildTLSConfig (internal/neo4j/client.go) governs which CA the
client trusts:
- Auto-discovery: Load `ca.crt` from the `{resource-name}-tls-secret` Secret and pin it as the trusted CA for outgoing connections. This is the default path — no user configuration required.
- Override: `spec.tls.trustedCASecret` lets users point at a different Secret (e.g. when bringing their own CA outside cert-manager).
- Fallback: `InsecureSkipVerify` is used only during the brief window before the Secret has been populated by cert-manager (regression checklist item #27).
All three Bolt entry points — NewClientForEnterprise,
NewClientForEnterpriseStandalone, and NewClientForPod (split-brain
detector) — go through buildTLSConfig, so the scheme switches
dynamically between TLS-enabled and plain variants based on spec.tls.mode
(checklist item #28).
5. Standalone differences¶
Neo4jEnterpriseStandalone follows the same flow with two structural
differences:
- A single pod, so `dnsNames` is shorter (one server FQDN + the client service).
- Neo4j configuration is delivered via a ConfigMap rather than StatefulSet env vars; the `health.sh` probe (mounted alongside `neo4j.conf` with mode `0755`) also lives in this ConfigMap (checklist item #34).
6. Outbound trust — spec.trustedCASecrets & spec.extraVolumes¶
The cert-manager flow above governs Neo4j-the-server's inbound TLS (Bolt, HTTPS, intra-cluster RAFT). Neo4j also makes outbound TLS calls — to OIDC providers, LDAPS servers, Aura Fleet Management, plugin download mirrors, and in some cluster topologies to peer clusters for replication. When those endpoints use a CA the JDK doesn't trust by default, the operator wires a custom JVM truststore.
Sources of truth in code:
| Concern | Source |
|---|---|
| API types | api/v1beta1/neo4jenterprisecluster_types.go:TrustedCASecret, plus Neo4jEnterpriseClusterSpec.TrustedCASecrets / ExtraVolumes / ExtraVolumeMounts (mirrored on standalone) |
| Validation | internal/validation/truststore_validator.go (unique Secret names, reserved-mount-path collision check) |
| Init container + volumes | internal/resources/cluster.go:BuildTrustStoreInitContainer / BuildTrustStoreVolumes / CollectTrustedCASecrets |
| JVM-additional wire-up (cluster) | internal/resources/cluster.go — env var NEO4J_server_jvm_additional |
| JVM-additional wire-up (standalone) | internal/controller/neo4jenterprisestandalone_controller.go:createConfigMap — written as server.jvm.additional=... lines in the ConfigMap-backed neo4j.conf |
Init container flow (one container, runs before Neo4j, image is the same
Neo4j image so keytool is guaranteed to be present):
- `cp $JAVA_HOME/lib/security/cacerts /truststore/truststore.jks` — seeds the writable JKS with the JDK's default trust roots so public CAs (Let's Encrypt, DigiCert, etc.) keep working. Without this seed step Neo4j would lose trust in public infrastructure whenever any custom CA was added.
- For each `TrustedCASecret`: `keytool -import -trustcacerts -alias <secret-name> -file /trusted-ca/<secret-name>/<key>` (the default key `ca.crt` matches the layout of cert-manager-issued Secrets).
- The resulting JKS is mounted read-only at `/truststore/truststore.jks` into the main Neo4j container.
JVM args: `-Djavax.net.ssl.trustStore=/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit` — appended to whatever the user supplied via `spec.config["server.jvm.additional"]`. Cluster pods receive these via the `NEO4J_server_jvm_additional` env var; standalone pods receive them as `server.jvm.additional=...` lines written into the ConfigMap-backed `neo4j.conf`.
Backward compatibility: the older singular spec.auth.trustStore
(*SecretKeyRef) is folded into the new list at reconcile time via
CollectTrustedCASecrets. Both paths produce the same volumes, init
container, and JVM flags. Names from the explicit trustedCASecrets list
win on duplication.
Reserved paths for ExtraVolumeMounts: /data, /logs, /conf,
/ssl, /plugins, /truststore, /truststore-ca, /var/lib/neo4j (and
its data/, logs/, conf/, plugins/, certificates/ subdirectories)
are all rejected by the validator — silently overlaying them would either
destroy operator-managed content or fight the truststore-init flow.
Why this is needed for ABAC: Neo4j 2026.04 hard-requires https:// for
every dbms.security.oidc.<name>.* URI and rejects http:// at config-parse
time, before boot. Test environments and self-hosted OIDC providers therefore
need a TLS-fronted stub plus a trusted CA — trustedCASecrets is the
ergonomic path; extraVolumes is the escape hatch when a Neo4j SSL policy
references a per-policy truststore_path.
7. Adjacent integrations¶
- ExternalSecrets: When `spec.auth.adminSecret` references a Secret managed by ExternalSecrets, the operator resolves it identically — TLS material remains under cert-manager's control regardless.
- `Neo4jAuthRule` (ABAC) and OIDC trust: The auth-rule reconciler talks to Neo4j via Bolt and does not directly interact with the JVM truststore. However, the cluster itself needs trust to fetch the OIDC well-known document — that's what `trustedCASecrets` configures.
TLS/SSL quick reference¶
| Concern | Source |
|---|---|
| Validation | internal/validation/tls_validator.go, internal/validation/truststore_validator.go |
| Certificate CR shape | internal/resources/cluster.go:BuildCertificateForEnterprise |
| `/ssl/` mount + SSL policies | internal/resources/cluster.go (~line 1349, ~line 1672) |
| Operator outgoing Bolt TLS | internal/neo4j/client.go:buildTLSConfig |
| Neo4j-server outgoing TLS truststore | internal/resources/cluster.go:BuildTrustStoreInitContainer (init container) + NEO4J_server_jvm_additional env var |
| `spec.trustedCASecrets` API | api/v1beta1/neo4jenterprisecluster_types.go:TrustedCASecret |
| `spec.extraVolumes` / `spec.extraVolumeMounts` API | same file, on the cluster + standalone specs |
| Regression invariants | CLAUDE.md checklist items #16, #27, #28, #34 |
Monitoring & Observability¶
Resource Monitoring (internal/monitoring/):¶
- ResourceMonitor (`resource_monitor.go`): Real-time utilization tracking
- Performance Metrics: Controller performance and reconciliation efficiency
- Operational Insights: ConfigMap update patterns and debounce effectiveness
Status Management:¶
- Enhanced Status Updates: Detailed cluster state tracking
- Condition Management: Comprehensive status conditions with proper transitions
- Event Recording: Structured events for debugging and monitoring
- Connection Examples: Automatic generation of connection strings
Monitoring and Live Diagnostics¶
The MonitoringSpec field (spec.monitoring) drives two distinct
responsibilities inside the cluster controller:
1. Infrastructure setup (ReconcileMonitoring):
Creates Kubernetes resources for metrics collection:
- {cluster-name}-metrics Service — exposes port 2004 for Prometheus scraping
- {cluster-name}-monitoring ServiceMonitor — tells the Prometheus Operator to scrape the metrics service
- Neo4j config flags (server.metrics.prometheus.enabled=true, prometheus.io/* annotations)
Runs on every reconcile regardless of cluster phase.
2. Live diagnostics (CollectDiagnostics):
Runs SHOW SERVERS and SHOW DATABASES via the Bolt client when the cluster is Ready:
- Writes results to status.diagnostics (ClusterDiagnosticsStatus)
- Sets ServersHealthy condition (True when all servers are state=Enabled and health=Available)
- Sets DatabasesHealthy condition (True when all user databases have status=online; the system database is excluded)
- Updates neo4j_operator_server_health Prometheus gauge per server (labels: cluster_name, namespace, server_name, server_address)
- Non-fatal: collection errors are surfaced in status.diagnostics.collectionError and the conditions are set to Unknown with reason DiagnosticsUnavailable
The diagnostics Bolt client is created fresh per-reconcile and closed with defer. It
never shares state with the cluster formation or upgrade clients.
Architecture invariant: All status writes in CollectDiagnostics use
retry.RetryOnConflict to handle concurrent updates without panicking.
Condition constants (defined in internal/controller/conditions.go):
- ConditionTypeServersHealthy = "ServersHealthy"
- ConditionTypeDatabasesHealthy = "DatabasesHealthy"
- Reason values: AllServersHealthy, ServerDegraded, AllDatabasesOnline, DatabaseOffline, DiagnosticsUnavailable
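The `ServersHealthy` rule above can be sketched as a pure function over `SHOW SERVERS` rows: True only when every server is `state=Enabled` and `health=Available`. The `Server` type and field names are illustrative, not the operator's actual diagnostics types.

```go
package main

import "fmt"

// Server mirrors one row of SHOW SERVERS as consumed by the sketch.
type Server struct {
	Name   string
	State  string
	Health string
}

// serversHealthy implements the ServersHealthy condition rule:
// every server must be Enabled and Available.
func serversHealthy(servers []Server) bool {
	for _, s := range servers {
		if s.State != "Enabled" || s.Health != "Available" {
			return false
		}
	}
	return true
}

func main() {
	servers := []Server{
		{Name: "my-cluster-server-0", State: "Enabled", Health: "Available"},
		{Name: "my-cluster-server-1", State: "Enabled", Health: "Unavailable"},
	}
	fmt.Println(serversHealthy(servers)) // false: server-1 is degraded
}
```

The `DatabasesHealthy` condition follows the same shape over `SHOW DATABASES`, excluding the system database.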
Integration Architecture¶
External System Integration:¶
- Cert-Manager: TLS certificate lifecycle management
- Prometheus: Metrics collection and alerting
- External Secrets: Secret management integration
- Storage Classes: Persistent volume provisioning
- Cloud Providers: AWS, GCP, Azure LoadBalancer optimizations
Kubernetes Integration:¶
- Network Policies: Pod-to-pod communication security
- Service Mesh: Istio/Linkerd compatibility
- Ingress Controllers: External traffic routing with connection examples
- Node Affinity: Topology spread and anti-affinity rules
Testing Architecture¶
Test Strategy:¶
- Unit Tests: Controller logic and helper functions
- Integration Tests: Full workflow testing with envtest
- End-to-End Tests: Real cluster testing with Kind
- Performance Tests: Reconciliation efficiency validation
Test Infrastructure:¶
- Ginkgo/Gomega: BDD-style testing framework
- Envtest: Kubernetes API server for integration testing
- Kind Clusters: Development and test cluster automation
- Test Cleanup: Automatic finalizer removal and namespace cleanup
Migration & Compatibility¶
Legacy Architecture Migration:¶
- Backward Compatibility: Existing clusters continue to work
- Gradual Migration: No breaking changes for existing deployments
- Resource Name Updates: New deployments use server-based naming
- Configuration Migration: Automatic handling of deprecated settings
Future Extensibility:¶
- Plugin System: Neo4j plugin management framework
- Custom Metrics: Extensible monitoring capabilities
- Event Handling: Pluggable event system for custom integrations
- Multi-Architecture: Support for different deployment patterns
Development Best Practices¶
Code Organization:¶
- Controller Pattern: Standard Kubernetes controller pattern
- Builder Pattern: Resource builders for clean separation
- Validation Framework: Centralized validation with clear error messages
- Testing Strategy: Comprehensive test coverage with multiple levels
Performance Considerations:¶
- Memory Usage: Optimized for large-scale deployments
- API Efficiency: Minimal API calls with intelligent caching
- Resource Creation: Parallel resource creation where possible
- Error Handling: Graceful error handling with proper recovery
This architecture provides a solid foundation for managing Neo4j Enterprise deployments in Kubernetes with high performance, reliability, and operational simplicity.