# Example Use Cases
Federated queries through Unity Catalog let Databricks users query Neo4j graph data alongside lakehouse tables without moving data between systems. The Neo4j JDBC driver translates SQL to Cypher automatically, so analysts work in familiar SQL while the driver handles graph traversals behind the scenes. Results return as standard Spark DataFrames that can be joined with Delta tables, Iceberg tables, lakehouse relational data, or any other UC data source.
This page describes common use cases where this integration delivers value.
## Types of Federated Queries
The Neo4j UC JDBC integration supports several categories of federated queries:
- Aggregate queries — `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, and `COUNT DISTINCT` are pushed down to Neo4j via `remote_query()`, so only summarized results travel over the network. Useful for fleet-wide metrics, relationship counts, and graph statistics.
- Filtered queries — `WHERE` clauses with equals, `IN`, `AND`, and `IS NOT NULL` are pushed down, letting you narrow results at the source before joining with lakehouse data.
- Graph traversal queries — SQL JOINs across Neo4j node labels (e.g., `Flight NATURAL JOIN DEPARTS_FROM NATURAL JOIN Airport`) are translated to Cypher pattern matches like `(f:Flight)-[:DEPARTS_FROM]->(a:Airport)`, enabling multi-hop relationship traversals expressed as standard SQL.
- Hybrid queries — Combine `remote_query()` for aggregate graph metrics with the Neo4j Spark Connector for row-level data, then join both with Delta tables in a single notebook. This is the most powerful pattern, using each method for what it does best.
- Cross-source joins — Results from Neo4j federated queries can be joined with any UC-governed data source, including Delta tables, Iceberg tables, lakehouse relational databases, and other JDBC-connected systems, all in a single SQL statement.
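The shape of the traversal translation can be pictured with a toy sketch. This is only an illustration of the idea, not the driver's actual SQL2Cypher implementation, and the function name is hypothetical:

```python
# Toy illustration: a three-way NATURAL JOIN over a node label, a relationship
# name, and a second node label maps onto a single Cypher pattern. This mimics
# the *shape* of the translation, not the real SQL2Cypher engine.

def join_to_cypher_pattern(src_label: str, rel_type: str, dst_label: str) -> str:
    """Map `Src NATURAL JOIN REL NATURAL JOIN Dst` onto a Cypher pattern."""
    src_var = src_label[0].lower()
    dst_var = dst_label[0].lower()
    return f"({src_var}:{src_label})-[:{rel_type}]->({dst_var}:{dst_label})"

# The example from the list above:
# Flight NATURAL JOIN DEPARTS_FROM NATURAL JOIN Airport
pattern = join_to_cypher_pattern("Flight", "DEPARTS_FROM", "Airport")
print(pattern)  # (f:Flight)-[:DEPARTS_FROM]->(a:Airport)
```

Chaining further joins extends the same pattern with additional hops, which is how multi-hop traversals stay expressible as plain SQL.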
## Fraud Detection
A fraud detection team has transaction data in Delta tables and entity relationships (accounts, devices, shared identifiers) in Neo4j. Today, combining those datasets requires custom ETL or duplicate copies. With this integration, a single Databricks notebook joins Neo4j graph traversal results with Delta table aggregations in one query. No data movement, no pipeline to maintain.
Example federated queries:
- Aggregate query counting shared-device connections across accounts via `remote_query()`, joined with transaction volume from Delta tables to flag high-risk clusters
- Graph traversal query following multi-hop paths (Account → Device → Account) translated from SQL JOINs to Cypher pattern matches, correlated with transaction anomaly scores from the lakehouse
- Hybrid query combining a `remote_query()` aggregate for network-wide fraud indicators with Spark Connector row-level data for per-account risk scoring
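The shared-device pattern can be sketched in plain Python with invented data. In the real integration the traversal runs inside Neo4j and the volumes come from Delta tables; here both sides are toy in-memory structures:

```python
# Sketch of the fraud pattern: find accounts linked through a shared device
# (Account -> Device -> Account), then join the resulting clusters with
# per-account transaction volume. All names and numbers are made up.
from collections import defaultdict

# (account, device) edges, standing in for the graph side
uses_device = [
    ("acct1", "dev1"), ("acct2", "dev1"),  # acct1 and acct2 share dev1
    ("acct3", "dev2"),                     # acct3 is alone on dev2
]

# per-account transaction volume, standing in for the Delta side
tx_volume = {"acct1": 50_000, "acct2": 48_000, "acct3": 1_200}

# Multi-hop traversal: group accounts by shared device
accounts_by_device = defaultdict(set)
for acct, dev in uses_device:
    accounts_by_device[dev].add(acct)

# Keep only devices shared by more than one account, then join with volumes
high_risk = {}
for dev, accts in accounts_by_device.items():
    if len(accts) > 1:
        high_risk[dev] = sum(tx_volume[a] for a in accts)

print(high_risk)  # {'dev1': 98000}
```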
## Knowledge Graph-Enriched ML Pipelines
Data science teams training models on lakehouse data can pull graph-derived features directly from Neo4j without building separate ETL to extract and materialize those features into Delta tables. Features like node centrality, community detection scores, relationship counts, and shortest path distances enrich ML models with structural context that tabular data alone cannot capture.
Example federated queries:
- Aggregate query counting each entity’s relationship degree via `remote_query()`, joined with feature tables in Delta for model training
- Filtered query retrieving graph features for a specific subset of entities (e.g., `WHERE category IN ('high-value', 'at-risk')`), merged with behavioral data from the lakehouse
- Cross-source join combining Neo4j community detection clusters with customer segments and purchase history from Delta tables to build recommendation models
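The degree-feature pattern reduces to a small join, sketched here with invented entities and features. In practice the degree count would come back as a `remote_query()` aggregate and the feature table would live in Delta:

```python
# Sketch of a graph-derived ML feature: relationship degree per entity,
# computed from a toy edge list and merged with a hypothetical feature table.
from collections import Counter

edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]

# Degree = number of relationships touching each entity
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Hypothetical lakehouse feature table keyed by entity id
features = {"u1": {"spend": 120.0}, "u2": {"spend": 35.0},
            "u3": {"spend": 810.0}, "u4": {"spend": 5.0}}

# Join: enrich each feature row with the graph degree for model training
training_rows = {
    uid: {**row, "degree": degree[uid]} for uid, row in features.items()
}
print(training_rows["u3"])  # {'spend': 810.0, 'degree': 3}
```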
## Supply Chain Visibility
Organizations modeling supplier networks and dependencies in Neo4j can join that graph data with inventory and logistics data in Delta tables for real-time risk assessment and disruption analysis, all from a single Databricks notebook. Graph traversals reveal hidden dependencies (a tier-3 supplier that feeds into multiple critical components), while the lakehouse provides inventory levels, lead times, and demand forecasts.
Example federated queries:
- Graph traversal query following supplier dependency chains (Supplier → Component → Assembly → Product) to identify single points of failure, joined with inventory levels from Delta tables
- Aggregate query counting the depth and breadth of each supplier’s network via `remote_query()`, cross-joined with logistics performance metrics from the lakehouse
- Hybrid query combining `remote_query()` aggregates for network-wide supplier concentration risk with Spark Connector row-level data for per-component alternative sourcing analysis
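A simplified one-hop slice of the dependency-chain idea can be sketched as follows, with made-up suppliers and inventory standing in for the graph and Delta sides:

```python
# Sketch of single-point-of-failure detection: a component with exactly one
# supplier is a risk, and its (invented) inventory level is attached from the
# lakehouse side. The real query would walk the full
# Supplier -> Component -> Assembly -> Product chain in Neo4j.
supplies = {                                  # component -> set of suppliers
    "gearbox": {"AcmeCo"},
    "sensor": {"AcmeCo", "BoltWorks"},
}
inventory = {"gearbox": 12, "sensor": 430}    # stand-in for Delta data

# Flag components with a single supplier and join with inventory levels
spof = {
    comp: {"supplier": next(iter(sup)), "on_hand": inventory[comp]}
    for comp, sup in supplies.items()
    if len(sup) == 1
}
print(spof)  # {'gearbox': {'supplier': 'AcmeCo', 'on_hand': 12}}
```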
## Customer 360 and Identity Resolution
Teams with customer identity graphs in Neo4j (linking accounts, devices, emails, phone numbers, and other identifiers) can join that graph data with behavioral and transactional data in the lakehouse for a unified customer view. Neo4j excels at resolving which identifiers belong to the same person across systems, while the lakehouse holds the volumetric event data (purchases, page views, support tickets) that drives analytics.
Example federated queries:
- Graph traversal query resolving identity clusters (Person → HAS_EMAIL → Email, Person → HAS_DEVICE → Device) via SQL JOINs translated to Cypher, joined with purchase history and engagement scores from Delta tables
- Aggregate query counting resolved identities and merge candidates via `remote_query()`, combined with customer lifetime value calculations from the lakehouse
- Cross-source join combining Neo4j identity resolution results with behavioral data from Delta tables, campaign response data from a marketing database, and support ticket history, all governed by Unity Catalog access controls
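Identity resolution amounts to computing connected components over identifier links. A minimal sketch, with invented identifiers and event counts standing in for the graph and lakehouse sides:

```python
# Sketch of identity resolution as connected components: identifiers linked
# by HAS_EMAIL / HAS_DEVICE-style edges collapse into one resolved person,
# whose event counts (hypothetical behavioral data) are then summed.
from collections import defaultdict

links = [  # identifier pairs the graph says belong together
    ("acct:1", "email:a@x.com"),
    ("email:a@x.com", "dev:phone-9"),
    ("acct:2", "email:b@y.com"),
]
events = {"acct:1": 14, "dev:phone-9": 3, "acct:2": 7}  # made-up counts

# Union-find over identifiers, with path halving
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in links:
    union(a, b)

# Join: total events per resolved identity cluster
events_per_person = defaultdict(int)
for ident, n in events.items():
    events_per_person[find(ident)] += n

print(sorted(events_per_person.values()))  # [7, 17]
```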
## Ad Hoc Exploration Before ETL Investment
Analysts can prototype cross-system queries joining Neo4j graph data with lakehouse tables before committing to building a full ingestion pipeline. This is exactly the proof-of-concept scenario that Databricks recommends federation for. Instead of spending weeks building and maintaining ETL, analysts validate whether the combined data answers their questions first, then invest in pipelines only where the value is proven.
Example federated queries:
- Aggregate query exploring graph metrics via `remote_query()` to assess whether relationship data from Neo4j adds predictive value to an existing lakehouse model
- Filtered query pulling a subset of graph data for a specific business unit or time range, joined ad hoc with Delta tables to prototype a new dashboard
- Hybrid query combining multiple federation methods in a single notebook to rapidly evaluate which Neo4j data is worth materializing into the lakehouse long-term
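Why filter pushdown matters for these throwaway prototypes can be sketched with toy data: applying the `WHERE` clause at the graph source means only the matching subset ever crosses the network before the join. All rows here are invented:

```python
# Sketch of filter pushdown: the business-unit filter is applied at the
# "source" before transfer, so only matching rows reach the ad hoc join
# with (hypothetical) lakehouse rows.
graph_rows = [  # stand-in for Neo4j node properties
    {"id": 1, "unit": "emea"},
    {"id": 2, "unit": "apac"},
    {"id": 3, "unit": "emea"},
]
lakehouse_rows = {1: {"purchases": 9}, 2: {"purchases": 1}, 3: {"purchases": 4}}

# Pushdown: narrow at the source, mimicking WHERE unit = 'emea'
transferred = [r for r in graph_rows if r["unit"] == "emea"]

# Ad hoc join with the lakehouse side to prototype a dashboard dataset
joined = [{**r, **lakehouse_rows[r["id"]]} for r in transferred]
print([row["id"] for row in joined])  # [1, 3]
```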
## Learn More
- Federated Query Patterns — Live query examples combining Neo4j and Delta lakehouse data
- UC Integration Setup — How to configure the Neo4j JDBC connection to Unity Catalog
- Databricks Lakehouse Federation — Official Databricks documentation on federated queries
- Neo4j JDBC SQL2Cypher — How SQL is translated to Cypher automatically