Query 4: Flight Operations + Engine Performance

Aircraft utilization (flight frequency, route coverage) correlated with engine health metrics (EGT, fuel flow, N1 speed). Shows how heavily-used aircraft perform from an engine perspective.

Federation Method: Neo4j Spark Connector → temp view → JOIN with Delta tables

Data Flow

Flight Operations + Engine Performance Flow

Load Neo4j Data

Load flight nodes from Neo4j into a Spark temp view:

neo4j_flights = spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", NEO4J_BOLT_URI) \
    .option("authentication.type", "basic") \
    .option("authentication.basic.username", NEO4J_USER) \
    .option("authentication.basic.password", NEO4J_PASSWORD) \
    .option("labels", "Flight") \
    .load()

neo4j_flights.createOrReplaceTempView("neo4j_flights")

Federated SQL Query

WITH aircraft_ref AS (
    SELECT `:ID(Aircraft)` AS aircraft_id, tail_number, model, operator
    FROM aircraft
),
flight_activity AS (
    SELECT
        aircraft_id,
        COUNT(*) AS total_flights,
        COUNT(DISTINCT origin) AS unique_origins,
        COUNT(DISTINCT destination) AS unique_destinations
    FROM neo4j_flights
    GROUP BY aircraft_id
),
engine_health AS (
    SELECT
        sys.aircraft_id,
        ROUND(AVG(CASE WHEN sen.type = 'EGT' THEN r.value END), 1) AS avg_egt,
        ROUND(AVG(CASE WHEN sen.type = 'FuelFlow' THEN r.value END), 2) AS avg_fuel_flow,
        ROUND(AVG(CASE WHEN sen.type = 'N1Speed' THEN r.value END), 0) AS avg_n1_speed
    FROM sensor_readings r
    JOIN sensors sen ON r.sensor_id = sen.`:ID(Sensor)`
    JOIN systems sys ON sen.system_id = sys.`:ID(System)`
    WHERE sys.type = 'Engine'
    GROUP BY sys.aircraft_id
)
SELECT
    a.tail_number,
    a.model,
    a.operator,
    f.total_flights,
    f.unique_origins AS origins,
    f.unique_destinations AS destinations,
    e.avg_egt AS avg_egt_c,
    e.avg_fuel_flow AS fuel_kgs,
    e.avg_n1_speed AS n1_rpm
FROM aircraft_ref a
JOIN flight_activity f ON a.aircraft_id = f.aircraft_id
JOIN engine_health e ON a.aircraft_id = e.aircraft_id
ORDER BY f.total_flights DESC

What This Query Does

  • Neo4j (Spark Connector): Loads Flight nodes, groups by aircraft to compute flight counts and unique origin/destination airports

  • Delta (3-table join): Aggregates engine-specific sensor readings (filtered to sys.type = 'Engine') per aircraft

  • Federation: INNER JOINs both on aircraft_id so only aircraft with both flights and engine data appear

Actual Output: Load Flights (Databricks Runtime 17.3 LTS)

The following output was captured from a live Databricks cluster using the Neo4j Spark Connector against Neo4j Aura:

Loaded 800 flights from Neo4j

Sample flight data:
+-----------+-------------+-----------+------+-----------+
|aircraft_id|flight_number|operator   |origin|destination|
+-----------+-------------+-----------+------+-----------+
|AC1013     |EX370        |ExampleAir |PHX   |SEA        |
|AC1014     |SK739        |SkyWays    |SEA   |SFO        |
|AC1012     |NO833        |NorthernJet|ATL   |LAX        |
|AC1008     |NO234        |NorthernJet|DFW   |LAX        |
|AC1001     |EX199        |ExampleAir |MIA   |SFO        |
+-----------+-------------+-----------+------+-----------+
only showing top 5 rows

Actual Output: Federated Query (Databricks Runtime 17.3 LTS)

This federated result was produced live on Databricks, JOINing Neo4j flight data with Delta engine sensor readings:

Flight Operations + Engine Performance
Delta: sensor_readings (Engine sensors), sensors, systems, aircraft
Neo4j: Flight nodes (Spark Connector)
==========================================================================================
+-----------+--------+-----------+-------------+-------+------------+---------+--------+------+
|tail_number|model   |operator   |total_flights|origins|destinations|avg_egt_c|fuel_kgs|n1_rpm|
+-----------+--------+-----------+-------------+-------+------------+---------+--------+------+
|N86057G    |A321neo |RegionalCo |51           |12     |12          |671.6    |1.39    |4768.0|
|N63098R    |A320-200|SkyWays    |48           |11     |12          |671.6    |1.39    |4768.0|
|N89365K    |A321neo |RegionalCo |48           |12     |12          |671.6    |1.39    |4768.0|
|N95040A    |B737-800|ExampleAir |47           |12     |12          |671.6    |1.39    |4768.0|
|N76758N    |A320-200|SkyWays    |46           |12     |12          |671.6    |1.39    |4768.0|
|N15332P    |E190    |NorthernJet|46           |10     |11          |671.6    |1.39    |4768.0|
|N81338F    |A320-200|SkyWays    |45           |12     |12          |671.6    |1.39    |4768.0|
|N84110I    |B737-800|ExampleAir |43           |12     |12          |671.6    |1.39    |4768.0|
|N30268B    |A320-200|SkyWays    |41           |12     |12          |671.6    |1.39    |4768.0|
|N96107S    |A321neo |RegionalCo |41           |12     |12          |671.6    |1.39    |4768.0|
|N44342Q    |B737-800|ExampleAir |41           |12     |12          |671.6    |1.39    |4768.0|
|N46224T    |E190    |NorthernJet|40           |12     |10          |671.6    |1.39    |4768.0|
|N84703L    |E190    |NorthernJet|40           |12     |12          |671.6    |1.39    |4768.0|
|N13211H    |E190    |NorthernJet|39           |9      |12          |671.6    |1.39    |4768.0|
|N54980C    |A321neo |RegionalCo |34           |12     |11          |671.6    |1.39    |4768.0|
|N32276J    |A320-200|SkyWays    |33           |12     |12          |671.6    |1.39    |4768.0|
|N37272D    |E190    |NorthernJet|30           |11     |10          |671.6    |1.39    |4768.0|
|N65164O    |A321neo |RegionalCo |30           |12     |12          |671.7    |1.39    |4768.0|
|N26760M    |B737-800|ExampleAir |30           |11     |12          |671.6    |1.39    |4769.0|
|N53032E    |B737-800|ExampleAir |27           |10     |12          |671.6    |1.39    |4768.0|
+-----------+--------+-----------+-------------+-------+------------+---------+--------+------+

Key Difference from Query 3

This query filters Delta sensor data to engine systems only (WHERE sys.type = 'Engine'), while Query 3 includes all sensor types. It also uses INNER JOINs instead of LEFT JOINs since we only want aircraft with both flight and engine data.