This deck follows one arc. Traditional RAG retrieves similar text and stops
there. GraphRAG preserves entities and relationships, which unlocks three
retrieval patterns. We end on how those retrievers become agent tools.
Background on why context matters, and how embeddings work, lives in the
appendix for anyone who needs it.
Traditional RAG retrieves the chunks most similar to a question and hands
them to the LLM. Embedding and vector-search fundamentals are in the appendix.
Traditional RAG works well for finding relevant passages by topic and
answering questions inside a single document. It is the foundation of many modern
AI assistants. But as we will see, it struggles the moment information is
connected across sources.
What traditional RAG sees: a chunk about aircraft AC1001 bearing wear, a chunk
about flight FL00123 delayed at JFK, a chunk about EGT exceeding threshold on
Engine 1. What it misses is everything connecting them. It can find text about
bearing wear and text about delays, but it cannot tell you which flights were
delayed because of a specific maintenance event. Each chunk is embedded and
searched independently. There is no model of how the information connects.
A surprising finding. When RAG retrieves chunks that are similar but not truly
relevant, the context window fills with tangentially related information and
the model gets confused or misled. This became known as Context ROT, the
retrieval of tangents. The retrieved context actively rots the quality of the
answer.
Research from Chroma shows accuracy decreasing as irrelevant context grows.
Adding more retrieved chunks often hurts rather than helps. The takeaway:
quality of context matters more than quantity.
Each of these requires traversing or aggregating over relationships, not
finding similar text. Similarity search cannot express them.
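To make the contrast concrete, here is a minimal sketch of a relationship question answered by traversal instead of similarity. The in-memory graph, the relationship names (HAS_EVENT, CAUSED_DELAY, OPERATES), the event id ME-1, and flight FL00456 are all hypothetical. Only AC1001 and FL00123 come from the earlier example.

```python
# Toy graph as (source, relationship, target) triples. Relationship names,
# the event id, and FL00456 are illustrative assumptions.
EDGES = [
    ("AC1001", "HAS_EVENT", "ME-1"),      # aircraft -> maintenance event
    ("ME-1", "CAUSED_DELAY", "FL00123"),  # event -> delayed flight
    ("AC1001", "OPERATES", "FL00456"),    # an unrelated flight
]

def neighbors(node: str, rel: str) -> list[str]:
    """Return the targets reachable from `node` over one `rel` edge."""
    return [dst for src, r, dst in EDGES if src == node and r == rel]

def flights_delayed_by_maintenance(aircraft: str) -> list[str]:
    """Follow aircraft -> event -> flight; no text similarity is involved."""
    flights = []
    for event in neighbors(aircraft, "HAS_EVENT"):
        flights.extend(neighbors(event, "CAUSED_DELAY"))
    return flights

print(flights_delayed_by_maintenance("AC1001"))
```

The answer depends on which edges exist, not on how any text is worded, which is why a similarity score alone cannot produce it.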
The core insight: documents have structure that traditional RAG ignores by
treating them as flat, independent chunks of text. GraphRAG extracts that structure into a
knowledge graph that preserves entities, the relationships between them, and
their properties. That shifts the question from "what is similar" to "what is
connected and relevant".
Create the index once, before any vectors are stored. The sample uses the
neo4j_graphrag library with Amazon Bedrock Titan v2 embeddings at 1024
dimensions. Index creation is idempotent, so running it on every load is safe.
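A minimal sketch of that index step, assuming the index is named chunkEmbeddings on the embedding property of Chunk nodes. The cosine similarity function and the helper name ensure_vector_index are assumptions, not taken from the sample.

```python
# Settings from the notes: 1024-dimension embeddings on Chunk.embedding.
# The cosine similarity function is an assumption.
INDEX_SETTINGS = {
    "name": "chunkEmbeddings",
    "label": "Chunk",
    "embedding_property": "embedding",
    "dimensions": 1024,
    "similarity_fn": "cosine",
}

def ensure_vector_index(driver) -> None:
    """Create the vector index if it does not exist yet (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the library installed.
    from neo4j_graphrag.indexes import create_vector_index

    # fail_if_exists=False turns a repeated call into a no-op, not an error.
    create_vector_index(driver, fail_if_exists=False, **INDEX_SETTINGS)
```

Calling ensure_vector_index(driver) at the top of every load script is what "create the index once" looks like in practice.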
Each chunk gets a Titan v2 embedding written to the embedding property on its
Chunk node. The chunkEmbeddings index updates automatically. You can verify
with: MATCH (c:Chunk) RETURN c.text, size(c.embedding) LIMIT 1. The size
should be 1024.
With all five in place, the graph has everything GraphRAG needs: searchable
text, the structure around it, and the provenance behind it.
Vector or fulltext search finds relevant chunks; that much is standard RAG. What GraphRAG
adds is graph traversal from those chunks through the entities and
relationships surrounding them. The agent ends up with far richer context
than text search could provide.
Everything from here is built on the Neo4j Python GraphRAG library. It ships
the three retriever patterns, pluggable embedders, and a pipeline that
combines retrieval with generation in one call.
The GraphRAG class combines retrieval with generation. The retriever's only
job is finding the right context. The LLM's only job is turning that context
into a coherent answer.
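The division of labor can be sketched in a few lines. This is not the library's implementation. The answer function, the prompt wording, and the two stand-in callables are assumptions made so the example runs without a database or a model.

```python
from typing import Callable

def answer(question: str,
           retrieve: Callable[[str], list[str]],
           generate: Callable[[str], str]) -> str:
    """Retrieve context first, then ask the LLM to answer from it."""
    chunks = retrieve(question)          # retriever: find the right context
    context = "\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)              # LLM: turn context into an answer

def fake_retrieve(question: str) -> list[str]:
    """Stand-in retriever that always returns one chunk."""
    return ["AC1001: bearing wear found on Engine 1."]

def fake_generate(prompt: str) -> str:
    """Stand-in LLM that echoes the first context line."""
    return prompt.splitlines()[1]

print(answer("What is wrong with AC1001?", fake_retrieve, fake_generate))
```

Because the retriever is just a parameter, swapping Vector for Vector Cypher or Text2Cypher changes the context without touching the generation step.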
This is the single most important slide. Everything that follows is one of
these three retrievers. Vector for content, Vector Cypher for content plus
relationships, Text2Cypher for precise facts. The combination is more powerful
than any one alone.
Embed the query with Amazon Bedrock Titan, pass it into Neo4j as
$queryEmbedding, and the index returns the five most semantically similar
chunks. This is standard RAG retrieval, the entry point into the graph.
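As a rough sketch, the lookup could be expressed with Neo4j's db.index.vector.queryNodes procedure. The RETURN shape, the parameter names other than $queryEmbedding, and the helper vector_query_params are illustrative assumptions.

```python
# Cypher for a top-k vector lookup. The RETURN clause is an illustrative choice.
VECTOR_QUERY = """
CALL db.index.vector.queryNodes($indexName, $k, $queryEmbedding)
YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC
"""

def vector_query_params(query_embedding: list[float], k: int = 5) -> dict:
    """Build the parameter map sent alongside VECTOR_QUERY."""
    return {
        "indexName": "chunkEmbeddings",
        "k": k,
        "queryEmbedding": query_embedding,
    }

# With a neo4j driver, this would run roughly as:
#   driver.execute_query(VECTOR_QUERY, vector_query_params(embedding))
```

The embedding passed in must come from the same model and dimension count that produced the stored chunk embeddings.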
This is what traditional RAG cannot do. After vector search finds the starting
chunks, we traverse the graph from them, here to the source document. The same
pattern extends to components, events, and sensors.
The decision framework in three questions. Content or facts: content goes to
Vector or Vector Cypher, facts go to Text2Cypher. Do you need related
entities: no means Vector, yes means Vector Cypher. Is it about relationships:
traversals go to Vector Cypher or Text2Cypher, pure semantics to Vector.
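The framework can be read as a small function. This sketch collapses the three questions into two flags, which is a simplification. The function name and flag names are hypothetical.

```python
def pick_retriever(wants_exact_facts: bool, needs_related_entities: bool) -> str:
    """Map the decision questions onto one of the three retrievers."""
    if wants_exact_facts:
        # Precise facts, counts, and lookups go to generated Cypher.
        return "Text2Cypher"
    if needs_related_entities:
        # Content plus the entities connected to it.
        return "Vector Cypher"
    # Pure semantic content.
    return "Vector"

print(pick_retriever(wants_exact_facts=False, needs_related_entities=True))
```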
Mechanics only here. Embedding and similarity fundamentals are in the
appendix. The key behavior: semantic match, not keyword match. "Engine
problems" surfaces chunks about bearing wear and vibration exceedance.
The driver is the Neo4j connection, index_name names the vector index that
holds the embeddings, and embedder is the Amazon Bedrock Titan model that
vectorizes the query. search returns
the top_k most similar chunks, each with its text and a similarity score.
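A hedged sketch of consuming those results. With the library, the retriever would be built roughly as VectorRetriever(driver, index_name="chunkEmbeddings", embedder=embedder). Here a stub stands in so the example runs without Neo4j. The helper top_chunks, the stub class, and the assumption that each item carries a "score" key in its metadata are illustrative.

```python
from types import SimpleNamespace

def top_chunks(retriever, question: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Run a vector search and return (content, score) pairs."""
    result = retriever.search(query_text=question, top_k=top_k)
    # Assumption: each item exposes .content and a "score" key in .metadata.
    return [(item.content, item.metadata["score"]) for item in result.items]

class StubRetriever:
    """Stand-in for a real retriever so the sketch runs without Neo4j."""

    def search(self, query_text: str, top_k: int = 5):
        items = [
            SimpleNamespace(content="Bearing wear on Engine 1",
                            metadata={"score": 0.91}),
            SimpleNamespace(content="Vibration exceedance logged",
                            metadata={"score": 0.87}),
        ]
        return SimpleNamespace(items=items[:top_k])

print(top_chunks(StubRetriever(), "engine problems", top_k=1))
```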
A practical scale for reading scores. Below 0.80 the match is usually too weak
to trust as context.
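One way to apply that cutoff is to filter before anything reaches the prompt. The function name and the sample hits are hypothetical. The 0.80 threshold is the one stated above.

```python
MIN_SCORE = 0.80  # cutoff from the notes

def trusted(scored_chunks: list[tuple[str, float]],
            min_score: float = MIN_SCORE) -> list[tuple[str, float]]:
    """Keep only chunks whose similarity score meets the threshold."""
    return [(text, score) for text, score in scored_chunks if score >= min_score]

hits = [("bearing wear on Engine 1", 0.91), ("cabin upholstery order", 0.62)]
print(trusted(hits))
```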
Vector retriever returns text and nothing else. It cannot scope to a specific
entity or aggregate. When you need related entities, move to Vector Cypher.
Two steps. Step one is the same vector search as before. Step two traverses
the graph from each matched chunk to gather connected entities and
relationships. You get semantic relevance and graph structure together.
You supply a retrieval_query that runs after vector search. The embedder is
Amazon Bedrock Titan, consistent with the rest of the deck.
Your query receives node (the matched chunk) and score (similarity). From
there you traverse to components and their events and return enriched results.
A plain MATCH silently drops components with no events. OPTIONAL MATCH keeps
them with an empty events list, which is almost always what you want.
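A sketch of what such a retrieval_query might look like, followed by a tiny emulation of the MATCH versus OPTIONAL MATCH difference. The labels and relationship types (Component, MaintenanceEvent, ABOUT, HAS_EVENT), the property names, and the join_events helper are assumptions about the schema, not taken from the sample.

```python
# Hypothetical retrieval_query. `node` and `score` come from the vector
# search step; every label, relationship type, and property is an assumption.
RETRIEVAL_QUERY = """
MATCH (node)-[:ABOUT]->(comp:Component)
OPTIONAL MATCH (comp)-[:HAS_EVENT]->(ev:MaintenanceEvent)
RETURN node.text AS text,
       score,
       comp.name AS component,
       collect(ev.description) AS events
"""

def join_events(components: list[str],
                events_by_component: dict[str, list[str]],
                optional: bool) -> list[tuple[str, list[str]]]:
    """Emulate MATCH (optional=False) versus OPTIONAL MATCH (optional=True)."""
    rows = []
    for comp in components:
        events = events_by_component.get(comp, [])
        if events or optional:
            rows.append((comp, events))
    return rows

components = ["Engine 1", "APU"]
events = {"Engine 1": ["bearing wear"]}
print(join_events(components, events, optional=False))  # APU is dropped
print(join_events(components, events, optional=True))   # APU kept, no events
```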
This is the key limitation. Traversal starts from the chunks vector search
returns. If those chunks are generic, the traversal never reaches the specific
entity. Entity-scoped questions belong to Text2Cypher.
Vector Cypher shines when the answer is content plus the structured entities
connected to it.
No embeddings involved. The LLM, given the schema, translates the question
directly into Cypher, runs it, and returns exact structured results.
get_schema introspects the graph. Passing it in is what keeps generated Cypher
valid.
The schema is the contract. With it the LLM generates valid Cypher. Without
it, the model hallucinates properties and relationship types that do not exist.
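To illustrate why the schema matters, here is a minimal sketch of a prompt that carries it. This is not the library's actual prompt. The wording, the function name, and the sample schema string are assumptions. In the deck's setup the schema text would come from get_schema.

```python
def build_text2cypher_prompt(schema: str, question: str) -> str:
    """Combine the graph schema and the user question into one LLM prompt."""
    return (
        "Translate the question into a Cypher query.\n"
        "Use only the labels, relationship types, and properties in the schema.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Cypher:"
    )

# Hypothetical schema text, for illustration only.
SCHEMA = "(:Aircraft {tail_number})-[:OPERATES]->(:Flight {number, delay_minutes})"
prompt = build_text2cypher_prompt(SCHEMA, "Which flights did AC1001 operate?")
print(prompt)
```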
Text2Cypher answers exact questions about what is in the graph. It cannot
predict, and it cannot answer questions whose answer lives in unstructured
chunk text. Match the retriever to the question.
Generated queries are still queries. Read-only credentials and query
validation are the minimum safeguards before exposing this anywhere.
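A minimal sketch of the validation half, assuming a conservative keyword check is acceptable. The clause list is an assumption and not a full Cypher grammar. It can reject harmless queries that merely mention one of these words, and it does not replace read-only credentials.

```python
import re

# Clauses that write to the graph or call procedures. Conservative by design.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)

def is_read_only(cypher: str) -> bool:
    """Return True when no write or procedure-call clause appears in the query."""
    return _WRITE_CLAUSES.search(cypher) is None

print(is_read_only("MATCH (a:Aircraft) RETURN a.tail_number"))
print(is_read_only("MATCH (n) DETACH DELETE n"))
```

Running the check on every generated query before execution, and refusing anything that fails, keeps a bad generation from becoming a bad write.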
The canonical comparison. Read the question, pick the retriever.
In the fleet-agent sample, each retriever is one single-responsibility tool.
The docstring is the routing logic the model reads. One tool per retriever,
routing driven by the model.
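A framework-neutral sketch of that idea. The tool names, docstrings, and placeholder return values are hypothetical, and a real agent framework would supply its own tool decorator. What the sketch shows is that the docstring is the only routing information the model gets.

```python
def search_manuals(question: str) -> str:
    """Use for open-ended questions about symptoms or procedures described in text."""
    return f"[vector] {question}"

def search_with_context(question: str) -> str:
    """Use when the answer needs text plus the components and events connected to it."""
    return f"[vector-cypher] {question}"

def query_fleet_facts(question: str) -> str:
    """Use for exact counts, lists, or lookups about specific aircraft or flights."""
    return f"[text2cypher] {question}"

TOOLS = [search_manuals, search_with_context, query_fleet_facts]

def tool_specs() -> list[dict]:
    """Expose each tool's name and docstring, which is what the model reads."""
    return [{"name": tool.__name__, "description": tool.__doc__} for tool in TOOLS]

for spec in tool_specs():
    print(spec["name"], "->", spec["description"])
```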
The whole arc: traditional RAG's limits motivate GraphRAG; GraphRAG enables
three retrieval patterns; each becomes a tool an agent selects. Vector finds
the entry point, the graph adds the structure around it.
LLMs excel at pattern recognition and language fluency. These capabilities
emerge from training on huge text corpora.
The model produces the most likely continuation, not a verified fact, and it
does so confidently, complete with invented citations.
The model has no knowledge of your internal data or anything after its cutoff,
yet it will still answer confidently.
These questions require connecting entities across documents and traversing
chains of relationships, which sequential text processing cannot do.
Each limitation has a concrete failure mode. Building real systems means
designing around all three.
Give the model relevant information in the prompt and all three problems
shrink. Supplying that context automatically, instead of by hand, is the
foundation of Retrieval-Augmented Generation (RAG).
The four-step pattern. The rest of the appendix unpacks embeddings and vector
search, the machinery behind step three.
Embeddings are like a librarian who has read every book and organizes by
meaning rather than by title or subject keywords.
A vector is just a list of numbers locating a point in space. In machine
learning those numbers can encode the meaning of text.
Embeddings turn text into vectors where closeness equals similarity in
meaning. That property is what makes semantic search possible.
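Closeness is commonly measured with cosine similarity. The sketch below uses toy three-dimensional vectors invented for illustration. Real embeddings have far more dimensions, and the vector values here carry no real meaning.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", hand-picked so related phrases point the same way.
engine_problems = [0.9, 0.1, 0.0]
bearing_wear = [0.8, 0.2, 0.1]
catering_menu = [0.0, 0.1, 0.9]

related = cosine_similarity(engine_problems, bearing_wear)
unrelated = cosine_similarity(engine_problems, catering_menu)
print(related > unrelated)
```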
Keyword search needs the exact term. Vector search matches meaning, so
"engine problems" surfaces "bearing wear" and "overheat".
Conceptual scale. The main deck has a finer-grained table of practical score
bands for reading retriever scores in production.