GraphRAG: From Retrieval Limits to Graph-Enriched Search

Question	Why It Struggles
Aircraft with engines that have critical events	Traverse Aircraft to System to Component to Event
Components sharing fault types across the fleet	Find shared patterns across aircraft
How many flights delayed due to maintenance	Aggregation, not similarity
Sensors on the same system as a failed part	Traverse entity relationships

Component	Purpose
Documents	Source provenance
Chunks	Searchable text units
Embeddings	Semantic search
Entities	Structured domain knowledge
Relationships	Connections between entities

Pattern	What It Does
Vector	Semantically similar content (standard RAG)
Vector Cypher	Similar content, then traverse to entities
Text2Cypher	Query the graph directly for precise facts

Question Pattern	Retriever
"What is...", "Tell me about..."	Vector
"Which entities are affected by..."	Vector Cypher
"How many...", "List all..."	Text2Cypher

Score	Interpretation
0.95-1.0	Near-exact match
0.90-0.95	Highly relevant
0.85-0.90	Relevant
0.80-0.85	Moderately relevant
< 0.80	Weak relevance

Question	Retriever	Why
"What is exhaust gas temperature?"	Vector	Semantic content
"Aircraft with components with critical events"	Vector Cypher	Content + entities
"How many events on AC1001?"	Text2Cypher	Precise count
"Tell me about the CFM56 engine"	Vector	Exploratory
"List AC1001 components"	Text2Cypher	Entity facts

Limitation	Impact
Hallucination	Can't trust answers without verification
Knowledge cutoff	Can't answer questions about your data
Relationship blindness	Can't reason across connected information

Score	Meaning
Near 1.0	Very similar
Near 0.5	Somewhat related
Near 0.0	Unrelated

This deck tells one arc. Traditional RAG retrieves similar text and stops there. GraphRAG preserves entities and relationships, which unlocks three retrieval patterns. We end on how those retrievers become agent tools. Background on why context matters, and how embeddings work, lives in the appendix for anyone who needs it.

Traditional RAG retrieves the chunks most similar to a question and hands them to the LLM. Embedding and vector-search fundamentals are in the appendix.

Traditional RAG works well for finding relevant passages by topic and answering questions inside a single document. It is the foundation of modern AI assistants. But as we will see, it struggles the moment information is connected across sources.

What traditional RAG sees: a chunk about aircraft AC1001 bearing wear, a chunk about flight FL00123 delayed at JFK, a chunk about EGT exceeding threshold on Engine 1. What it misses is everything connecting them. It can find text about bearing wear and text about delays, but it cannot tell you which flights were delayed because of a specific maintenance event. Each chunk is embedded and searched independently. There is no model of how the information connects.

A surprising finding. When RAG retrieves chunks that are similar but not truly relevant, the context window fills with tangentially related information and the model gets confused or misled. This became known as Context ROT: the Retrieval Of Tangents. The retrieved context actively rots the quality of the answer.

Research from Chroma shows accuracy decreasing as irrelevant context grows. Adding more retrieved chunks often hurts rather than helps. The takeaway: quality of context matters more than quantity.

Each of these requires traversing or aggregating over relationships, not finding similar text. Similarity search cannot express them.
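To make the traversal point concrete, here is a minimal in-memory sketch of the first question in the table. The relationship names (HAS_SYSTEM, HAS_COMPONENT, HAS_EVENT), the second aircraft, and the event data are hypothetical stand-ins, not the sample's actual schema.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, relationship, target) triples.
EDGES = [
    ("AC1001", "HAS_SYSTEM", "Engine 1"),
    ("Engine 1", "HAS_COMPONENT", "Bearing"),
    ("Bearing", "HAS_EVENT", "E1"),
    ("AC1002", "HAS_SYSTEM", "Engine 2"),
    ("Engine 2", "HAS_COMPONENT", "Fuel Pump"),
    ("Fuel Pump", "HAS_EVENT", "E2"),
]
# Hypothetical event severities.
SEVERITY = {"E1": "CRITICAL", "E2": "MINOR"}


def build_adjacency(edges):
    """Index targets by (source, relationship) for cheap hops."""
    adjacency = defaultdict(list)
    for source, relationship, target in edges:
        adjacency[(source, relationship)].append(target)
    return adjacency


def aircraft_with_critical_events(edges, severity):
    """Walk Aircraft -> System -> Component -> Event and keep critical hits."""
    adjacency = build_adjacency(edges)
    aircraft = {src for src, rel, _ in edges if rel == "HAS_SYSTEM"}
    hits = set()
    for ac in aircraft:
        for system in adjacency[(ac, "HAS_SYSTEM")]:
            for component in adjacency[(system, "HAS_COMPONENT")]:
                events = adjacency[(component, "HAS_EVENT")]
                if any(severity.get(ev) == "CRITICAL" for ev in events):
                    hits.add(ac)
    return sorted(hits)
```

No amount of text similarity produces this answer, because the answer depends on three hops of structure rather than on wording.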

The core insight: documents have structure that traditional RAG ignores by treating them as a bag of words. GraphRAG extracts that structure into a knowledge graph that preserves entities, the relationships between them, and their properties. That shifts the question from "what is similar" to "what is connected and relevant".

Create the index once, before any vectors are stored. The sample uses the neo4j_graphrag library with Amazon Bedrock Titan v2 embeddings at 1024 dimensions. Index creation is idempotent, so running it on every load is safe.
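For reference, the idempotent behavior can be sketched in plain Cypher, built here as a Python string. This is an illustration rather than the sample's code: the sample goes through the neo4j_graphrag library, and the cosine similarity function is an assumption. The index name, label, and property follow the names used elsewhere in this deck.

```python
def create_vector_index_statement(
    name="chunkEmbeddings",
    label="Chunk",
    prop="embedding",
    dimensions=1024,
    similarity="cosine",  # assumption: the deck does not state the function
):
    """Build a CREATE VECTOR INDEX statement that is safe to re-run.

    Identifiers cannot be passed as Cypher parameters, so only trusted
    constants should be interpolated here.
    """
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )
```

IF NOT EXISTS is what makes repeated runs harmless: the second run finds the index and does nothing.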

Each chunk gets a Titan v2 embedding written to the embedding property on its Chunk node. The chunkEmbeddings index updates automatically. You can verify with: MATCH (c:Chunk) RETURN c.text, size(c.embedding) LIMIT 1.
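A minimal sketch of the write step, assuming each Chunk node carries an id property (hypothetical; the deck does not name the key). The helper validates the vector length before anything is sent to the database, and the Cypher string is what a driver session would run with the rows as the $rows parameter.

```python
EXPECTED_DIMENSIONS = 1024  # Titan v2 size quoted in the deck

# Batched write: one row per chunk. The `id` key is an assumed property.
STORE_EMBEDDINGS = """
UNWIND $rows AS row
MATCH (c:Chunk {id: row.id})
SET c.embedding = row.embedding
"""


def build_rows(chunk_ids, embeddings, dimensions=EXPECTED_DIMENSIONS):
    """Pair chunk ids with vectors, rejecting wrong-sized embeddings."""
    if len(chunk_ids) != len(embeddings):
        raise ValueError("chunk_ids and embeddings must have the same length")
    rows = []
    for chunk_id, vector in zip(chunk_ids, embeddings):
        if len(vector) != dimensions:
            raise ValueError(
                f"chunk {chunk_id!r}: expected {dimensions} dimensions, "
                f"got {len(vector)}"
            )
        rows.append({"id": chunk_id, "embedding": list(vector)})
    return rows
```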

With all five in place, the graph has everything GraphRAG needs: searchable text, the structure around it, and the provenance behind it.

Vector or full-text search finds the relevant chunks; that part is standard RAG. What GraphRAG adds is graph traversal from those chunks through the entities and relationships surrounding them. The agent ends up with far richer context than text search alone could provide.

Everything from here is built on the Neo4j Python GraphRAG library. It ships the three retriever patterns, pluggable embedders, and a pipeline that combines retrieval with generation in one call.

The GraphRAG class combines retrieval with generation. The retriever's only job is finding the right context. The LLM's only job is turning that context into a coherent answer.

This is the single most important slide. Everything that follows is one of these three retrievers. Vector for content, Vector Cypher for content plus relationships, Text2Cypher for precise facts. The combination is more powerful than any one alone.

Embed the query with Amazon Bedrock Titan, pass it into Neo4j as $queryEmbedding, and the index returns the five most semantically similar chunks. This is standard RAG retrieval, the entry point into the graph.
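The query described above might look like the following sketch. It uses Neo4j's db.index.vector.queryNodes procedure and assumes a session object with the standard driver run method; the function name search_chunks is hypothetical, and the embedding is expected to be computed beforehand.

```python
# Index name and parameter name follow the deck; $k defaults to 5 below.
VECTOR_SEARCH = """
CALL db.index.vector.queryNodes('chunkEmbeddings', $k, $queryEmbedding)
YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC
"""


def search_chunks(session, query_embedding, k=5):
    """Return (text, score) pairs for the k most similar chunks.

    `session` is anything with a neo4j-style run(query, **parameters)
    method whose records support key access.
    """
    result = session.run(VECTOR_SEARCH, k=k, queryEmbedding=query_embedding)
    return [(record["text"], record["score"]) for record in result]
```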

This is what traditional RAG cannot do. After vector search finds the starting chunks, we traverse the graph from them, here to the source document. The same pattern extends to components, events, and sensors.

The decision framework in three questions. Content or facts? Content goes to Vector or Vector Cypher; facts go to Text2Cypher. Do you need related entities? No means Vector; yes means Vector Cypher. Is it about relationships? Traversals go to Vector Cypher or Text2Cypher; pure semantics go to Vector.
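The first two questions of the framework reduce to a small decision function. This is only an illustration of the logic; the function and its two boolean inputs are hypothetical, and judging whether a question wants facts or related entities is left to the caller.

```python
def choose_retriever(wants_facts: bool, needs_related_entities: bool) -> str:
    """Map the framework's answers to a retriever name."""
    if wants_facts:
        # Precise counts, lists and entity facts are answered from the graph.
        return "Text2Cypher"
    if needs_related_entities:
        # Content plus the entities connected to it.
        return "Vector Cypher"
    # Pure semantic content.
    return "Vector"
```

For example, "How many events on AC1001?" wants facts, so it routes to Text2Cypher, while "What is exhaust gas temperature?" wants content with no related entities, so it routes to Vector.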

Mechanics only here. Embedding and similarity fundamentals are in the appendix. The key behavior: semantic match, not keyword match. "Engine problems" surfaces chunks about bearing wear and vibration exceedance.

driver is the Neo4j connection, index_name is the vector index where the embeddings live, and embedder is the Amazon Bedrock Titan model that vectorizes the query. search returns the top_k most similar chunks, each with its text and a similarity score.
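A sketch of how those pieces fit together with the library's VectorRetriever. The helper names are hypothetical, the embedder is assumed to be built elsewhere, and the return_properties choice is an assumption about the sample; the library import is deferred so the formatting helper can be tried without a database.

```python
def build_vector_retriever(driver, embedder, index_name="chunkEmbeddings"):
    """Create a VectorRetriever over the chunk index."""
    # Imported lazily so top_chunks below works without the library installed.
    from neo4j_graphrag.retrievers import VectorRetriever

    return VectorRetriever(
        driver,
        index_name=index_name,
        embedder=embedder,
        return_properties=["text"],  # assumption: only the chunk text is needed
    )


def top_chunks(retriever, question, top_k=5):
    """Run a search and flatten the result into (content, score) pairs."""
    result = retriever.search(query_text=question, top_k=top_k)
    return [
        (item.content, (item.metadata or {}).get("score"))
        for item in result.items
    ]
```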

A practical scale for reading scores. Below 0.80 the match is usually too weak to trust as context.
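The band table can be applied mechanically, as in this sketch. The table's ranges share their endpoints, so assigning each boundary value to the higher band is an assumption made here.

```python
def interpret_score(score: float) -> str:
    """Translate a similarity score into the deck's interpretation bands."""
    if score >= 0.95:
        return "Near-exact match"
    if score >= 0.90:
        return "Highly relevant"
    if score >= 0.85:
        return "Relevant"
    if score >= 0.80:
        return "Moderately relevant"
    return "Weak relevance"


def keep_trustworthy(scored_chunks, threshold=0.80):
    """Drop (text, score) pairs that fall below the trust threshold."""
    return [(text, score) for text, score in scored_chunks if score >= threshold]
```

Filtering before the chunks reach the prompt keeps weak matches out of the context window.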

The Vector retriever returns chunk text with similarity scores and nothing else. It cannot scope to a specific entity or aggregate. When you need related entities, move to Vector Cypher.

Two steps. Step one is the same vector search as before. Step two traverses the graph from each matched chunk to gather connected entities and relationships. You get semantic relevance and graph structure together.

You supply a retrieval_query that runs after vector search. The embedder is Amazon Bedrock Titan, consistent with the rest of the deck.

Your query receives node (the matched chunk) and score (similarity). From there you traverse to components and their events and return enriched results.
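A retrieval query along those lines might look like this sketch. The relationship types MENTIONS and HAS_EVENT and the properties name and description are hypothetical; only node and score are given by the retriever. The builder uses the library's VectorCypherRetriever, imported lazily.

```python
# `node` and `score` are bound by the vector search that runs first.
# Relationship types and property names below are assumptions.
RETRIEVAL_QUERY = """
MATCH (node)-[:MENTIONS]->(comp:Component)
OPTIONAL MATCH (comp)-[:HAS_EVENT]->(ev:Event)
RETURN node.text AS text,
       score,
       comp.name AS component,
       collect(ev.description) AS events
"""


def build_vector_cypher_retriever(driver, embedder, index_name="chunkEmbeddings"):
    """Create a VectorCypherRetriever that enriches each matched chunk."""
    from neo4j_graphrag.retrievers import VectorCypherRetriever

    return VectorCypherRetriever(
        driver,
        index_name=index_name,
        retrieval_query=RETRIEVAL_QUERY,
        embedder=embedder,
    )
```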

A plain MATCH silently drops components with no events. OPTIONAL MATCH keeps them with an empty events list, which is almost always what you want.
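The difference can be simulated without a database. In this sketch a dictionary of hypothetical components and events stands in for the graph; the two functions mimic what each clause leaves you with after collecting events per component.

```python
# Hypothetical data: one component with an event, one with none.
COMPONENT_EVENTS = {
    "Bearing": ["wear detected"],
    "Fuel Pump": [],
}


def match_rows(component_events):
    """Plain MATCH: a component with no matching event yields no row."""
    return {comp: list(evs) for comp, evs in component_events.items() if evs}


def optional_match_rows(component_events):
    """OPTIONAL MATCH: every component survives, with an empty list
    when it has no events (collect() skips the null placeholder)."""
    return {comp: list(evs) for comp, evs in component_events.items()}
```

With the data above, the plain version reports only the Bearing, while the optional version also reports the Fuel Pump with an empty events list.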

This is the key limitation. Traversal starts from the chunks vector search returns. If those chunks are generic, the traversal never reaches the specific entity. Entity-scoped questions belong to Text2Cypher.

Vector Cypher shines when the answer is content plus the structured entities connected to it.

No embeddings involved. The LLM, given the schema, translates the question directly into Cypher, runs it, and returns exact structured results.

get_schema introspects the graph. Passing it in is what keeps generated Cypher valid.

The schema is the contract. With it the LLM generates valid Cypher. Without it, it hallucinates properties and relationship types that do not exist.
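One way to picture the contract is as a prompt that places the schema ahead of the question. The wording below is a hypothetical illustration, not the prompt the library uses, and the schema text in the usage line is an assumed example.

```python
def build_text2cypher_prompt(schema: str, question: str) -> str:
    """Assemble a prompt that constrains generation to the given schema."""
    return (
        "Task: generate a Cypher query that answers the question.\n"
        "Use only labels, relationship types and properties from the schema.\n\n"
        f"Schema:\n{schema.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Cypher:"
    )


# Hypothetical schema text; in practice it comes from introspecting the graph.
EXAMPLE_SCHEMA = "(:Aircraft {id})-[:HAS_EVENT]->(:Event {severity})"
EXAMPLE_PROMPT = build_text2cypher_prompt(
    EXAMPLE_SCHEMA, "How many events on AC1001?"
)
```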

Text2Cypher answers exact questions about what is in the graph. It cannot predict, and it cannot answer questions whose answer lives in unstructured chunk text. Match the retriever to the question.

Generated queries are still queries. Read-only credentials and query validation are the minimum safeguards before exposing this anywhere.
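As a rough illustration of query validation, the sketch below rejects generated Cypher that contains write clauses. It is a deliberately naive keyword check under stated assumptions: it does not catch procedures that write, it can reject harmless queries that merely mention a keyword, and it complements read-only credentials rather than replacing them.

```python
import re

# Clauses that modify data or schema. CALL is not listed, so procedures
# that write are NOT caught here; read-only credentials must cover that.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True when no write clause keyword appears in the query."""
    return WRITE_CLAUSES.search(cypher) is None
```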

The canonical comparison. Read the question, pick the retriever.

In the fleet-agent sample, each retriever is one single-responsibility tool. The docstring is the routing logic the model reads. One tool per retriever, routing driven by the model.
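The shape of that design can be sketched with plain functions. The tool names, docstrings, and stub bodies here are hypothetical and not taken from the fleet-agent sample; a real tool would call its retriever instead of returning a tagged string.

```python
def search_maintenance_content(question: str) -> str:
    """Find passages by meaning. Use for 'What is...' or 'Tell me about...'."""
    return f"[vector] {question}"  # stub: would call the Vector retriever


def search_content_with_entities(question: str) -> str:
    """Find passages plus connected components and events.
    Use for 'Which entities are affected by...'."""
    return f"[vector-cypher] {question}"  # stub: Vector Cypher retriever


def query_fleet_facts(question: str) -> str:
    """Answer precise counts and lists from the graph.
    Use for 'How many...' or 'List all...'."""
    return f"[text2cypher] {question}"  # stub: Text2Cypher retriever


TOOLS = [search_maintenance_content, search_content_with_entities, query_fleet_facts]


def describe_tools(tools):
    """Collect the name and docstring the model would read when routing."""
    return {tool.__name__: " ".join((tool.__doc__ or "").split()) for tool in tools}
```

Keeping one responsibility per tool means the docstring can state plainly when to use it, which is all the model has to go on.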

The whole arc: traditional RAG's limits motivate GraphRAG; GraphRAG enables three retrieval patterns; each becomes a tool an agent selects. Vector finds the entry point, the graph adds the structure around it.

LLMs excel at pattern recognition and language fluency. These capabilities emerge from training on huge text corpora.

The model produces the most likely continuation, not a verified fact, and it does so confidently, complete with invented citations.

The model has no knowledge of your internal data or anything after its cutoff, yet it will still answer confidently.

These questions require connecting entities across documents and traversing chains of relationships, which sequential text processing cannot do.

Each limitation has a concrete failure mode. Building real systems means designing around all three.

Give the model relevant information in the prompt and all three problems shrink. RAG automates supplying that context instead of doing it by hand. This is the foundation of Retrieval-Augmented Generation.

The four-step pattern. The rest of the appendix unpacks embeddings and vector search, the machinery behind step three.

Embeddings are like a librarian who has read every book and organizes by meaning rather than by title or subject keywords.

A vector is just a list of numbers locating a point in space. In machine learning those numbers can encode the meaning of text.

Embeddings turn text into vectors where closeness equals similarity in meaning. That property is what makes semantic search possible.
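Closeness is commonly measured with cosine similarity, the cosine of the angle between two vectors. A minimal pure-Python sketch, assuming cosine is the measure in use:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1.0 regardless of length, and perpendicular vectors score 0.0.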

Keyword search needs the exact term. Vector search matches meaning, so "engine problems" surfaces "bearing wear" and "overheat".

Conceptual scale. The main deck has a finer-grained practical band table for reading retriever scores in production.

GraphRAG: From Retrieval Limits to Graph-Enriched Search

The RAG Retrieval Flow

Traditional RAG: What It Enables

The Problem With Traditional RAG

Context ROT: More Context, Worse Answers

Context ROT: The Research

Questions Traditional RAG Can't Answer

The GraphRAG Solution

Create the Vector Index

Store Vectors in Neo4j

The Complete Knowledge Graph

GraphRAG: Graph-Enriched Retrieval

Powered by the Neo4j Python GraphRAG Library

The GraphRAG Class

Overview of Retrievers

Searching a Vector Index

Combining Vectors With Graph Traversal

Choosing the Right Retriever

Vector Retriever

What Is a Vector Retriever?

Creating and Searching

Understanding Similarity Scores

Vector Retriever: Best For and Limits

Vector Cypher Retriever

Beyond Basic Vector Search

Creating a Vector Cypher Retriever

Understanding the Retrieval Query

Why OPTIONAL MATCH Matters

The Chunk as Anchor

Vector Cypher: Best For

Text2Cypher Retriever

From Natural Language to Database Queries

Creating a Text2Cypher Retriever

The Role of Schema

Text2Cypher: Best For and Limits

Security Considerations

Comparing All Three Retrievers

Each Retriever Becomes an Agent Tool

Summary

Appendix: Foundations

What Generative AI Does Well

1. Hallucination: Confident But Wrong

2. Knowledge Cutoff: No Access to Your Data

3. Relationship Blindness: Can't Connect the Dots

Why These Limitations Matter

The Solution: Providing Context

How Traditional RAG Works

The Smart Librarian Analogy

What Is a Vector?

What Are Embeddings?

Without Vectors vs With Vectors

Similarity Search