Retrieving
RetriCo provides multiple retrieval strategies to query your knowledge graph. Each strategy approaches the graph differently — you can use them individually or combine them with fusion.
Overview
| Strategy | Description | Requires | Best for |
|---|---|---|---|
| Entity Lookup | Find entities by name, expand k-hop neighborhoods | Entity labels | Direct entity questions |
| Path-based | Shortest paths between parsed entities | Entity labels | Connection questions |
| Entity Embeddings | Vector similarity over KG-trained entity embeddings | Pre-built embeddings | Similar entity discovery |
| Chunk Embeddings | Semantic search over source text chunks | Pre-built embeddings | Free-text questions |
| Community Search | Vector search over community summaries | Pre-built communities | Broad topic questions |
| Tool-based | LLM agent with graph query tools | API key | Complex multi-hop questions |
| Keyword Search | BM25 full-text search over chunks | Chunk store | Exact term matching |
| Fusion | Combine multiple strategies | 2+ strategies | Best overall accuracy |
Creating a Query Pipeline
Like build pipelines, query pipelines support three creation methods.
Option 1: One-liner
import retrico
result = retrico.query_graph(
query="Where was Einstein born?",
entity_labels=["person", "location"],
api_key="sk-...",
)
print(result.answer)
Option 2: Builder API
builder = retrico.RetriCoSearch(name="my_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
result = executor.run(query="Where was Einstein born?")
Option 3: YAML Config
name: query_pipeline
stores:
graph:
store_type: neo4j
uri: "bolt://localhost:7687"
nodes:
- id: parser
processor: query_parser
inputs:
query: {source: "$input", fields: "query"}
output: {key: "parser_result"}
config:
method: gliner
labels: [person, location]
- id: retriever
processor: retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "retriever_result"}
config:
max_hops: 2
- id: chunks
processor: chunk_retriever
requires: [retriever]
inputs:
subgraph: {source: "retriever_result", fields: "subgraph"}
output: {key: "chunk_result"}
- id: reasoner
processor: reasoner
requires: [chunks]
inputs:
query: {source: "$input", fields: "query"}
subgraph: {source: "chunk_result", fields: "subgraph"}
output: {key: "reasoner_result"}
config:
api_key: "sk-..."
model: "gpt-4o-mini"
executor = retrico.ProcessorFactory.create_pipeline("query_pipeline.yaml")
result = executor.run(query="Where was Einstein born?")
Entity Lookup
The default strategy. Parses the query for entities using NER, looks them up in the graph, and expands their neighborhoods by max_hops.
Builder API:
builder = retrico.RetriCoSearch(name="my_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini") # optional
executor = builder.build()
result = executor.run(query="Where was Einstein born?")
One-liner:
result = retrico.query_graph(
query="Where was Einstein born?",
entity_labels=["person", "location"],
retrieval_strategy="entity",
)
YAML:
- id: retriever
processor: retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "retriever_result"}
config:
max_hops: 2
How it works:
- `query_parser` extracts entities from the query (e.g. "Einstein" as a person)
- `retriever` finds matching entities in the graph, then expands their k-hop neighborhood
- `chunk_retriever` fetches source text chunks for retrieved entities
- `reasoner` (optional) generates a natural language answer from the subgraph
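The neighborhood expansion in the retriever step amounts to a breadth-first walk bounded by `max_hops`. A minimal illustrative sketch in plain Python over an adjacency dict (not RetriCo internals):

```python
def expand_khop(adjacency, seeds, max_hops=2):
    # adjacency: entity ID -> iterable of neighbor IDs
    # Returns all entities reachable from the seeds within max_hops hops.
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        frontier = {n for e in frontier for n in adjacency.get(e, ())} - visited
        if not frontier:
            break
        visited |= frontier
    return visited
```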
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `max_hops` | 2 | How many relationship hops to expand |
| `active_after` | None | Only include relations active on or after this date (ISO 8601) |
| `active_before` | None | Only include relations active on or before this date (ISO 8601) |
Temporal Filtering
Relations in RetriCo can carry start_date and end_date properties (set during data ingest or via the graph store API). Use active_after and active_before to filter relations by time range:
builder = retrico.RetriCoSearch(name="temporal_query")
builder.query_parser(method="gliner", labels=["person", "organization"])
builder.retriever(
max_hops=2,
active_after="2020-01-01",
active_before="2020-12-31",
)
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
- id: retriever
processor: retriever
config:
max_hops: 2
active_after: "2020-01-01"
active_before: "2020-12-31"
The filtering logic:
- `active_after` — keeps relations where `end_date IS NULL OR end_date >= active_after`
- `active_before` — keeps relations where `start_date IS NULL OR start_date <= active_before`
- Relations without dates are always included (treated as always active)
Temporal filtering also works with path_retriever and entity_embedding_retriever.
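The filter semantics above can be expressed as a small predicate. A sketch (hypothetical helper, not the RetriCo API; ISO 8601 date strings compare correctly as plain strings):

```python
def relation_is_active(start_date, end_date, active_after=None, active_before=None):
    # Dates are ISO 8601 strings ("YYYY-MM-DD") or None.
    # A relation with no dates is treated as always active.
    if active_after and end_date and end_date < active_after:
        return False  # relation ended before the window starts
    if active_before and start_date and start_date > active_before:
        return False  # relation starts after the window ends
    return True
```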
Entity Lookup with Linking
Same as entity lookup, but links parsed entities to a knowledge base first for precise lookup by stable ID.
Builder API:
builder = retrico.RetriCoSearch(name="linked_query")
builder.query_parser(labels=["person", "location"])
builder.linker(executor=glinker_executor) # or neo4j_uri= to load KB from graph
builder.retriever(max_hops=2)
builder.chunk_retriever()
executor = builder.build()
Path-based Retrieval
Finds the shortest paths between entities parsed from the query. Useful when the answer lies in the connections between entities rather than individual neighborhoods.
Builder API:
builder = retrico.RetriCoSearch(name="path_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.path_retriever()
builder.chunk_retriever()
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="How is Einstein connected to the University of Paris?",
entity_labels=["person", "organization"],
retrieval_strategy="path",
)
YAML:
- id: retriever
processor: path_retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "path_retriever_result"}
Entity Embeddings
Uses vector similarity to find entities whose KG embeddings are closest to the query entities. Requires pre-built entity embeddings (see Modeling).
Builder API:
builder = retrico.RetriCoSearch(name="embedding_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.entity_embedding_retriever(
top_k=5,
max_hops=2,
vector_index_name="entity_embeddings",
embedding_method="sentence_transformer",
model_name="all-MiniLM-L6-v2",
)
builder.chunk_retriever()
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="Who works at similar institutions to Einstein?",
entity_labels=["person", "organization"],
retrieval_strategy="entity_embedding",
retriever_kwargs={"top_k": 10},
)
YAML:
- id: retriever
processor: entity_embedding_retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "entity_embedding_retriever_result"}
config:
top_k: 5
max_hops: 2
vector_index_name: entity_embeddings
embedding_method: sentence_transformer
model_name: "all-MiniLM-L6-v2"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `top_k` | 5 | Number of similar entities to retrieve |
| `max_hops` | 2 | Expand neighborhoods around matched entities |
| `vector_index_name` | (required) | Name of the vector index in the store |
| `embedding_method` | "sentence_transformer" | "sentence_transformer" or "openai" |
| `model_name` | "all-MiniLM-L6-v2" | Embedding model name |
Chunk Embeddings
Semantic search over source text chunks. Bypasses the graph structure entirely — finds chunks whose embeddings are most similar to the query. Requires pre-built chunk embeddings.
Builder API:
builder = retrico.RetriCoSearch(name="chunk_query")
builder.chunk_embedding_retriever(
top_k=5,
max_hops=1,
vector_index_name="chunk_embeddings",
)
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="What is the theory of relativity?",
retrieval_strategy="chunk_embedding",
retriever_kwargs={"top_k": 10},
)
YAML:
- id: retriever
processor: chunk_embedding_retriever
inputs:
query: {source: "$input", fields: "query"}
output: {key: "chunk_embedding_retriever_result"}
config:
top_k: 5
max_hops: 1
vector_index_name: chunk_embeddings
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `top_k` | 5 | Number of chunks to retrieve |
| `max_hops` | 1 | Expand entity neighborhoods from matched chunks |
| `vector_index_name` | (required) | Name of the vector index |
Community Search
Vector search over community summaries. Requires pre-built communities with summaries and embeddings (see Modeling - Community Detection).
Builder API:
builder = retrico.RetriCoSearch(name="community_query")
builder.community_retriever(
top_k=3,
max_hops=1,
vector_index_name="community_embeddings",
)
builder.chunk_retriever()
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="What research fields are represented in the graph?",
retrieval_strategy="community",
api_key="sk-...",
)
YAML:
- id: retriever
processor: community_retriever
inputs:
query: {source: "$input", fields: "query"}
output: {key: "community_retriever_result"}
config:
top_k: 3
max_hops: 1
vector_index_name: community_embeddings
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `top_k` | 3 | Number of communities to retrieve |
| `max_hops` | 1 | Expand entity neighborhoods within matched communities |
Tool-based Retrieval
An LLM agent that iteratively queries the graph using tools (entity lookup, relation search, path finding). The agent decides which tools to call based on the query.
Builder API:
builder = retrico.RetriCoSearch(name="tool_query")
builder.tool_retriever(
api_key="sk-...",
model="gpt-4o-mini",
max_tool_rounds=3,
entity_types=["person", "organization"],
relation_types=["WORKS_AT", "BORN_IN"],
)
builder.chunk_retriever()
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="What organizations did Einstein work at?",
retrieval_strategy="tool",
api_key="sk-...",
model="gpt-4o-mini",
)
YAML:
- id: retriever
processor: tool_retriever
inputs:
query: {source: "$input", fields: "query"}
output: {key: "tool_retriever_result"}
config:
api_key: "sk-..."
model: "gpt-4o-mini"
max_tool_rounds: 3
entity_types: [person, organization]
relation_types: [WORKS_AT, BORN_IN]
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `api_key` | (required) | OpenAI-compatible API key |
| `model` | "gpt-4o-mini" | LLM model name |
| `max_tool_rounds` | 3 | Maximum iterations for the agent loop |
| `entity_types` | [] | Hint about available entity types |
| `relation_types` | [] | Hint about available relation types |
| `chunk_source` | "entity" | How to resolve chunks: "entity" or "relation" |
The tool retriever also supports temporal filtering — the LLM agent can pass start_date and end_date arguments to tools like get_entity_relations dynamically based on the query context.
Keyword Search
Full-text search over chunks. Supports two search backends:
- Relational (`search_source="relational"`, default) — searches SQLite FTS5, PostgreSQL tsvector, or Elasticsearch
- Graph (`search_source="graph"`) — uses the graph DB's native full-text index (Neo4j Lucene, FalkorDB FTS, Memgraph Tantivy)

Two entity modes:
- Chunks-only (default for relational) — returns matched chunks directly
- Entity expansion (`expand_entities=True`, default for graph) — additionally looks up entities mentioned in matched chunks
Relational source (chunks-only):
builder = retrico.RetriCoSearch(name="keyword_query")
builder.keyword_retriever(
top_k=10,
relational_store_type="sqlite",
sqlite_path="chunks.db",
)
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
Relational source (with entity expansion):
builder = retrico.RetriCoSearch(name="keyword_expanded")
builder.keyword_retriever(
top_k=10,
expand_entities=True,
max_hops=1,
relational_store_type="sqlite",
sqlite_path="chunks.db",
)
builder.chunk_retriever()
executor = builder.build()
Graph DB source (native FTS):
builder = retrico.RetriCoSearch(name="graph_keyword_query")
builder.keyword_retriever(
search_source="graph",
top_k=10,
)
builder.chunk_retriever()
executor = builder.build()
One-liner:
result = retrico.query_graph(
query="theory of relativity",
retrieval_strategy="keyword",
)
YAML:
- id: retriever
processor: keyword_retriever
inputs:
query: {source: "$input", fields: "query"}
output: {key: "keyword_retriever_result"}
config:
top_k: 10
search_source: graph # or "relational"
| Graph DB | FTS engine | Index creation |
|---|---|---|
| Neo4j | Lucene | CREATE FULLTEXT INDEX ... (automatic) |
| FalkorDB | Built-in | CALL db.idx.fulltext.createNodeIndex(...) (automatic) |
| Memgraph | Tantivy | CREATE TEXT INDEX ... (automatic) |
KG-Scored Retrieval
Uses an LLM tool-calling parser to decompose the query into structured triple patterns, resolves those patterns against the graph store, and scores the candidates with trained KG embeddings. Because any query can be decomposed into triples, the KG scorer acts as a universal retriever.
Prerequisites: a trained KG embedding model (optional but recommended; without one, matched triples are returned unscored). Use retrico.train_kg_model() to train one.
Builder API:
builder = retrico.RetriCoSearch(name="kg_scored_query")
builder.query_parser(
method="tool",
api_key="sk-...",
model="gpt-4o-mini",
labels=["person", "location"],
relation_labels=["born_in", "works_at"],
)
builder.kg_scorer(
model_path="kg_model",
top_k=10,
predict_tails=True,
score_threshold=0.5,
device="cpu",
)
builder.chunk_retriever(chunk_entity_source="both")
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
ctx = executor.run({"query": "Where was Einstein born?"})
One-liner:
result = retrico.query_graph(
query="Where was Einstein born?",
api_key="sk-...",
model="gpt-4o-mini",
retrieval_strategy="kg_scored",
entity_labels=["person", "location"],
retriever_kwargs={
"relation_labels": ["born_in", "works_at"],
"model_path": "kg_model",
"top_k": 10,
"predict_tails": True,
},
)
How it works:
- Tool-calling parser decomposes the query into `search_triples(head, relation, tail)` calls
- KG scorer looks up entities in the graph store
- Scores candidate triples with the KGE model (if available)
- Builds a Subgraph from scored results
- Optionally predicts missing links
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `model_path` | (required) | Trained KGE model directory |
| `top_k` | 10 | Top predictions per entity |
| `predict_tails` | True | Predict (entity, relation, ?) |
| `predict_heads` | False | Predict (?, relation, entity) |
| `score_threshold` | None | Minimum score filter |
| `device` | "cpu" | "cpu" or "cuda" |
Strategy Comparison
| Strategy | Needs parser? | Needs embeddings? | Needs LLM? | Best for |
|---|---|---|---|---|
| entity (default) | yes | no | no | Direct entity lookup |
| entity + linking | yes | no | no | Precise lookup with KB IDs |
| community | no | yes (community) | no | Topic/cluster-based queries |
| chunk_embedding | no | yes (chunk) | no | Semantic similarity search |
| entity_embedding | yes | yes (entity) | no | Finding similar entities |
| tool | no | no | yes | Complex multi-hop questions |
| path | yes | no | no | Relationship discovery |
| kg_scored | yes (tool) | optional (KGE) | yes | Structured triple matching + link prediction |
| keyword | no | no | no | Full-text search (relational or graph DB) |
Fusion: Combining Strategies
When a single strategy isn't enough, combine multiple strategies and fuse their results.
Builder API
builder = retrico.RetriCoSearch(name="fused_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2) # entity lookup
builder.path_retriever() # path-based
builder.community_retriever() # community search
builder.fusion(strategy="rrf", top_k=20) # merge results
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
One-liner
result = retrico.query_graph(
query="Where was Einstein born?",
entity_labels=["person", "location"],
retrieval_strategy=["entity", "community", "path"], # list triggers fusion
fusion_strategy="rrf",
api_key="sk-...",
)
YAML
name: fused_query
nodes:
- id: parser
processor: query_parser
inputs:
query: {source: "$input", fields: "query"}
output: {key: "parser_result"}
config:
method: gliner
labels: [person, location]
- id: retriever_0
processor: retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "retriever_0_result"}
config:
max_hops: 2
- id: retriever_1
processor: path_retriever
requires: [parser]
inputs:
entities: {source: "parser_result", fields: "entities"}
output: {key: "retriever_1_result"}
- id: retriever_2
processor: community_retriever
inputs:
query: {source: "$input", fields: "query"}
output: {key: "retriever_2_result"}
config:
top_k: 3
- id: fusion
processor: fusion
requires: [retriever_0, retriever_1, retriever_2]
inputs:
subgraph_0: {source: "retriever_0_result", fields: "subgraph"}
subgraph_1: {source: "retriever_1_result", fields: "subgraph"}
subgraph_2: {source: "retriever_2_result", fields: "subgraph"}
output: {key: "fusion_result"}
config:
strategy: rrf
top_k: 20
- id: chunks
processor: chunk_retriever
requires: [fusion]
inputs:
subgraph: {source: "fusion_result", fields: "subgraph"}
output: {key: "chunk_result"}
- id: reasoner
processor: reasoner
requires: [chunks]
inputs:
query: {source: "$input", fields: "query"}
subgraph: {source: "chunk_result", fields: "subgraph"}
output: {key: "reasoner_result"}
config:
api_key: "sk-..."
model: "gpt-4o-mini"
Fusion Strategies
| Strategy | Behavior |
|---|---|
| `union` | Combine all entities and relations, deduplicate by ID |
| `rrf` | Reciprocal Rank Fusion — ranks entities across retrievers |
| `weighted` | Weight each retriever's entities by a configurable weight |
| `intersection` | Only keep entities found in multiple retrievers |
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `strategy` | "union" | Fusion method |
| `top_k` | 0 | Max entities after fusion (0 = keep all) |
| `weights` | [] | Per-retriever weights (for "weighted") |
| `min_sources` | 2 | Min retrievers an entity must appear in (for "intersection") |
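For intuition, Reciprocal Rank Fusion scores each entity by summing 1/(k + rank) over the retrievers that returned it, where k is a smoothing constant (commonly 60). A minimal sketch, not RetriCo's implementation:

```python
def rrf_fuse(rankings, k=60, top_k=0):
    # rankings: one ranked list of entity IDs per retriever, best first.
    scores = {}
    for ranking in rankings:
        for rank, entity_id in enumerate(ranking, start=1):
            scores[entity_id] = scores.get(entity_id, 0.0) + 1.0 / (k + rank)
    # Entities appearing high in several rankings accumulate the largest scores.
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_k] if top_k else fused
```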
RetriCoFusedSearch (Recommended for Complex Fusion)
Configure each retrieval strategy as a separate RetriCoSearch, then combine via RetriCoFusedSearch:
from retrico import RetriCoSearch, RetriCoFusedSearch, Neo4jConfig
store = Neo4jConfig(uri="bolt://localhost:7687", password="password")
# Strategy 1: Entity lookup
entity_builder = RetriCoSearch(name="entity")
entity_builder.store(store)
entity_builder.query_parser(labels=["person", "organization", "location"])
entity_builder.retriever(max_hops=3)
# Strategy 2: Shortest paths
path_builder = RetriCoSearch(name="path")
path_builder.store(store)
path_builder.query_parser(labels=["person", "organization", "location"])
path_builder.path_retriever(max_path_length=5, max_pairs=10)
# Strategy 3: Community search
community_builder = RetriCoSearch(name="community")
community_builder.store(store)
community_builder.community_retriever(top_k=3)
# Combine with RRF fusion
fused = RetriCoFusedSearch(
entity_builder, path_builder, community_builder,
strategy="rrf",
top_k=25,
)
fused.chunk_retriever()
fused.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = fused.build()
ctx = executor.run({"query": "What is the relationship between Einstein and quantum mechanics?"})
The parser is auto-inherited from the first sub-builder that has one. Store config is also inherited.
Simpler example — two strategies:
entity = RetriCoSearch(name="entity")
entity.store(store)
entity.query_parser(labels=["person", "location"])
entity.retriever(max_hops=2)
community = RetriCoSearch(name="community")
community.store(store)
community.community_retriever(top_k=5)
fused = RetriCoFusedSearch(
entity, community,
strategy="weighted",
weights=[2.0, 1.0],
top_k=15,
)
fused.chunk_retriever()
executor = fused.build()
Query Parser
The query parser extracts entities from the natural language query. It supports three methods:
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `method` | "gliner" | Parsing method: "gliner", "llm", or "tool" |
| `labels` | (required for gliner/llm) | Entity types to extract |
| `model` | varies | GLiNER model or LLM model name |
| `api_key` | None | Required for "llm" and "tool" methods |
# GLiNER (local, fast)
builder.query_parser(method="gliner", labels=["person", "location"])
# LLM (API-based)
builder.query_parser(method="llm", labels=["person", "location"], api_key="sk-...")
# Tool-calling (LLM decides what entities to search for)
builder.query_parser(method="tool", api_key="sk-...", model="gpt-4o-mini")
Adding a Reasoner
Any retrieval strategy can be paired with an LLM reasoner that generates a natural language answer from the retrieved subgraph:
Builder API:
builder.reasoner(
api_key="sk-...",
model="gpt-4o-mini",
)
YAML:
- id: reasoner
processor: reasoner
requires: [chunks]
inputs:
query: {source: "$input", fields: "query"}
subgraph: {source: "chunk_result", fields: "subgraph"}
output: {key: "reasoner_result"}
config:
api_key: "sk-..."
model: "gpt-4o-mini"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `api_key` | (required) | OpenAI-compatible API key |
| `model` | "gpt-4o-mini" | LLM model name |
| `temperature` | 0.1 | Sampling temperature |
| `base_url` | None | Custom API endpoint |
Without a reasoner, you still get the full retrieved subgraph:
result = executor.run(query="Where was Einstein born?")
subgraph = result.get("chunk_result")["subgraph"]
print(subgraph.entities)
print(subgraph.relations)
print(subgraph.chunks)