
Retrieving

RetriCo provides multiple retrieval strategies to query your knowledge graph. Each strategy approaches the graph differently — you can use them individually or combine them with fusion.

Overview

| Strategy | Description | Requires | Best for |
|---|---|---|---|
| Entity Lookup | Find entities by name, expand k-hop neighborhoods | Entity labels | Direct entity questions |
| Path-based | Shortest paths between parsed entities | Entity labels | Connection questions |
| Entity Embeddings | Vector similarity over KG-trained entity embeddings | Pre-built embeddings | Similar entity discovery |
| Chunk Embeddings | Semantic search over source text chunks | Pre-built embeddings | Free-text questions |
| Community Search | Vector search over community summaries | Pre-built communities | Broad topic questions |
| Tool-based | LLM agent with graph query tools | API key | Complex multi-hop questions |
| Keyword Search | BM25 full-text search over chunks | Chunk store | Exact term matching |
| Fusion | Combine multiple strategies | 2+ strategies | Best overall accuracy |

Creating a Query Pipeline

Like build pipelines, query pipelines support three creation methods.

Option 1: One-liner

import retrico

result = retrico.query_graph(
    query="Where was Einstein born?",
    entity_labels=["person", "location"],
    api_key="sk-...",
)
print(result.answer)

Option 2: Builder API

builder = retrico.RetriCoSearch(name="my_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()

result = executor.run(query="Where was Einstein born?")

Option 3: YAML Config

name: query_pipeline
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"

nodes:
  - id: parser
    processor: query_parser
    inputs:
      query: {source: "$input", fields: "query"}
    output: {key: "parser_result"}
    config:
      method: gliner
      labels: [person, location]

  - id: retriever
    processor: retriever
    requires: [parser]
    inputs:
      entities: {source: "parser_result", fields: "entities"}
    output: {key: "retriever_result"}
    config:
      max_hops: 2

  - id: chunks
    processor: chunk_retriever
    requires: [retriever]
    inputs:
      subgraph: {source: "retriever_result", fields: "subgraph"}
    output: {key: "chunk_result"}

  - id: reasoner
    processor: reasoner
    requires: [chunks]
    inputs:
      query: {source: "$input", fields: "query"}
      subgraph: {source: "chunk_result", fields: "subgraph"}
    output: {key: "reasoner_result"}
    config:
      api_key: "sk-..."
      model: "gpt-4o-mini"

Load and run the pipeline:

executor = retrico.ProcessorFactory.create_pipeline("query_pipeline.yaml")
result = executor.run(query="Where was Einstein born?")

Entity Lookup

The default strategy. Parses the query for entities using NER, looks them up in the graph, and expands their neighborhoods by max_hops.

Builder API:

builder = retrico.RetriCoSearch(name="my_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini") # optional
executor = builder.build()

result = executor.run(query="Where was Einstein born?")

One-liner:

result = retrico.query_graph(
    query="Where was Einstein born?",
    entity_labels=["person", "location"],
    retrieval_strategy="entity",
)

YAML:

- id: retriever
  processor: retriever
  requires: [parser]
  inputs:
    entities: {source: "parser_result", fields: "entities"}
  output: {key: "retriever_result"}
  config:
    max_hops: 2

How it works:

  1. query_parser extracts entities from the query (e.g. "Einstein" as a person)
  2. retriever finds matching entities in the graph, then expands their k-hop neighborhood
  3. chunk_retriever fetches source text chunks for retrieved entities
  4. reasoner (optional) generates a natural language answer from the subgraph
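The neighborhood expansion in step 2 is, at its core, a bounded breadth-first walk from the matched entities. A toy sketch of that idea (the adjacency dict and `expand_neighborhood` helper are illustrative, not RetriCo's internal code):

```python
from collections import deque

def expand_neighborhood(graph, seeds, max_hops=2):
    """Collect all nodes within max_hops of the seed entities via BFS."""
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted along this branch
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited

# Toy graph: Einstein -> Ulm -> Germany -> Europe
graph = {
    "Einstein": ["Ulm"],
    "Ulm": ["Germany"],
    "Germany": ["Europe"],
}
print(expand_neighborhood(graph, ["Einstein"], max_hops=2))
# With max_hops=2 the walk stops at "Germany"; "Europe" is 3 hops away.
```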

Parameters:

| Parameter | Default | Description |
|---|---|---|
| max_hops | 2 | How many relationship hops to expand |
| active_after | None | Only include relations active on or after this date (ISO 8601) |
| active_before | None | Only include relations active on or before this date (ISO 8601) |

Temporal Filtering

Relations in RetriCo can carry start_date and end_date properties (set during data ingest or via the graph store API). Use active_after and active_before to filter relations by time range:

builder = retrico.RetriCoSearch(name="temporal_query")
builder.query_parser(method="gliner", labels=["person", "organization"])
builder.retriever(
    max_hops=2,
    active_after="2020-01-01",
    active_before="2020-12-31",
)
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
YAML:

- id: retriever
  processor: retriever
  config:
    max_hops: 2
    active_after: "2020-01-01"
    active_before: "2020-12-31"

The filtering logic:

  • active_after — keeps relations where end_date IS NULL OR end_date >= active_after
  • active_before — keeps relations where start_date IS NULL OR start_date <= active_before
  • Relations without dates are always included (treated as always active)
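The bullets above amount to a single keep/drop predicate per relation. A minimal sketch, assuming a simple relation shape (the `Relation` dataclass here is illustrative, not RetriCo's actual data model; ISO 8601 strings compare correctly as plain strings):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Relation:
    start_date: Optional[str] = None  # ISO 8601, so lexical comparison works
    end_date: Optional[str] = None

def is_active(rel, active_after=None, active_before=None):
    """Keep relations that overlap the [active_after, active_before] window."""
    if active_after and rel.end_date and rel.end_date < active_after:
        return False  # ended before the window opened
    if active_before and rel.start_date and rel.start_date > active_before:
        return False  # started after the window closed
    return True  # undated relations are treated as always active

r_old = Relation(start_date="2019-01-01", end_date="2019-12-31")
r_open = Relation(start_date="2019-06-01")  # no end_date: still active
print(is_active(r_old, "2020-01-01", "2020-12-31"))   # False
print(is_active(r_open, "2020-01-01", "2020-12-31"))  # True
```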

Temporal filtering also works with path_retriever and entity_embedding_retriever.


Entity Lookup with Linking

Same as entity lookup, but links parsed entities to a knowledge base first for precise lookup by stable ID.

Builder API:

builder = retrico.RetriCoSearch(name="linked_query")
builder.query_parser(labels=["person", "location"])
builder.linker(executor=glinker_executor) # or neo4j_uri= to load KB from graph
builder.retriever(max_hops=2)
builder.chunk_retriever()
executor = builder.build()

Path-based Retrieval

Finds the shortest paths between entities parsed from the query. Useful when the answer lies in the connections between entities rather than individual neighborhoods.

Builder API:

builder = retrico.RetriCoSearch(name="path_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.path_retriever()
builder.chunk_retriever()
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="How is Einstein connected to the University of Paris?",
    entity_labels=["person", "organization"],
    retrieval_strategy="path",
)

YAML:

- id: retriever
  processor: path_retriever
  requires: [parser]
  inputs:
    entities: {source: "parser_result", fields: "entities"}
  output: {key: "path_retriever_result"}

Entity Embeddings

Uses vector similarity to find entities whose KG embeddings are closest to the query entities. Requires pre-built entity embeddings (see Modeling).

Builder API:

builder = retrico.RetriCoSearch(name="embedding_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.entity_embedding_retriever(
    top_k=5,
    max_hops=2,
    vector_index_name="entity_embeddings",
    embedding_method="sentence_transformer",
    model_name="all-MiniLM-L6-v2",
)
builder.chunk_retriever()
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="Who works at similar institutions to Einstein?",
    entity_labels=["person", "organization"],
    retrieval_strategy="entity_embedding",
    retriever_kwargs={"top_k": 10},
)

YAML:

- id: retriever
  processor: entity_embedding_retriever
  requires: [parser]
  inputs:
    entities: {source: "parser_result", fields: "entities"}
  output: {key: "entity_embedding_retriever_result"}
  config:
    top_k: 5
    max_hops: 2
    vector_index_name: entity_embeddings
    embedding_method: sentence_transformer
    model_name: "all-MiniLM-L6-v2"

Parameters:

| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Number of similar entities to retrieve |
| max_hops | 2 | Expand neighborhoods around matched entities |
| vector_index_name | (required) | Name of the vector index in the store |
| embedding_method | "sentence_transformer" | "sentence_transformer" or "openai" |
| model_name | "all-MiniLM-L6-v2" | Embedding model name |

Chunk Embeddings

Semantic search over source text chunks. Bypasses the graph structure entirely — finds chunks whose embeddings are most similar to the query. Requires pre-built chunk embeddings.
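Conceptually, this strategy embeds the query and ranks chunks by cosine similarity to it. A dependency-free sketch with toy vectors standing in for real sentence embeddings (the helper names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return the k chunk ids most similar to the query vector."""
    scored = [(cosine(query_vec, v), cid) for cid, v in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]

chunks = {
    "relativity": [0.9, 0.1, 0.0],
    "biography":  [0.1, 0.9, 0.0],
    "cooking":    [0.0, 0.0, 1.0],
}
print(top_k_chunks([1.0, 0.2, 0.0], chunks, k=2))  # ['relativity', 'biography']
```

In production this ranking is delegated to the store's vector index (named via vector_index_name) rather than computed in Python.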

Builder API:

builder = retrico.RetriCoSearch(name="chunk_query")
builder.chunk_embedding_retriever(
    top_k=5,
    max_hops=1,
    vector_index_name="chunk_embeddings",
)
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="What is the theory of relativity?",
    retrieval_strategy="chunk_embedding",
    retriever_kwargs={"top_k": 10},
)

YAML:

- id: retriever
  processor: chunk_embedding_retriever
  inputs:
    query: {source: "$input", fields: "query"}
  output: {key: "chunk_embedding_retriever_result"}
  config:
    top_k: 5
    max_hops: 1
    vector_index_name: chunk_embeddings

Parameters:

| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Number of chunks to retrieve |
| max_hops | 1 | Expand entity neighborhoods from matched chunks |
| vector_index_name | (required) | Name of the vector index |

Community Search

Vector search over community summaries. Requires pre-built communities with summaries and embeddings (see Modeling - Community Detection).

Builder API:

builder = retrico.RetriCoSearch(name="community_query")
builder.community_retriever(
    top_k=3,
    max_hops=1,
    vector_index_name="community_embeddings",
)
builder.chunk_retriever()
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="What research fields are represented in the graph?",
    retrieval_strategy="community",
    api_key="sk-...",
)

YAML:

- id: retriever
  processor: community_retriever
  inputs:
    query: {source: "$input", fields: "query"}
  output: {key: "community_retriever_result"}
  config:
    top_k: 3
    max_hops: 1
    vector_index_name: community_embeddings

Parameters:

| Parameter | Default | Description |
|---|---|---|
| top_k | 3 | Number of communities to retrieve |
| max_hops | 1 | Expand entity neighborhoods within matched communities |
| vector_index_name | (required) | Name of the vector index |

Tool-based Retrieval

An LLM agent that iteratively queries the graph using tools (entity lookup, relation search, path finding). The agent decides which tools to call based on the query.
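The loop can be pictured as: the LLM proposes a tool call, the call runs against the graph, and the observation is fed back until the agent answers or max_tool_rounds is exhausted. A stub sketch with a scripted stand-in for the LLM (all names here are illustrative, not RetriCo's agent internals):

```python
def run_tool_agent(agent, tools, query, max_tool_rounds=3):
    """Let the agent pick tools round by round until it answers or runs out."""
    observations = []
    for _ in range(max_tool_rounds):
        action = agent(query, observations)  # an LLM call in the real system
        if action["tool"] == "finish":
            return action["answer"]
        # Execute the chosen graph tool and record its result as evidence.
        result = tools[action["tool"]](**action["args"])
        observations.append(result)
    return observations  # out of rounds: return the raw evidence

# Scripted "agent": look up the entity first, then answer from the evidence.
def scripted_agent(query, observations):
    if not observations:
        return {"tool": "get_entity_relations", "args": {"name": "Einstein"}}
    return {"tool": "finish", "answer": observations[0]}

tools = {"get_entity_relations": lambda name: f"{name} WORKS_AT ETH Zurich"}
print(run_tool_agent(scripted_agent, tools, "Where did Einstein work?"))
```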

Builder API:

builder = retrico.RetriCoSearch(name="tool_query")
builder.tool_retriever(
    api_key="sk-...",
    model="gpt-4o-mini",
    max_tool_rounds=3,
    entity_types=["person", "organization"],
    relation_types=["WORKS_AT", "BORN_IN"],
)
builder.chunk_retriever()
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="What organizations did Einstein work at?",
    retrieval_strategy="tool",
    api_key="sk-...",
    model="gpt-4o-mini",
)

YAML:

- id: retriever
  processor: tool_retriever
  inputs:
    query: {source: "$input", fields: "query"}
  output: {key: "tool_retriever_result"}
  config:
    api_key: "sk-..."
    model: "gpt-4o-mini"
    max_tool_rounds: 3
    entity_types: [person, organization]
    relation_types: [WORKS_AT, BORN_IN]

Parameters:

| Parameter | Default | Description |
|---|---|---|
| api_key | (required) | OpenAI-compatible API key |
| model | "gpt-4o-mini" | LLM model name |
| max_tool_rounds | 3 | Maximum iterations for the agent loop |
| entity_types | [] | Hint about available entity types |
| relation_types | [] | Hint about available relation types |
| chunk_source | "entity" | How to resolve chunks: "entity" or "relation" |

The tool retriever also supports temporal filtering — the LLM agent can pass start_date and end_date arguments to tools like get_entity_relations dynamically based on the query context.


Keyword Search

Full-text search over chunks. Supports two search backends:

  • Relational (search_source="relational", default) — searches SQLite FTS5, PostgreSQL tsvector, or Elasticsearch
  • Graph (search_source="graph") — uses the graph DB's native full-text index (Neo4j Lucene, FalkorDB FTS, Memgraph Tantivy)

Two entity modes:

  • Chunks-only (default for relational) — returns matched chunks directly
  • Entity expansion (expand_entities=True, default for graph) — additionally looks up entities mentioned in matched chunks
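All of these backends rank matches with BM25 (or a close variant). The core of that scoring can be sketched in a few lines — a toy single-term scorer for intuition, not the backends' actual code:

```python
import math

def bm25_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 score of one term in one document.

    tf: term frequency in the document; df: number of documents
    containing the term; doc_len/avg_len: length normalization.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

# A rare term (in 2 of 100 docs) outscores a common one (in 90) at equal tf.
rare = bm25_score(tf=3, df=2, n_docs=100, doc_len=120, avg_len=100)
common = bm25_score(tf=3, df=90, n_docs=100, doc_len=120, avg_len=100)
print(rare > common)  # True
```

This is why keyword search excels at exact, distinctive terms: rarity drives the IDF component, independent of any embedding.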

Relational source (chunks-only):

builder = retrico.RetriCoSearch(name="keyword_query")
builder.keyword_retriever(
    top_k=10,
    relational_store_type="sqlite",
    sqlite_path="chunks.db",
)
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()

Relational source (with entity expansion):

builder = retrico.RetriCoSearch(name="keyword_expanded")
builder.keyword_retriever(
    top_k=10,
    expand_entities=True,
    max_hops=1,
    relational_store_type="sqlite",
    sqlite_path="chunks.db",
)
builder.chunk_retriever()
executor = builder.build()

Graph DB source (native FTS):

builder = retrico.RetriCoSearch(name="graph_keyword_query")
builder.keyword_retriever(
    search_source="graph",
    top_k=10,
)
builder.chunk_retriever()
executor = builder.build()

One-liner:

result = retrico.query_graph(
    query="theory of relativity",
    retrieval_strategy="keyword",
)

YAML:

- id: retriever
  processor: keyword_retriever
  inputs:
    query: {source: "$input", fields: "query"}
  output: {key: "keyword_retriever_result"}
  config:
    top_k: 10
    search_source: graph  # or "relational"
Native full-text indexes per graph DB:

| Graph DB | FTS engine | Index creation |
|---|---|---|
| Neo4j | Lucene | CREATE FULLTEXT INDEX ... (automatic) |
| FalkorDB | Built-in | CALL db.idx.fulltext.createNodeIndex(...) (automatic) |
| Memgraph | Tantivy | CREATE TEXT INDEX ... (automatic) |

KG-Scored Retrieval

Uses an LLM tool-calling parser to decompose the query into structured triple patterns, then resolves those against the graph store and scores them with trained KG embeddings. The KG scorer acts as a universal retriever.

Prerequisites: A trained KG embedding model (optional but recommended). Use retrico.train_kg_model() to train one.

Builder API:

builder = retrico.RetriCoSearch(name="kg_scored_query")
builder.query_parser(
    method="tool",
    api_key="sk-...",
    model="gpt-4o-mini",
    labels=["person", "location"],
    relation_labels=["born_in", "works_at"],
)
builder.kg_scorer(
    model_path="kg_model",
    top_k=10,
    predict_tails=True,
    score_threshold=0.5,
    device="cpu",
)
builder.chunk_retriever(chunk_entity_source="both")
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
ctx = executor.run({"query": "Where was Einstein born?"})

One-liner:

result = retrico.query_graph(
    query="Where was Einstein born?",
    api_key="sk-...",
    model="gpt-4o-mini",
    retrieval_strategy="kg_scored",
    entity_labels=["person", "location"],
    retriever_kwargs={
        "relation_labels": ["born_in", "works_at"],
        "model_path": "kg_model",
        "top_k": 10,
        "predict_tails": True,
    },
)

How it works:

  1. Tool-calling parser decomposes the query into search_triples(head, relation, tail) calls
  2. KG scorer looks up entities in the graph store
  3. Scores candidate triples with KGE model (if available)
  4. Builds a Subgraph from scored results
  5. Optionally predicts missing links
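The scoring in step 3 can be illustrated with a TransE-style function, where a triple (h, r, t) is plausible when the embedding of h plus the embedding of r lands near the embedding of t. Toy 2-D vectors below; RetriCo's trained KGE models may use a different scoring function:

```python
import math

def transe_score(h, r, t):
    """Negative L2 distance of (h + r) from t: higher means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hand-picked toy embeddings so that Einstein + born_in lands on Ulm.
emb = {
    "Einstein": [1.0, 0.0],
    "Ulm":      [1.0, 1.0],
    "Paris":    [0.0, 0.5],
    "born_in":  [0.0, 1.0],
}
good = transe_score(emb["Einstein"], emb["born_in"], emb["Ulm"])
bad = transe_score(emb["Einstein"], emb["born_in"], emb["Paris"])
print(good > bad)  # True: (Einstein, born_in, Ulm) outscores (..., Paris)
```

score_threshold and top_k then prune the scored candidates before the subgraph is assembled.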

Parameters:

| Parameter | Default | Description |
|---|---|---|
| model_path | (required) | Trained KGE model directory |
| top_k | 10 | Top predictions per entity |
| predict_tails | True | Predict (entity, relation, ?) |
| predict_heads | False | Predict (?, relation, entity) |
| score_threshold | None | Minimum score filter |
| device | "cpu" | "cpu" or "cuda" |

Strategy Comparison

| Strategy | Needs parser? | Needs embeddings? | Needs LLM? | Best for |
|---|---|---|---|---|
| entity (default) | yes | no | no | Direct entity lookup |
| entity + linking | yes | no | no | Precise lookup with KB IDs |
| community | no | yes (community) | no | Topic/cluster-based queries |
| chunk_embedding | no | yes (chunk) | no | Semantic similarity search |
| entity_embedding | yes | yes (entity) | no | Finding similar entities |
| tool | no | no | yes | Complex multi-hop questions |
| path | yes | no | no | Relationship discovery |
| kg_scored | yes (tool) | optional (KGE) | yes | Structured triple matching + link prediction |
| keyword | no | no | no | Full-text search (relational or graph DB) |

Fusion: Combining Strategies

When a single strategy isn't enough, combine multiple strategies and fuse their results.

Builder API

builder = retrico.RetriCoSearch(name="fused_query")
builder.query_parser(method="gliner", labels=["person", "location"])
builder.retriever(max_hops=2) # entity lookup
builder.path_retriever() # path-based
builder.community_retriever() # community search
builder.fusion(strategy="rrf", top_k=20) # merge results
builder.chunk_retriever()
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()

One-liner

result = retrico.query_graph(
    query="Where was Einstein born?",
    entity_labels=["person", "location"],
    retrieval_strategy=["entity", "community", "path"],  # list triggers fusion
    fusion_strategy="rrf",
    api_key="sk-...",
)

YAML

name: fused_query
nodes:
  - id: parser
    processor: query_parser
    inputs:
      query: {source: "$input", fields: "query"}
    output: {key: "parser_result"}
    config:
      method: gliner
      labels: [person, location]

  - id: retriever_0
    processor: retriever
    requires: [parser]
    inputs:
      entities: {source: "parser_result", fields: "entities"}
    output: {key: "retriever_0_result"}
    config:
      max_hops: 2

  - id: retriever_1
    processor: path_retriever
    requires: [parser]
    inputs:
      entities: {source: "parser_result", fields: "entities"}
    output: {key: "retriever_1_result"}

  - id: retriever_2
    processor: community_retriever
    inputs:
      query: {source: "$input", fields: "query"}
    output: {key: "retriever_2_result"}
    config:
      top_k: 3

  - id: fusion
    processor: fusion
    requires: [retriever_0, retriever_1, retriever_2]
    inputs:
      subgraph_0: {source: "retriever_0_result", fields: "subgraph"}
      subgraph_1: {source: "retriever_1_result", fields: "subgraph"}
      subgraph_2: {source: "retriever_2_result", fields: "subgraph"}
    output: {key: "fusion_result"}
    config:
      strategy: rrf
      top_k: 20

  - id: chunks
    processor: chunk_retriever
    requires: [fusion]
    inputs:
      subgraph: {source: "fusion_result", fields: "subgraph"}
    output: {key: "chunk_result"}

  - id: reasoner
    processor: reasoner
    requires: [chunks]
    inputs:
      query: {source: "$input", fields: "query"}
      subgraph: {source: "chunk_result", fields: "subgraph"}
    output: {key: "reasoner_result"}
    config:
      api_key: "sk-..."
      model: "gpt-4o-mini"

Fusion Strategies

| Strategy | Behavior |
|---|---|
| union | Combine all entities and relations, deduplicate by ID |
| rrf | Reciprocal Rank Fusion: ranks entities across retrievers |
| weighted | Weight each retriever's entities by configurable weight |
| intersection | Only keep entities found in multiple retrievers |
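Reciprocal Rank Fusion scores each entity by summing 1 / (k + rank) over every retriever's ranked list it appears in, so entities surfaced by several retrievers float to the top. A minimal sketch (k=60 is the conventional constant from the RRF literature; RetriCo's internal constant may differ):

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: score(e) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

entity_hits = ["Einstein", "Ulm", "ETH Zurich"]
path_hits = ["Ulm", "Einstein"]
community_hits = ["Physics", "Einstein"]

# "Einstein" appears in all three lists, so it fuses to the top.
print(rrf([entity_hits, path_hits, community_hits]))
# ['Einstein', 'Ulm', 'Physics', 'ETH Zurich']
```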

Parameters:

| Parameter | Default | Description |
|---|---|---|
| strategy | "union" | Fusion method |
| top_k | 0 | Max entities after fusion (0 = keep all) |
| weights | [] | Per-retriever weights (for "weighted") |
| min_sources | 2 | Min retrievers an entity must appear in (for "intersection") |

Configure each retrieval strategy as a separate RetriCoSearch, then combine via RetriCoFusedSearch:

from retrico import RetriCoSearch, RetriCoFusedSearch, Neo4jConfig

store = Neo4jConfig(uri="bolt://localhost:7687", password="password")

# Strategy 1: Entity lookup
entity_builder = RetriCoSearch(name="entity")
entity_builder.store(store)
entity_builder.query_parser(labels=["person", "organization", "location"])
entity_builder.retriever(max_hops=3)

# Strategy 2: Shortest paths
path_builder = RetriCoSearch(name="path")
path_builder.store(store)
path_builder.query_parser(labels=["person", "organization", "location"])
path_builder.path_retriever(max_path_length=5, max_pairs=10)

# Strategy 3: Community search
community_builder = RetriCoSearch(name="community")
community_builder.store(store)
community_builder.community_retriever(top_k=3)

# Combine with RRF fusion
fused = RetriCoFusedSearch(
    entity_builder, path_builder, community_builder,
    strategy="rrf",
    top_k=25,
)
fused.chunk_retriever()
fused.reasoner(api_key="sk-...", model="gpt-4o-mini")

executor = fused.build()
ctx = executor.run({"query": "What is the relationship between Einstein and quantum mechanics?"})

The parser is auto-inherited from the first sub-builder that has one. Store config is also inherited.

Simpler example — two strategies:

entity = RetriCoSearch(name="entity")
entity.store(store)
entity.query_parser(labels=["person", "location"])
entity.retriever(max_hops=2)

community = RetriCoSearch(name="community")
community.store(store)
community.community_retriever(top_k=5)

fused = RetriCoFusedSearch(
    entity, community,
    strategy="weighted",
    weights=[2.0, 1.0],
    top_k=15,
)
fused.chunk_retriever()
executor = fused.build()

Query Parser

The query parser extracts entities from the natural language query. It supports three methods: "gliner" (local NER), "llm" (API-based extraction), and "tool" (LLM tool-calling).

Parameters:

| Parameter | Default | Description |
|---|---|---|
| method | "gliner" | Parsing method: "gliner", "llm", or "tool" |
| labels | (required for gliner/llm) | Entity types to extract |
| model | varies | GLiNER model or LLM model name |
| api_key | None | Required for "llm" and "tool" methods |

# GLiNER (local, fast)
builder.query_parser(method="gliner", labels=["person", "location"])

# LLM (API-based)
builder.query_parser(method="llm", labels=["person", "location"], api_key="sk-...")

# Tool-calling (LLM decides what entities to search for)
builder.query_parser(method="tool", api_key="sk-...", model="gpt-4o-mini")

Adding a Reasoner

Any retrieval strategy can be paired with an LLM reasoner that generates a natural language answer from the retrieved subgraph:

Builder API:

builder.reasoner(
    api_key="sk-...",
    model="gpt-4o-mini",
)

YAML:

- id: reasoner
  processor: reasoner
  requires: [chunks]
  inputs:
    query: {source: "$input", fields: "query"}
    subgraph: {source: "chunk_result", fields: "subgraph"}
  output: {key: "reasoner_result"}
  config:
    api_key: "sk-..."
    model: "gpt-4o-mini"

Parameters:

| Parameter | Default | Description |
|---|---|---|
| api_key | (required) | OpenAI-compatible API key |
| model | "gpt-4o-mini" | LLM model name |
| temperature | 0.1 | Sampling temperature |
| base_url | None | Custom API endpoint |

Without a reasoner, you still get the full retrieved subgraph:

result = executor.run(query="Where was Einstein born?")
subgraph = result.get("chunk_result")["subgraph"]
print(subgraph.entities)
print(subgraph.relations)
print(subgraph.chunks)