Modeling

RetriCo provides two modeling capabilities that enrich your knowledge graph: community detection and knowledge graph embeddings.

Community Detection

Discover clusters of related entities in your graph using Louvain or Leiden algorithms. Optionally generate LLM summaries for each community and embed them for vector search.

One-liner

import retrico

result = retrico.detect_communities(
    method="louvain",    # "louvain" or "leiden"
    levels=1,            # hierarchical levels
    resolution=1.0,      # resolution parameter
    api_key="sk-...",    # enables LLM summaries + embeddings
    model="gpt-4o-mini",
)

When an api_key is provided, the pipeline:

  1. Detects communities using the chosen algorithm
  2. Summarizes each community using an LLM (based on its top entities and relations)
  3. Embeds the summaries into a vector store for retrieval

Without an api_key, only detection is performed.
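To get a feel for what the detection step computes, here is a standalone sketch using networkx's Louvain implementation on a toy graph. This is illustrative only — networkx is not part of RetriCo's API, and RetriCo runs detection against your graph store instead of an in-memory graph:

```python
# Illustration: Louvain community detection on a small built-in graph.
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, resolution=1.0, seed=42)

# Louvain partitions the node set: every node lands in exactly one community.
assigned = set().union(*communities)
print(len(communities), "communities covering", len(assigned), "nodes")
```

Each community is a set of node IDs; RetriCo's summarizer and embedder then operate on these sets.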

Builder API

builder = retrico.RetriCoCommunity(name="my_communities")
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"))
builder.detector(
    method="louvain",
    levels=1,
    resolution=1.0,
)
builder.summarizer(
    api_key="sk-...",
    model="gpt-4o-mini",
    top_k=10,
)
builder.embedder(
    embedding_method="sentence_transformer",
    model_name="all-MiniLM-L6-v2",
    vector_store_type="faiss",
)
executor = builder.build(verbose=True)
result = executor.run()

YAML Config

name: community_detection
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
  vector:
    store_type: faiss

nodes:
  - id: detector
    processor: community_detector
    output: {key: "detector_result"}
    config:
      method: louvain
      levels: 1
      resolution: 1.0

  - id: summarizer
    processor: community_summarizer
    requires: [detector]
    inputs:
      communities: {source: "detector_result", fields: "communities"}
    output: {key: "summarizer_result"}
    config:
      api_key: "sk-..."
      model: "gpt-4o-mini"
      top_k: 10

  - id: embedder
    processor: community_embedder
    requires: [summarizer]
    inputs:
      communities: {source: "summarizer_result", fields: "communities"}
    output: {key: "embedder_result"}
    config:
      embedding_method: sentence_transformer
      model_name: "all-MiniLM-L6-v2"
      vector_store_type: faiss

Parameters

Detector:

| Parameter  | Default   | Description                                      |
|------------|-----------|--------------------------------------------------|
| method     | "louvain" | Algorithm: "louvain" or "leiden"                 |
| levels     | 1         | Number of hierarchical levels                    |
| resolution | 1.0       | Resolution parameter (higher = more communities) |

Summarizer:

| Parameter   | Default       | Description                                          |
|-------------|---------------|------------------------------------------------------|
| api_key     | (required)    | OpenAI-compatible API key                            |
| model       | "gpt-4o-mini" | LLM model name                                       |
| top_k       | 10            | Max entities per community for summarization context |
| temperature | 0.1           | LLM sampling temperature                             |
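The top_k parameter bounds how much of a community reaches the LLM. A hypothetical sketch of how a summarizer might assemble its prompt from a community's top entities and relations — the function name and prompt wording here are illustrative, not RetriCo internals:

```python
# Hypothetical prompt construction for a community summarizer.
def build_summary_prompt(entities, relations, top_k=10):
    lines = ["Summarize the theme of this community of entities."]
    # Only the top-k entities/relations are included, keeping the prompt bounded.
    lines.append("Entities: " + ", ".join(entities[:top_k]))
    lines.append("Relations: " + "; ".join(
        f"{h} -[{r}]-> {t}" for h, r, t in relations[:top_k]
    ))
    return "\n".join(lines)

prompt = build_summary_prompt(
    entities=["Einstein", "Physics", "Princeton"],
    relations=[("Einstein", "FIELD", "Physics"),
               ("Einstein", "WORKED_AT", "Princeton")],
)
print(prompt)
```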

Embedder:

| Parameter         | Default                | Description                        |
|-------------------|------------------------|------------------------------------|
| embedding_method  | "sentence_transformer" | "sentence_transformer" or "openai" |
| model_name        | "all-MiniLM-L6-v2"     | Embedding model name               |
| vector_store_type | "in_memory"            | "in_memory", "faiss", or "qdrant"  |
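Conceptually, an "in_memory" vector store is just a matrix of embeddings searched by cosine similarity. A minimal numpy sketch of that idea, using random vectors in place of real sentence-transformer output (384 dimensions, matching all-MiniLM-L6-v2):

```python
# Minimal sketch of an in-memory vector store: summary embeddings in a
# matrix, queries answered by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
summaries = ["physics community", "biology community", "music community"]
vectors = rng.normal(size=(len(summaries), 384))           # stand-in embeddings
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize

def search(query_vec, top_k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = vectors @ q                       # cosine similarity per summary
    best = np.argsort(scores)[::-1][:top_k]    # highest similarity first
    return [(summaries[i], float(scores[i])) for i in best]

# A query near the first summary's vector retrieves that summary.
hits = search(vectors[0] + 0.01 * rng.normal(size=384))
print(hits[0][0])
```

FAISS and Qdrant provide the same nearest-neighbor operation with indexing that scales beyond what a flat matrix scan can handle.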

Using Communities for Retrieval

Once communities are built, use the community_retriever to search over them:

result = retrico.query_graph(
    query="What research fields are represented in the graph?",
    retrieval_strategy="community",
    api_key="sk-...",
)

See Retrieving - Community Search for details.


Knowledge Graph Embeddings

Train entity and relation embeddings using PyKEEN. These embeddings capture the structural patterns of your knowledge graph and enable vector-based entity retrieval and link prediction.

Installation

pip install pykeen

One-liner

result = retrico.train_kg_model(
    model="RotatE",  # PyKEEN model
    embedding_dim=128,
    epochs=100,
    batch_size=256,
    lr=0.001,
    device="cpu",
    model_path="kg_model",
    vector_store_type="faiss",
    store_to_graph=False,
)

Builder API

builder = retrico.RetriCoModeling(name="train_embeddings")
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"))
builder.triple_reader(
    source="graph_store",
    train_ratio=0.8,
    val_ratio=0.1,
    test_ratio=0.1,
)
builder.trainer(
    model="RotatE",
    embedding_dim=128,
    epochs=100,
    batch_size=256,
    lr=0.001,
    device="cpu",
)
builder.storer(
    model_path="kg_model",
    vector_store_type="faiss",
    store_to_graph=False,
)
executor = builder.build(verbose=True)
result = executor.run()

YAML Config

name: kg_embeddings
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
  vector:
    store_type: faiss

nodes:
  - id: reader
    processor: kg_triple_reader
    output: {key: "reader_result"}
    config:
      source: graph_store
      train_ratio: 0.8
      val_ratio: 0.1
      test_ratio: 0.1

  - id: trainer
    processor: kg_trainer
    requires: [reader]
    inputs:
      triples: {source: "reader_result", fields: "triples"}
    output: {key: "trainer_result"}
    config:
      model: RotatE
      embedding_dim: 128
      epochs: 100
      batch_size: 256
      lr: 0.001
      device: cpu

  - id: storer
    processor: kg_embedding_storer
    requires: [trainer]
    inputs:
      model: {source: "trainer_result", fields: "model"}
      entity_embeddings: {source: "trainer_result", fields: "entity_embeddings"}
    output: {key: "storer_result"}
    config:
      model_path: kg_model
      vector_store_type: faiss
      store_to_graph: false

Supported Models

RetriCo uses PyKEEN, which supports 40+ KG embedding models:

| Model    | Type          | Description                                    |
|----------|---------------|------------------------------------------------|
| RotatE   | Rotation      | Models relations as rotations in complex space |
| TransE   | Translation   | Relations as translations in embedding space   |
| ComplEx  | Factorization | Complex-valued tensor factorization            |
| DistMult | Factorization | Diagonal bilinear model                        |
| TuckER   | Factorization | Tucker decomposition of the binary tensor      |

See the PyKEEN model catalog for the full list.
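To make the "relations as translations" idea concrete, here is TransE's scoring function sketched in plain numpy — a triple (h, r, t) scores well when h + r lands close to t. This is only the scoring math; PyKEEN implements the full training loop for TransE, RotatE, and the rest:

```python
# TransE scoring: higher (closer to zero) means a more plausible triple.
import numpy as np

h = np.array([0.1, 0.2, 0.3])    # head entity embedding
r = np.array([0.4, 0.0, -0.1])   # relation embedding (a translation)
t_good = h + r                   # tail that the relation points to exactly
t_bad = np.array([9.0, 9.0, 9.0])

def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)  # negative distance

print(transe_score(h, r, t_good))  # best possible (zero distance)
print(transe_score(h, r, t_bad))
```

RotatE replaces the translation with an element-wise rotation in complex space, which lets it model symmetric and inverse relation patterns that TransE cannot.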

Parameters

Triple Reader:

| Parameter   | Default       | Description                                            |
|-------------|---------------|--------------------------------------------------------|
| source      | "graph_store" | "graph_store" (read from DB) or "tsv" (read from file) |
| tsv_path    | None          | Path to TSV file (head, relation, tail)                |
| train_ratio | 0.8           | Training data split ratio                              |
| val_ratio   | 0.1           | Validation data split ratio                            |
| test_ratio  | 0.1           | Test data split ratio                                  |
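The three ratios partition the triples into disjoint train/validation/test sets. A quick sketch of what an 80/10/10 split amounts to (the reader performs this for you when loading from the graph store or a TSV):

```python
# Illustrative 80/10/10 split of a toy triple list.
import random

triples = [(f"e{i}", "rel", f"e{i+1}") for i in range(10)]
random.seed(0)
random.shuffle(triples)  # shuffle before splitting to avoid ordering bias

n = len(triples)
n_train, n_val = int(n * 0.8), int(n * 0.1)
train = triples[:n_train]
val = triples[n_train:n_train + n_val]
test = triples[n_train + n_val:]
print(len(train), len(val), len(test))  # 8 1 1
```

The validation split guides hyperparameter choices during training; the test split measures final link-prediction quality on triples the model never saw.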

Trainer:

| Parameter     | Default  | Description             |
|---------------|----------|-------------------------|
| model         | "RotatE" | PyKEEN model name       |
| embedding_dim | 128      | Dimension of embeddings |
| epochs        | 100      | Training epochs         |
| batch_size    | 256      | Training batch size     |
| lr            | 0.001    | Learning rate           |
| device        | "cpu"    | "cpu" or "cuda"         |

Storer:

| Parameter         | Default     | Description                                    |
|-------------------|-------------|------------------------------------------------|
| model_path        | (required)  | Directory to save model weights                |
| vector_store_type | "in_memory" | Where to store embeddings                      |
| store_to_graph    | False       | Write embeddings as node properties in graph DB |

Using KG Embeddings for Retrieval

Once trained, use entity embeddings for retrieval:

result = retrico.query_graph(
    query="Who works at similar institutions to Einstein?",
    entity_labels=["person", "organization"],
    retrieval_strategy="entity_embedding",
    retriever_kwargs={"top_k": 10, "vector_index_name": "entity_embeddings"},
)

See Retrieving - Entity Embeddings for details.


Add a kg_scorer node to any query pipeline to score existing triples and predict missing links using trained KG embeddings:

from retrico import RetriCoSearch

builder = RetriCoSearch(name="scored_query")
builder.query_parser(labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()

# Add KG scoring — loads trained model from disk
builder.kg_scorer(
    model_path="kg_model",
    top_k=10,
    predict_tails=True,
    predict_heads=False,
    score_threshold=None,
    device="cpu",
)

builder.reasoner(api_key="sk-...", model="gpt-4o-mini")
executor = builder.build()
ctx = executor.run({"query": "Where was Einstein born?"})

# Access scoring results
scorer_result = ctx.get("kg_scorer_result")
print(scorer_result["scored_triples"]) # existing triples with KGE scores
print(scorer_result["predictions"]) # predicted missing links
# scorer_result["subgraph"] is enriched with predicted relations

The KG scorer can also act as a universal retriever — see Retrieving - KG-Scored Retrieval for the kg_scored strategy that combines tool-calling parsing with KG scoring.

How it works

  • Scores existing triples in the retrieved subgraph using model.score_hrt()
  • Predicts missing links for query entities (top-k tail/head predictions)
  • Predictions are added to the subgraph as additional relations
  • In kg_scored mode, the scorer resolves triple_queries from the tool-calling parser, building a scored subgraph without needing a separate retriever
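The tail-prediction step can be sketched in numpy with TransE-style scores: for a query entity and relation, score every candidate tail and keep the top-k. This is a toy stand-in for what the kg_scorer does through PyKEEN's trained model; the entity names and embeddings here are contrived:

```python
# Toy tail prediction: rank all entities as candidate tails for
# (einstein, born_in, ?) using TransE-style scores.
import numpy as np

rng = np.random.default_rng(1)
entities = ["einstein", "ulm", "princeton", "zurich"]
E = rng.normal(size=(4, 8))  # toy entity embeddings, dim 8
r_born_in = E[1] - E[0]      # contrived so einstein + born_in == ulm exactly

def predict_tails(head_idx, r, top_k=2):
    # Score every entity as a candidate tail in one vectorized pass.
    scores = -np.linalg.norm(E[head_idx] + r - E, axis=1)
    order = np.argsort(scores)[::-1][:top_k]  # best score first
    return [(entities[i], float(scores[i])) for i in order]

preds = predict_tails(0, r_born_in)
print(preds[0][0])  # "ulm" — the contrived best match
```

In the real pipeline these top-k predictions become additional relations in the subgraph handed to the reasoner.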

Parameters:

| Parameter       | Default    | Description                                     |
|-----------------|------------|-------------------------------------------------|
| model_path      | (required) | Directory with saved model weights and mappings |
| top_k           | 10         | Top predictions per entity                      |
| predict_tails   | True       | Predict (entity, relation, ?)                   |
| predict_heads   | False      | Predict (?, relation, entity)                   |
| score_threshold | None       | Minimum score filter                            |
| device          | "cpu"      | "cpu" or "cuda"                                 |