Modeling
RetriCo provides two modeling capabilities that enrich your knowledge graph: community detection and knowledge graph embeddings.
Community Detection
Discover clusters of related entities in your graph using the Louvain or Leiden algorithm. Optionally generate LLM summaries for each community and embed them for vector search.
One-liner
```python
import retrico

result = retrico.detect_communities(
    method="louvain",      # "louvain" or "leiden"
    levels=1,              # hierarchical levels
    resolution=1.0,        # resolution parameter
    api_key="sk-...",      # enables LLM summaries + embeddings
    model="gpt-4o-mini",
)
```
When an `api_key` is provided, the pipeline:
- Detects communities using the chosen algorithm
- Summarizes each community with an LLM (based on its top entities and relations)
- Embeds the summaries into a vector store for retrieval

Without an `api_key`, only detection is performed.
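The conditional behavior above can be sketched as a small helper (hypothetical, for illustration only; `detect_communities` handles this internally):

```python
def plan_pipeline(api_key=None):
    # Mirrors the documented behavior: detection always runs;
    # summarization and embedding only run when an API key is given.
    steps = ["detect_communities"]
    if api_key is not None:
        steps += ["summarize_communities", "embed_summaries"]
    return steps

print(plan_pipeline())          # detection only
print(plan_pipeline("sk-..."))  # full pipeline
```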
Builder API
```python
builder = retrico.RetriCoCommunity(name="my_communities")
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"))
builder.detector(
    method="louvain",
    levels=1,
    resolution=1.0,
)
builder.summarizer(
    api_key="sk-...",
    model="gpt-4o-mini",
    top_k=10,
)
builder.embedder(
    embedding_method="sentence_transformer",
    model_name="all-MiniLM-L6-v2",
    vector_store_type="faiss",
)

executor = builder.build(verbose=True)
result = executor.run()
```
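For intuition, the builder accumulates per-stage configuration and wires the pipeline when `build()` is called. A toy sketch of that pattern (not RetriCo's actual implementation):

```python
class ToyBuilder:
    """Minimal illustration of the accumulate-then-build pattern."""

    def __init__(self, name):
        self.name = name
        self._stages = []  # (stage_name, config), in the order added

    def add_stage(self, stage_name, **config):
        self._stages.append((stage_name, config))
        return self  # allow chaining

    def build(self):
        # Each stage implicitly feeds the next, mirroring
        # detector -> summarizer -> embedder above.
        return [name for name, _ in self._stages]

pipeline = (
    ToyBuilder("my_communities")
    .add_stage("detector", method="louvain")
    .add_stage("summarizer", model="gpt-4o-mini")
    .add_stage("embedder", vector_store_type="faiss")
    .build()
)
print(pipeline)  # ['detector', 'summarizer', 'embedder']
```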
YAML Config
```yaml
name: community_detection
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
  vector:
    store_type: faiss
nodes:
  - id: detector
    processor: community_detector
    output: {key: "detector_result"}
    config:
      method: louvain
      levels: 1
      resolution: 1.0
  - id: summarizer
    processor: community_summarizer
    requires: [detector]
    inputs:
      communities: {source: "detector_result", fields: "communities"}
    output: {key: "summarizer_result"}
    config:
      api_key: "sk-..."
      model: "gpt-4o-mini"
      top_k: 10
  - id: embedder
    processor: community_embedder
    requires: [summarizer]
    inputs:
      communities: {source: "summarizer_result", fields: "communities"}
    output: {key: "embedder_result"}
    config:
      embedding_method: sentence_transformer
      model_name: "all-MiniLM-L6-v2"
      vector_store_type: faiss
```
Parameters
Detector:
| Parameter | Default | Description |
|---|---|---|
| method | "louvain" | Algorithm: "louvain" or "leiden" |
| levels | 1 | Number of hierarchical levels |
| resolution | 1.0 | Resolution parameter (higher = more communities) |
Summarizer:
| Parameter | Default | Description |
|---|---|---|
| api_key | (required) | OpenAI-compatible API key |
| model | "gpt-4o-mini" | LLM model name |
| top_k | 10 | Max entities per community for summarization context |
| temperature | 0.1 | LLM sampling temperature |
Embedder:
| Parameter | Default | Description |
|---|---|---|
| embedding_method | "sentence_transformer" | "sentence_transformer" or "openai" |
| model_name | "all-MiniLM-L6-v2" | Embedding model name |
| vector_store_type | "in_memory" | "in_memory", "faiss", or "qdrant" |
Using Communities for Retrieval
Once communities are built, use the `community_retriever` to search over them:
```python
result = retrico.query_graph(
    query="What research fields are represented in the graph?",
    retrieval_strategy="community",
    api_key="sk-...",
)
```
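Under the hood, community search is a nearest-neighbor lookup over the summary embeddings. A plain-Python sketch using cosine similarity (stores like FAISS or Qdrant index this efficiently at scale; the community IDs and vectors below are made up):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_communities(query_vec, summary_vecs, k=2):
    # summary_vecs: {community_id: summary embedding}
    ranked = sorted(summary_vecs, key=lambda cid: cosine(query_vec, summary_vecs[cid]), reverse=True)
    return ranked[:k]

summaries = {"physics": [1.0, 0.0], "biology": [0.0, 1.0], "chemistry": [0.7, 0.7]}
print(top_communities([0.9, 0.1], summaries))  # ['physics', 'chemistry']
```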
See Retrieving - Community Search for details.
Knowledge Graph Embeddings
Train entity and relation embeddings using PyKEEN. These embeddings capture the structural patterns of your knowledge graph and enable vector-based entity retrieval and link prediction.
Installation
```bash
pip install pykeen
```
One-liner
```python
result = retrico.train_kg_model(
    model="RotatE",            # PyKEEN model
    embedding_dim=128,
    epochs=100,
    batch_size=256,
    lr=0.001,
    device="cpu",
    model_path="kg_model",
    vector_store_type="faiss",
    store_to_graph=False,
)
```
Builder API
```python
builder = retrico.RetriCoModeling(name="train_embeddings")
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"))
builder.triple_reader(
    source="graph_store",
    train_ratio=0.8,
    val_ratio=0.1,
    test_ratio=0.1,
)
builder.trainer(
    model="RotatE",
    embedding_dim=128,
    epochs=100,
    batch_size=256,
    lr=0.001,
    device="cpu",
)
builder.storer(
    model_path="kg_model",
    vector_store_type="faiss",
    store_to_graph=False,
)

executor = builder.build(verbose=True)
result = executor.run()
```
YAML Config
```yaml
name: kg_embeddings
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
  vector:
    store_type: faiss
nodes:
  - id: reader
    processor: kg_triple_reader
    output: {key: "reader_result"}
    config:
      source: graph_store
      train_ratio: 0.8
      val_ratio: 0.1
      test_ratio: 0.1
  - id: trainer
    processor: kg_trainer
    requires: [reader]
    inputs:
      triples: {source: "reader_result", fields: "triples"}
    output: {key: "trainer_result"}
    config:
      model: RotatE
      embedding_dim: 128
      epochs: 100
      batch_size: 256
      lr: 0.001
      device: cpu
  - id: storer
    processor: kg_embedding_storer
    requires: [trainer]
    inputs:
      model: {source: "trainer_result", fields: "model"}
      entity_embeddings: {source: "trainer_result", fields: "entity_embeddings"}
    output: {key: "storer_result"}
    config:
      model_path: kg_model
      vector_store_type: faiss
      store_to_graph: false
```
Supported Models
RetriCo uses PyKEEN, which supports 40+ KG embedding models:
| Model | Type | Description |
|---|---|---|
| RotatE | Rotation | Models relations as rotations in complex space |
| TransE | Translation | Relations as translations in embedding space |
| ComplEx | Factorization | Complex-valued tensor factorization |
| DistMult | Factorization | Diagonal bilinear model |
| TuckER | Factorization | Tucker decomposition of the binary tensor |
See the PyKEEN model catalog for the full list.
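To give a flavor of what these models learn, TransE treats a relation as a translation vector: a plausible triple (h, r, t) satisfies h + r ≈ t. A minimal scoring sketch (PyKEEN's real implementations are batched and GPU-aware; the vectors here are illustrative):

```python
import math

def transe_score(h, r, t):
    # Negative L2 distance: higher score = more plausible triple.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

h, r = [1.0, 0.0], [0.0, 1.0]
print(transe_score(h, r, [1.0, 1.0]))  # perfect translation -> 0.0
print(transe_score(h, r, [5.0, 5.0]))  # poor fit -> strongly negative
```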
Parameters
Triple Reader:
| Parameter | Default | Description |
|---|---|---|
| source | "graph_store" | "graph_store" (read from DB) or "tsv" (read from file) |
| tsv_path | None | Path to TSV file (head, relation, tail) |
| train_ratio | 0.8 | Training data split ratio |
| val_ratio | 0.1 | Validation data split ratio |
| test_ratio | 0.1 | Test data split ratio |
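The split ratios behave as you'd expect; a hypothetical sketch of the behavior (RetriCo's internal shuffling and splitting may differ):

```python
import random

def split_triples(triples, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1, seed=0):
    assert abs(train_ratio + val_ratio + test_ratio - 1.0) < 1e-9
    triples = list(triples)
    random.Random(seed).shuffle(triples)  # deterministic shuffle for reproducibility
    n_train = int(len(triples) * train_ratio)
    n_val = int(len(triples) * val_ratio)
    return (triples[:n_train],
            triples[n_train:n_train + n_val],
            triples[n_train + n_val:])

triples = [(f"e{i}", "related_to", f"e{i+1}") for i in range(10)]
train, val, test = split_triples(triples)
print(len(train), len(val), len(test))  # 8 1 1
```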
Trainer:
| Parameter | Default | Description |
|---|---|---|
| model | "RotatE" | PyKEEN model name |
| embedding_dim | 128 | Dimension of embeddings |
| epochs | 100 | Training epochs |
| batch_size | 256 | Training batch size |
| lr | 0.001 | Learning rate |
| device | "cpu" | "cpu" or "cuda" |
Storer:
| Parameter | Default | Description |
|---|---|---|
| model_path | (required) | Directory to save model weights |
| vector_store_type | "in_memory" | Where to store embeddings |
| store_to_graph | False | Write embeddings as node properties in graph DB |
Using KG Embeddings for Retrieval
Once trained, use entity embeddings for retrieval:
```python
result = retrico.query_graph(
    query="Who works at similar institutions to Einstein?",
    entity_labels=["person", "organization"],
    retrieval_strategy="entity_embedding",
    retriever_kwargs={"top_k": 10, "vector_index_name": "entity_embeddings"},
)
```
See Retrieving - Entity Embeddings for details.
Query-Time Link Prediction
Add a `kg_scorer` node to any query pipeline to score existing triples and predict missing links using trained KG embeddings:
```python
from retrico import RetriCoSearch

builder = RetriCoSearch(name="scored_query")
builder.query_parser(labels=["person", "location"])
builder.retriever(max_hops=2)
builder.chunk_retriever()

# Add KG scoring (loads the trained model from disk)
builder.kg_scorer(
    model_path="kg_model",
    top_k=10,
    predict_tails=True,
    predict_heads=False,
    score_threshold=None,
    device="cpu",
)
builder.reasoner(api_key="sk-...", model="gpt-4o-mini")

executor = builder.build()
ctx = executor.run({"query": "Where was Einstein born?"})

# Access scoring results
scorer_result = ctx.get("kg_scorer_result")
print(scorer_result["scored_triples"])  # existing triples with KGE scores
print(scorer_result["predictions"])     # predicted missing links
# scorer_result["subgraph"] is enriched with predicted relations
```
The KG scorer can also act as a universal retriever; see Retrieving - KG-Scored Retrieval for the `kg_scored` strategy that combines tool-calling parsing with KG scoring.
How it works
- Scores existing triples in the retrieved subgraph using `model.score_hrt()`
- Predicts missing links for query entities (top-k tail/head predictions)
- Adds the predictions to the subgraph as additional relations
- In `kg_scored` mode, the scorer resolves `triple_queries` from the tool-calling parser, building a scored subgraph without needing a separate retriever
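The tail-prediction step can be sketched as scoring every candidate entity and keeping the top-k, optionally filtered by `score_threshold` (a hypothetical illustration; the real scorer uses the trained PyKEEN model, and `toy_score` here is a stand-in):

```python
def predict_tails(head, relation, candidates, score_fn, top_k=10, score_threshold=None):
    # Score every candidate tail for (head, relation, ?), highest first.
    scored = sorted(
        ((t, score_fn(head, relation, t)) for t in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if score_threshold is not None:
        scored = [(t, s) for t, s in scored if s >= score_threshold]
    return scored[:top_k]

# Toy score function: longer names score higher (stand-in for a KGE model).
toy_score = lambda h, r, t: len(t)
preds = predict_tails("einstein", "born_in", ["ulm", "zurich", "princeton"], toy_score, top_k=2)
print(preds)  # [('princeton', 9), ('zurich', 6)]
```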
Parameters:
| Parameter | Default | Description |
|---|---|---|
| model_path | (required) | Directory with saved model weights and mappings |
| top_k | 10 | Top predictions per entity |
| predict_tails | True | Predict (entity, relation, ?) |
| predict_heads | False | Predict (?, relation, entity) |
| score_threshold | None | Minimum score filter |
| device | "cpu" | "cpu" or "cuda" |