Databases
RetriCo supports three categories of databases: graph, vector, and relational. Each category has multiple backends that you can mix and match.
Graph Databases
Graph databases store the knowledge graph — entities, relations, chunks, and documents.
FalkorDB Lite (Default)
Embedded, zero-configuration graph database. No server needed — data is stored locally.
One-liner:
import retrico
# Automatic — FalkorDB Lite is the default
result = retrico.build_graph(texts=[...], entity_labels=[...])
# Explicit
result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=retrico.FalkorDBLiteConfig(),
)
Builder API:
builder = retrico.RetriCoBuilder(name="my_pipeline")
builder.graph_store(retrico.FalkorDBLiteConfig())
YAML:
stores:
  graph:
    store_type: falkordb_lite
Neo4j
Production-grade graph database with a rich query language (Cypher), built-in visualization, and enterprise features.
# Start Neo4j (Docker)
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password neo4j:latest
One-liner:
config = retrico.Neo4jConfig(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    database="neo4j",  # optional
)
result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=config,
)
Builder API:
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="password"))
YAML:
stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
    user: neo4j
    password: password
    database: neo4j
Neo4j Config Parameters:
| Parameter | Default | Description |
|---|---|---|
| `uri` | `"bolt://localhost:7687"` | Bolt protocol URI |
| `user` | `"neo4j"` | Username |
| `password` | `"password"` | Password |
| `database` | `"neo4j"` | Database name |
Direct Queries
You can also query Neo4j directly:
store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")
entity = store.get_entity_by_label("Einstein")
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)
store.close()
FalkorDB (Server)
Redis-compatible graph database with Cypher support. Faster than Neo4j for many workloads.
# Start FalkorDB (Docker)
docker run -d --name falkordb -p 6379:6379 falkordb/falkordb:latest
One-liner:
config = retrico.FalkorDBConfig(
    host="localhost",
    port=6379,
    graph="my_graph",
)
result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)
Builder API:
builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379, graph="my_graph"))
YAML:
stores:
  graph:
    store_type: falkordb
    host: localhost
    port: 6379
    graph: my_graph
FalkorDB Config Parameters:
| Parameter | Default | Description |
|---|---|---|
| `host` | `"localhost"` | Redis host |
| `port` | `6379` | Redis port |
| `graph` | `"retrico"` | Graph name |
Memgraph
High-performance, in-memory graph database compatible with the Bolt protocol.
# Start Memgraph (Docker)
docker run -d --name memgraph -p 7687:7687 memgraph/memgraph:latest
One-liner:
config = retrico.MemgraphConfig(
    uri="bolt://localhost:7687",
    user="",
    password="",
    database="memgraph",
)
result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)
Builder API:
builder.graph_store(retrico.MemgraphConfig(uri="bolt://localhost:7687"))
YAML:
stores:
  graph:
    store_type: memgraph
    uri: "bolt://localhost:7687"
    database: memgraph
Memgraph uses the same Neo4j Python driver (Bolt protocol), so no additional dependencies are needed.
Vector Databases
Vector stores hold embeddings for semantic search — used by chunk, entity, and community retrieval strategies.
In-Memory (Default)
Simple, zero-config vector store. Good for development and small datasets.
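Conceptually, a store of this kind keeps embeddings in a plain list and ranks them by cosine similarity at query time. A minimal, dependency-free sketch of that idea — illustrative only, not RetriCo's actual implementation:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class InMemoryIndex:
    """Toy in-memory vector index: brute-force cosine search over all items."""

    def __init__(self):
        self._items = []  # list of (id, vector) pairs

    def add(self, item_id, vector):
        self._items.append((item_id, vector))

    def search(self, query, top_k=10):
        scored = [(item_id, cosine(query, vec)) for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

index = InMemoryIndex()
index.add("chunk-1", [1.0, 0.0])
index.add("chunk-2", [0.0, 1.0])
print(index.search([0.9, 0.1], top_k=1))  # "chunk-1" ranks first
```

Brute-force search is O(n) per query, which is exactly why this backend is recommended only for development and small datasets.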
Builder API:
builder.vector_store(type="in_memory")
YAML:
stores:
  vector:
    store_type: in_memory
FAISS
Facebook's high-performance vector similarity search. Supports GPU acceleration.
pip install faiss-cpu # or faiss-gpu
Builder API:
builder.vector_store(retrico.FaissVectorConfig(use_gpu=False))
YAML:
stores:
  vector:
    store_type: faiss
    use_gpu: false
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `use_gpu` | `False` | Enable GPU acceleration |
Qdrant
Production-ready vector database with filtering and payload support.
pip install qdrant-client
Builder API:
builder.vector_store(retrico.QdrantVectorConfig(
    url="http://localhost:6333",
    collection_name="my_embeddings",
))
YAML:
stores:
  vector:
    store_type: qdrant
    url: "http://localhost:6333"
    collection_name: my_embeddings
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `url` | `"http://localhost:6333"` | Qdrant server URL |
| `collection_name` | `"retrico"` | Collection name |
Graph DB-Backed
Store embeddings directly in the graph database nodes. Useful when you want a single storage backend.
Builder API:
builder.vector_store(retrico.GraphDBVectorConfig())
YAML:
stores:
  vector:
    store_type: graph_db
Relational Databases
Relational stores hold chunks and documents for full-text search and structured queries.
SQLite
Zero-config local storage with FTS5 full-text search. Built into Python — no external dependencies.
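The FTS5 engine this store builds on is available directly through Python's `sqlite3` module. The table and column names below are illustrative, not RetriCo's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table — the same full-text engine the SQLite store relies on
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(chunk_id, text)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("c1", "Einstein developed the theory of relativity"),
        ("c2", "Zurich is a city in Switzerland"),
    ],
)
# MATCH runs a full-text query against the index; rank orders by relevance
rows = conn.execute(
    "SELECT chunk_id FROM chunks WHERE chunks MATCH ? ORDER BY rank",
    ("relativity",),
).fetchall()
print(rows)  # [('c1',)]
conn.close()
```

FTS5 availability depends on how the underlying SQLite library was compiled; it is included in standard CPython builds.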
Builder API:
builder.chunk_store(type="sqlite", path="chunks.db")
YAML:
stores:
  relational:
    store_type: sqlite
    path: chunks.db
PostgreSQL
Production-grade storage with tsvector full-text search and optional pgvector support for embeddings.
Builder API:
builder.chunk_store(type="postgres", host="localhost", dbname="retrico")
YAML:
stores:
  relational:
    store_type: postgres
    host: localhost
    dbname: retrico
Elasticsearch
Full-text search with advanced relevance scoring.
Builder API:
builder.chunk_store(type="elasticsearch", url="http://localhost:9200")
YAML:
stores:
  relational:
    store_type: elasticsearch
    url: "http://localhost:9200"
Store Pool
The store pool manages shared, named database connections across the pipeline. Configure stores once at the builder level, and all components inherit them automatically. Connections are created lazily (on first access) and shared — calling the same named store from multiple processors returns the same instance.
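The lazy, shared-instance behavior can be pictured as a small cache of factories. A sketch of the pattern — illustrative only, not RetriCo's code:

```python
class LazyPool:
    """Sketch of the store-pool pattern: register factories,
    create a connection on first access, share it thereafter."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory  # nothing is connected yet

    def get(self, name):
        if name not in self._instances:  # first access: create the connection
            self._instances[name] = self._factories[name]()
        return self._instances[name]     # later accesses: same instance

pool = LazyPool()
pool.register("main", lambda: object())  # stand-in for a real store factory
assert pool.get("main") is pool.get("main")  # shared across callers
```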
Basic Usage
builder = retrico.RetriCoBuilder(name="my_pipeline")
# Register named stores (shared across all processors)
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="pass"), name="main")
builder.vector_store(retrico.FaissVectorConfig(use_gpu=True), name="embeddings")
builder.chunk_store(retrico.SqliteRelationalConfig(path="chunks.db"), name="chunks")
# Pipeline nodes — no need to repeat connection details
builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "org"])
builder.graph_writer() # uses "main" graph store + "chunks" relational store
builder.chunk_embedder() # uses "main" graph store + "embeddings" vector store
# Context manager auto-closes all connections
with builder.build() as executor:
    result = executor.run(texts=[...])
Multiple Stores of the Same Type
You can register multiple stores of the same category (e.g. two graph databases) and reference them by name from individual processors using the `graph_store_name` config parameter:
builder = retrico.RetriCoBuilder(name="multi_graph")
# Two graph databases
builder.graph_store(retrico.Neo4jConfig(uri="bolt://prod:7687", password="pass"), name="production")
builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379), name="staging")
builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "location"])
# Write to staging
builder.graph_writer(graph_store_name="staging")
with builder.build() as executor:
    result = executor.run(texts=[...])
The `graph_store_name` parameter tells a processor which named store to use. Without it, the first (or only) registered store is used. The same pattern works for vector stores (`vector_store_name`) and relational stores (`relational_store_name`).
Vector stores can also reference a named graph store — useful when storing embeddings directly in graph database nodes:
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"), name="main")
builder.vector_store(retrico.GraphDBVectorConfig(graph_store_name="main"))
YAML Config
In YAML, the stores section uses a dict of named stores per category:
name: multi_store_pipeline
stores:
  graph:
    production:
      store_type: neo4j
      neo4j_uri: bolt://prod:7687
      neo4j_password: password
    staging:
      store_type: falkordb
      falkordb_host: localhost
      falkordb_port: 6379
  vector:
    default:
      vector_store_type: faiss
      use_gpu: true
nodes:
  - id: writer
    processor: graph_writer
    config:
      graph_store_name: staging
  - id: embedder
    processor: chunk_embedder
    config:
      graph_store_name: production
      vector_store_name: default
The ProcessorFactory detects the `stores` key, creates a shared pool, and injects it into all processor configs. Processors without an explicit `graph_store_name` use the first registered store.
StorePool Directly
For advanced use cases, create and manage a StorePool directly:
from retrico import StorePool
pool = StorePool()
pool.register_graph("main", {"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
pool.register_graph("backup", {"store_type": "falkordb", "falkordb_host": "localhost"})
pool.register_vector("embeddings", {"vector_store_type": "faiss", "use_gpu": True})
# Lazy creation — connection is opened on first access
store = pool.get_graph("main") # creates Neo4j connection
store2 = pool.get_graph("main") # returns the same instance
backup = pool.get_graph("backup") # creates FalkorDB connection
pool.close() # closes all instantiated connections
Backward Compatibility
All existing configs and calling patterns continue to work unchanged:
- Configs without a `stores` section — processors fall back to creating their own connections from flat parameters (e.g. `neo4j_uri`, `store_type`)
- A single unnamed store — equivalent to registering it with the name `"default"`
Direct Store Queries
All graph stores support read queries outside of a pipeline:
from retrico import Neo4jGraphStore, FalkorDBGraphStore, MemgraphGraphStore
store = Neo4jGraphStore(uri="bolt://localhost:7687", password="password")
entity = store.get_entity_by_label("Albert Einstein")
entity = store.get_entity_by_id("Q937") # by ID (useful for linked entities)
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)
all_entities = store.get_all_entities()
store.close()
Creating Stores Programmatically
Use factory functions to create stores from flat dicts or config objects:
from retrico import create_graph_store, create_vector_store, create_relational_store
graph = create_graph_store({"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
vector = create_vector_store({"vector_store_type": "faiss", "use_gpu": True})
relational = create_relational_store({"relational_store_type": "sqlite", "sqlite_path": "chunks.db"})
Graph Mutations
All graph stores support CRUD operations for surgical changes without raw Cypher:
store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")
# Add entities
einstein_id = store.add_entity("Albert Einstein", "person", properties={"birth_year": 1879})
ulm_id = store.add_entity("Ulm", "location")
# Add a relation (validates both entities exist)
rel_id = store.add_relation(einstein_id, ulm_id, "born in")
# Add a relation with temporal properties
rel_id = store.add_relation(
    einstein_id, ulm_id, "lived in",
    start_date="1879-03-14", end_date="1880-06-01",
)
# Update an entity — only provided fields change, properties are merged
store.update_entity(einstein_id, properties={"death_year": 1955})
# Delete
store.delete_relation(rel_id)
store.delete_entity(ulm_id)
store.delete_chunk("chunk-123")
# Merge two entities — moves all relationships to target, deletes source
store.merge_entities(source_id="e-duplicate", target_id="e-canonical")
store.close()
Mutation methods:
| Method | Signature | Returns |
|---|---|---|
| `add_entity` | `(label, entity_type, *, properties, id)` | `str` (UUID) |
| `add_relation` | `(head_id, tail_id, relation_type, *, properties, id, start_date, end_date)` | `str` (UUID) |
| `update_entity` | `(entity_id, *, label, entity_type, properties)` | `bool` |
| `delete_entity` | `(entity_id)` | `bool` |
| `delete_relation` | `(relation_id)` | `bool` |
| `delete_chunk` | `(chunk_id)` | `bool` |
| `merge_entities` | `(source_id, target_id)` | `bool` |
Relation Properties and Temporal Filtering
Relations support two first-class temporal fields — start_date and end_date — plus an arbitrary properties dict. These are stored directly on the relation edge in the graph database.
Adding temporal properties
# Via data ingest
retrico.ingest_data(data=[{
    "entities": [
        {"text": "Einstein", "label": "person"},
        {"text": "ETH Zurich", "label": "organization"},
    ],
    "relations": [{
        "head": "Einstein", "tail": "ETH Zurich", "type": "worked_at",
        "start_date": "1912-01-01", "end_date": "1914-03-01",
        "properties": {"role": "professor"},
    }],
}])
# Via graph store API
rel_id = store.add_relation(
    einstein_id, eth_id, "worked_at",
    start_date="1912-01-01", end_date="1914-03-01",
    properties={"role": "professor"},
)
Dates should be ISO 8601 strings (e.g. "1879-03-14", "2024-01"). Use None/null when unknown.
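Full dates in this format can be checked with the standard library, and ISO date strings have a useful property: lexicographic order equals chronological order, so they can be compared as plain strings. A small illustration:

```python
from datetime import date

# Full ISO 8601 dates parse directly with the standard library
assert date.fromisoformat("1879-03-14") == date(1879, 3, 14)

# Lexicographic order on ISO date strings matches chronological order,
# which is what makes simple string-based range filters possible
assert "1912-01-01" < "1914-03-01"
```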
Temporal filtering at query time
Retrievers can filter relations by date range using active_after and active_before:
builder = retrico.RetriCoSearch(name="temporal_query")
builder.query_parser(method="gliner", labels=["person", "organization"])
builder.retriever(
    max_hops=2,
    active_after="2020-01-01",
    active_before="2020-12-31",
)
builder.reasoner(api_key="...", model="gpt-4o-mini")
# YAML config equivalent
- id: retriever
  processor: retriever
  config:
    max_hops: 2
    active_after: "2020-01-01"
    active_before: "2020-12-31"
The filtering logic:
- `active_after`: keeps relations where `end_date IS NULL OR end_date >= active_after`
- `active_before`: keeps relations where `start_date IS NULL OR start_date <= active_before`
- Relations without dates are always included (treated as always active)
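Put as code, the two rules combine like this — a sketch of the described semantics, using plain string comparison since ISO date strings order chronologically:

```python
def relation_is_active(relation, active_after=None, active_before=None):
    """Apply the retriever's temporal filter to one relation dict (sketch)."""
    start, end = relation.get("start_date"), relation.get("end_date")
    if active_after is not None and end is not None and end < active_after:
        return False  # ended before the window opens
    if active_before is not None and start is not None and start > active_before:
        return False  # starts after the window closes
    return True  # undated relations are always active

rel = {"start_date": "1912-01-01", "end_date": "1914-03-01"}
assert relation_is_active(rel, active_after="1913-01-01")       # still active in 1913
assert not relation_is_active(rel, active_before="1911-12-31")  # not yet started
assert relation_is_active({}, active_after="2020-01-01")        # undated: always included
```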
The `tool_retriever` also supports temporal filtering — the LLM can pass `start_date`/`end_date` to tools like `get_entity_relations` dynamically.
Custom Stores
Register your own graph, vector, or relational store backends. Once registered, they work everywhere — builders, YAML configs, convenience functions, and the store pool.
Custom Graph Store
Implement BaseGraphStore and register it:
from retrico.store.graph.base import BaseGraphStore
class TigerGraphStore(BaseGraphStore):
    def __init__(self, host="localhost", port=9000, graph="MyGraph", token=None):
        self._host = host
        self._port = port
        self._graph = graph
        self._token = token
        self._conn = None

    def setup_indexes(self): ...
    def close(self): ...
    def write_entity(self, entity): ...
    def write_relation(self, relation, head_entity_id, tail_entity_id): ...
    def get_entity_by_label(self, label): ...
    def get_entity_by_id(self, entity_id): ...
    def get_entity_neighbors(self, entity_id, max_hops=1): ...
    def get_entity_relations(self, entity_id): ...
    def get_chunks_for_entity(self, entity_id): ...
    def get_subgraph(self, entity_ids, max_hops=1): ...
    # ... implement remaining abstract methods
Register it:
import retrico
def create_tigergraph(config):
    from my_store import TigerGraphStore
    return TigerGraphStore(
        host=config.get("tigergraph_host", "localhost"),
        port=config.get("tigergraph_port", 9000),
        graph=config.get("tigergraph_graph", "MyGraph"),
        token=config.get("tigergraph_token"),
    )
retrico.register_graph_store("tigergraph", create_tigergraph)
Now store_type="tigergraph" works across all APIs:
result = retrico.build_graph(
    texts=[...], entity_labels=[...],
    store_type="tigergraph",
    tigergraph_host="localhost",
)
Custom Vector Store
from retrico.store.vector.base import BaseVectorStore
class PineconeVectorStore(BaseVectorStore):
    def create_index(self, name, dimension): ...
    def store_embeddings(self, index_name, items): ...
    def search_similar(self, index_name, query_vector, top_k=10): ...
retrico.register_vector_store("pinecone", lambda config: PineconeVectorStore(...))
Custom Processor
Register custom pipeline processors using category-specific registries:
import retrico
from retrico.core.base import BaseProcessor
class SpacyNERProcessor(BaseProcessor):
    def __call__(self, chunks, **kwargs):
        # Your NER logic here
        return {"entities": all_entities, "chunks": chunks}
retrico.register_construct_processor("ner_spacy", lambda config, pipeline=None: SpacyNERProcessor(config, pipeline))
Once registered, use it in builders and YAML:
builder.add_node(
    id="ner", processor="ner_spacy",
    config={"model": "en_core_web_sm", "labels": ["person", "org"]},
    inputs={"chunks": "chunker_result.chunks"},
    output="ner_result",
)
Available registries:
| Registry | Category | Convenience function |
|---|---|---|
| `construct_registry` | Build pipeline (chunker, NER, relex, ...) | `retrico.register_construct_processor()` |
| `query_registry` | Query pipeline (parser, retrievers, ...) | `retrico.register_query_processor()` |
| `modeling_registry` | KG modeling (community, KG training, ...) | `retrico.register_modeling_processor()` |