Databases

RetriCo supports three categories of databases: graph, vector, and relational. Each category has multiple backends that you can mix and match.

Graph Databases

Graph databases store the knowledge graph — entities, relations, chunks, and documents.

FalkorDB Lite (Default)

Embedded, zero-configuration graph database. No server needed — data is stored locally.

One-liner:

import retrico

# Automatic — FalkorDB Lite is the default
result = retrico.build_graph(texts=[...], entity_labels=[...])

# Explicit
result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=retrico.FalkorDBLiteConfig(),
)

Builder API:

builder = retrico.RetriCoBuilder(name="my_pipeline")
builder.graph_store(retrico.FalkorDBLiteConfig())

YAML:

stores:
  graph:
    store_type: falkordb_lite

Neo4j

Production-grade graph database with a rich query language (Cypher), built-in visualization, and enterprise features.

# Start Neo4j (Docker)
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password neo4j:latest

One-liner:

config = retrico.Neo4jConfig(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    database="neo4j",  # optional
)

result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=config,
)

Builder API:

builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="password"))

YAML:

stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
    user: neo4j
    password: password
    database: neo4j

Neo4j Config Parameters:

Parameter | Default | Description
----------|---------|------------
uri | "bolt://localhost:7687" | Bolt protocol URI
user | "neo4j" | Username
password | "password" | Password
database | "neo4j" | Database name

Direct Queries

You can also query Neo4j directly:

store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

entity = store.get_entity_by_label("Einstein")
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)

store.close()

FalkorDB (Server)

Redis-compatible graph database with Cypher support; it can be faster than Neo4j for many workloads.

# Start FalkorDB (Docker)
docker run -d --name falkordb -p 6379:6379 falkordb/falkordb:latest

One-liner:

config = retrico.FalkorDBConfig(
    host="localhost",
    port=6379,
    graph="my_graph",
)

result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)

Builder API:

builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379, graph="my_graph"))

YAML:

stores:
  graph:
    store_type: falkordb
    host: localhost
    port: 6379
    graph: my_graph

FalkorDB Config Parameters:

Parameter | Default | Description
----------|---------|------------
host | "localhost" | Redis host
port | 6379 | Redis port
graph | "retrico" | Graph name

Memgraph

High-performance, in-memory graph database compatible with the Bolt protocol.

# Start Memgraph (Docker)
docker run -d --name memgraph -p 7687:7687 memgraph/memgraph:latest

One-liner:

config = retrico.MemgraphConfig(
    uri="bolt://localhost:7687",
    user="",
    password="",
    database="memgraph",
)

result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)

Builder API:

builder.graph_store(retrico.MemgraphConfig(uri="bolt://localhost:7687"))

YAML:

stores:
  graph:
    store_type: memgraph
    uri: "bolt://localhost:7687"
    database: memgraph

Memgraph uses the same Neo4j Python driver (Bolt protocol), so no additional dependencies are needed.


Vector Databases

Vector stores hold embeddings for semantic search — used by chunk, entity, and community retrieval strategies.

In-Memory (Default)

Simple, zero-config vector store. Good for development and small datasets.

Builder API:

builder.vector_store(type="in_memory")

YAML:

stores:
  vector:
    store_type: in_memory

FAISS

Meta's high-performance library for vector similarity search. Supports GPU acceleration.

pip install faiss-cpu  # or faiss-gpu

Builder API:

builder.vector_store(retrico.FaissVectorConfig(use_gpu=False))

YAML:

stores:
  vector:
    store_type: faiss
    use_gpu: false

Parameters:

Parameter | Default | Description
----------|---------|------------
use_gpu | False | Enable GPU acceleration

Qdrant

Production-ready vector database with filtering and payload support.

pip install qdrant-client

Builder API:

builder.vector_store(retrico.QdrantVectorConfig(
    url="http://localhost:6333",
    collection_name="my_embeddings",
))

YAML:

stores:
  vector:
    store_type: qdrant
    url: "http://localhost:6333"
    collection_name: my_embeddings

Parameters:

Parameter | Default | Description
----------|---------|------------
url | "http://localhost:6333" | Qdrant server URL
collection_name | "retrico" | Collection name

Graph DB-Backed

Store embeddings directly in the graph database nodes. Useful when you want a single storage backend.

Builder API:

builder.vector_store(retrico.GraphDBVectorConfig())

YAML:

stores:
  vector:
    store_type: graph_db

Relational Databases

Relational stores hold chunks and documents for full-text search and structured queries.

SQLite

Zero-config local storage with FTS5 full-text search. Built into Python — no external dependencies.

Builder API:

builder.chunk_store(type="sqlite", path="chunks.db")

YAML:

stores:
  relational:
    store_type: sqlite
    path: chunks.db

PostgreSQL

Production-grade storage with tsvector full-text search and optional pgvector support for embeddings.

Builder API:

builder.chunk_store(type="postgres", host="localhost", dbname="retrico")

YAML:

stores:
  relational:
    store_type: postgres
    host: localhost
    dbname: retrico

Elasticsearch

Full-text search with advanced relevance scoring.

Builder API:

builder.chunk_store(type="elasticsearch", url="http://localhost:9200")

YAML:

stores:
  relational:
    store_type: elasticsearch
    url: "http://localhost:9200"

Store Pool

The store pool manages shared, named database connections across the pipeline. Configure stores once at the builder level, and all components inherit them automatically. Connections are created lazily (on first access) and shared — calling the same named store from multiple processors returns the same instance.

Basic Usage

builder = retrico.RetriCoBuilder(name="my_pipeline")

# Register named stores (shared across all processors)
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="pass"), name="main")
builder.vector_store(retrico.FaissVectorConfig(use_gpu=True), name="embeddings")
builder.chunk_store(retrico.SqliteRelationalConfig(path="chunks.db"), name="chunks")

# Pipeline nodes — no need to repeat connection details
builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "org"])
builder.graph_writer() # uses "main" graph store + "chunks" relational store
builder.chunk_embedder() # uses "main" graph store + "embeddings" vector store

# Context manager auto-closes all connections
with builder.build() as executor:
    result = executor.run(texts=[...])

Multiple Stores of the Same Type

You can register multiple stores of the same category (e.g. two graph databases) and reference them by name from individual processors using the graph_store_name config parameter:

builder = retrico.RetriCoBuilder(name="multi_graph")

# Two graph databases
builder.graph_store(retrico.Neo4jConfig(uri="bolt://prod:7687", password="pass"), name="production")
builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379), name="staging")

builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "location"])

# Write to staging
builder.graph_writer(graph_store_name="staging")

with builder.build() as executor:
    result = executor.run(texts=[...])

The graph_store_name parameter tells a processor which named store to use. Without it, the first (or only) registered store is used. The same pattern works for vector stores (vector_store_name) and relational stores (relational_store_name).

Vector stores can also reference a named graph store — useful when storing embeddings directly in graph database nodes:

builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"), name="main")
builder.vector_store(retrico.GraphDBVectorConfig(graph_store_name="main"))

YAML Config

In YAML, the stores section uses a dict of named stores per category:

name: multi_store_pipeline
stores:
  graph:
    production:
      store_type: neo4j
      neo4j_uri: bolt://prod:7687
      neo4j_password: password
    staging:
      store_type: falkordb
      falkordb_host: localhost
      falkordb_port: 6379
  vector:
    default:
      vector_store_type: faiss
      use_gpu: true

nodes:
  - id: writer
    processor: graph_writer
    config:
      graph_store_name: staging
  - id: embedder
    processor: chunk_embedder
    config:
      graph_store_name: production
      vector_store_name: default

The ProcessorFactory detects the stores key, creates a shared pool, and injects it into all processor configs. Processors without an explicit graph_store_name use the first registered store.

StorePool Directly

For advanced use cases, create and manage a StorePool directly:

from retrico import StorePool

pool = StorePool()
pool.register_graph("main", {"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
pool.register_graph("backup", {"store_type": "falkordb", "falkordb_host": "localhost"})
pool.register_vector("embeddings", {"vector_store_type": "faiss", "use_gpu": True})

# Lazy creation — connection is opened on first access
store = pool.get_graph("main") # creates Neo4j connection
store2 = pool.get_graph("main") # returns the same instance
backup = pool.get_graph("backup") # creates FalkorDB connection

pool.close() # closes all instantiated connections
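The lazy, shared semantics can be sketched in a few lines of plain Python. This is a hypothetical illustration of the mechanism, not RetriCo's actual implementation:

```python
# Hypothetical sketch of lazy, shared connections (not RetriCo internals).
class LazyPool:
    def __init__(self):
        self._factories = {}   # name -> factory callable
        self._instances = {}   # name -> created store

    def register(self, name, factory):
        self._factories[name] = factory

    def get(self, name):
        # Created on first access, then cached: every later call
        # for the same name returns the same instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def close(self):
        # Only connections that were actually instantiated get closed.
        for store in self._instances.values():
            store.close()
        self._instances.clear()
```

Because instances are cached by name, two processors asking for the same named store share one connection, and registering a store costs nothing until something uses it.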

Backward Compatibility

All existing configs and calling patterns continue to work unchanged:

  • Configs without a stores section — processors fall back to creating their own connections from flat parameters (e.g. neo4j_uri, store_type)
  • A single unnamed store — equivalent to registering it with the name "default"
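The fallback rule can be illustrated with a small resolver. This is a hypothetical sketch of the decision described above; the actual ProcessorFactory logic may differ:

```python
def resolve_graph_store(config: dict):
    """Illustrative only: choose between the store pool and flat parameters."""
    stores = config.get("stores", {}).get("graph")
    if stores:
        # Named stores exist; an explicit graph_store_name wins,
        # otherwise the first registered store is used.
        name = config.get("graph_store_name") or next(iter(stores))
        return ("pool", name)
    # No stores section: fall back to flat parameters such as neo4j_uri.
    flat = {k: v for k, v in config.items() if k.startswith(("store_", "neo4j_"))}
    return ("flat", flat)
```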

Direct Store Queries

All graph stores support read queries outside of a pipeline:

from retrico import Neo4jGraphStore, FalkorDBGraphStore, MemgraphGraphStore

store = Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

entity = store.get_entity_by_label("Albert Einstein")
entity = store.get_entity_by_id("Q937") # by ID (useful for linked entities)
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)
all_entities = store.get_all_entities()

store.close()

Creating Stores Programmatically

Use factory functions to create stores from flat dicts or config objects:

from retrico import create_graph_store, create_vector_store, create_relational_store

graph = create_graph_store({"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
vector = create_vector_store({"vector_store_type": "faiss", "use_gpu": True})
relational = create_relational_store({"relational_store_type": "sqlite", "sqlite_path": "chunks.db"})

Graph Mutations

All graph stores support CRUD operations for surgical changes without raw Cypher:

store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

# Add entities
einstein_id = store.add_entity("Albert Einstein", "person", properties={"birth_year": 1879})
ulm_id = store.add_entity("Ulm", "location")

# Add a relation (validates both entities exist)
rel_id = store.add_relation(einstein_id, ulm_id, "born in")

# Add a relation with temporal properties
rel_id = store.add_relation(
    einstein_id, ulm_id, "lived in",
    start_date="1879-03-14", end_date="1880-06-01",
)

# Update an entity — only provided fields change, properties are merged
store.update_entity(einstein_id, properties={"death_year": 1955})

# Delete
store.delete_relation(rel_id)
store.delete_entity(ulm_id)
store.delete_chunk("chunk-123")

# Merge two entities — moves all relationships to target, deletes source
store.merge_entities(source_id="e-duplicate", target_id="e-canonical")

store.close()

Mutation methods:

Method | Signature | Returns
-------|-----------|--------
add_entity | (label, entity_type, *, properties, id) | str (UUID)
add_relation | (head_id, tail_id, relation_type, *, properties, id, start_date, end_date) | str (UUID)
update_entity | (entity_id, *, label, entity_type, properties) | bool
delete_entity | (entity_id) | bool
delete_relation | (relation_id) | bool
delete_chunk | (chunk_id) | bool
merge_entities | (source_id, target_id) | bool

Relation Properties and Temporal Filtering

Relations support two first-class temporal fields — start_date and end_date — plus an arbitrary properties dict. These are stored directly on the relation edge in the graph database.

Adding temporal properties

# Via data ingest
retrico.ingest_data(data=[{
    "entities": [
        {"text": "Einstein", "label": "person"},
        {"text": "ETH Zurich", "label": "organization"},
    ],
    "relations": [{
        "head": "Einstein", "tail": "ETH Zurich", "type": "worked_at",
        "start_date": "1912-01-01", "end_date": "1914-03-01",
        "properties": {"role": "professor"},
    }],
}])

# Via graph store API
rel_id = store.add_relation(
    einstein_id, eth_id, "worked_at",
    start_date="1912-01-01", end_date="1914-03-01",
    properties={"role": "professor"},
)

Dates should be ISO 8601 strings (e.g. "1879-03-14", "2024-01"). Use None/null when unknown.

Temporal filtering at query time

Retrievers can filter relations by date range using active_after and active_before:

builder = retrico.RetriCoSearch(name="temporal_query")
builder.query_parser(method="gliner", labels=["person", "organization"])
builder.retriever(
    max_hops=2,
    active_after="2020-01-01",
    active_before="2020-12-31",
)
builder.reasoner(api_key="...", model="gpt-4o-mini")

YAML config equivalent:

- id: retriever
  processor: retriever
  config:
    max_hops: 2
    active_after: "2020-01-01"
    active_before: "2020-12-31"

The filtering logic:

  • active_after: keeps relations where end_date IS NULL OR end_date >= active_after
  • active_before: keeps relations where start_date IS NULL OR start_date <= active_before
  • Relations without dates are always included (treated as always active)
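These rules can be sketched as a small predicate. This is illustrative Python, not the retriever's actual implementation; it relies on the fact that ISO 8601 date strings compare correctly as plain strings:

```python
def relation_is_active(relation, active_after=None, active_before=None):
    """Sketch of the filter rules above (hypothetical helper)."""
    start = relation.get("start_date")
    end = relation.get("end_date")
    # active_after: drop relations that ended before the window opens
    if active_after is not None and end is not None and end < active_after:
        return False
    # active_before: drop relations that start after the window closes
    if active_before is not None and start is not None and start > active_before:
        return False
    # Relations without dates fall through: always active
    return True
```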

The tool_retriever also supports temporal filtering — the LLM can pass start_date/end_date to tools like get_entity_relations dynamically.


Custom Stores

Register your own graph, vector, or relational store backends. Once registered, they work everywhere — builders, YAML configs, convenience functions, and the store pool.

Custom Graph Store

Implement BaseGraphStore and register it:

from retrico.store.graph.base import BaseGraphStore

class TigerGraphStore(BaseGraphStore):
    def __init__(self, host="localhost", port=9000, graph="MyGraph", token=None):
        self._host = host
        self._port = port
        self._graph = graph
        self._token = token
        self._conn = None

    def setup_indexes(self): ...
    def close(self): ...
    def write_entity(self, entity): ...
    def write_relation(self, relation, head_entity_id, tail_entity_id): ...
    def get_entity_by_label(self, label): ...
    def get_entity_by_id(self, entity_id): ...
    def get_entity_neighbors(self, entity_id, max_hops=1): ...
    def get_entity_relations(self, entity_id): ...
    def get_chunks_for_entity(self, entity_id): ...
    def get_subgraph(self, entity_ids, max_hops=1): ...
    # ... implement remaining abstract methods

Register it:

import retrico

def create_tigergraph(config):
    from my_store import TigerGraphStore
    return TigerGraphStore(
        host=config.get("tigergraph_host", "localhost"),
        port=config.get("tigergraph_port", 9000),
        graph=config.get("tigergraph_graph", "MyGraph"),
        token=config.get("tigergraph_token"),
    )

retrico.register_graph_store("tigergraph", create_tigergraph)

Now store_type="tigergraph" works across all APIs:

result = retrico.build_graph(
    texts=[...], entity_labels=[...],
    store_type="tigergraph",
    tigergraph_host="localhost",
)

Custom Vector Store

from retrico.store.vector.base import BaseVectorStore

class PineconeVectorStore(BaseVectorStore):
    def create_index(self, name, dimension): ...
    def store_embeddings(self, index_name, items): ...
    def search_similar(self, index_name, query_vector, top_k=10): ...

retrico.register_vector_store("pinecone", lambda config: PineconeVectorStore(...))

Custom Processor

Register custom pipeline processors using category-specific registries:

import retrico
from retrico.core.base import BaseProcessor

class SpacyNERProcessor(BaseProcessor):
    def __call__(self, chunks, **kwargs):
        # Your NER logic here
        return {"entities": all_entities, "chunks": chunks}

retrico.register_construct_processor("ner_spacy", lambda config, pipeline=None: SpacyNERProcessor(config, pipeline))

Once registered, use it in builders and YAML:

builder.add_node(
    id="ner", processor="ner_spacy",
    config={"model": "en_core_web_sm", "labels": ["person", "org"]},
    inputs={"chunks": "chunker_result.chunks"},
    output="ner_result",
)

Available registries:

Registry | Category | Convenience function
---------|----------|---------------------
construct_registry | Build pipeline (chunker, NER, relex, ...) | retrico.register_construct_processor()
query_registry | Query pipeline (parser, retrievers, ...) | retrico.register_query_processor()
modeling_registry | KG modeling (community, KG training, ...) | retrico.register_modeling_processor()
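All three registries follow the same name-to-factory pattern, sketched here in plain Python. This is a simplified, hypothetical illustration of the mechanism, not RetriCo's registry code:

```python
# Hypothetical sketch of a processor registry: a name -> factory mapping
# that is consulted at pipeline-build time.
_registry = {}

def register_processor(name, factory):
    _registry[name] = factory

def create_processor(name, config, pipeline=None):
    try:
        factory = _registry[name]
    except KeyError:
        raise ValueError(f"unknown processor: {name!r}") from None
    # Factories receive the node config plus the enclosing pipeline.
    return factory(config, pipeline=pipeline)
```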