Databases

RetriCo supports three categories of databases: graph, vector, and relational. Each category has multiple backends that you can mix and match.

Graph Databases

Graph databases store the knowledge graph — entities, relations, chunks, and documents.

FalkorDB Lite (Default)

Embedded, zero-configuration graph database. No server needed — data is stored locally.

One-liner:

import retrico

# Automatic — FalkorDB Lite is the default
result = retrico.build_graph(texts=[...], entity_labels=[...])

# Explicit
result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=retrico.FalkorDBLiteConfig(),
)

Builder API:

builder = retrico.RetriCoBuilder(name="my_pipeline")
builder.graph_store(retrico.FalkorDBLiteConfig())

YAML:

stores:
  graph:
    store_type: falkordb_lite

Neo4j

Production-grade graph database with a rich query language (Cypher), built-in visualization, and enterprise features.

# Start Neo4j (Docker)
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password neo4j:latest

One-liner:

config = retrico.Neo4jConfig(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    database="neo4j",  # optional
)

result = retrico.build_graph(
    texts=[...],
    entity_labels=[...],
    store_config=config,
)

Builder API:

builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="password"))

YAML:

stores:
  graph:
    store_type: neo4j
    uri: "bolt://localhost:7687"
    user: neo4j
    password: password
    database: neo4j

Neo4j Config Parameters:

Parameter | Default | Description
----------|---------|------------
uri | "bolt://localhost:7687" | Bolt protocol URI
user | "neo4j" | Username
password | "password" | Password
database | "neo4j" | Database name

Direct Queries

You can also query Neo4j directly:

store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

entity = store.get_entity_by_label("Einstein")
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)

store.close()

FalkorDB (Server)

Redis-compatible graph database with Cypher support; it can be faster than Neo4j for many workloads.

# Start FalkorDB (Docker)
docker run -d --name falkordb -p 6379:6379 falkordb/falkordb:latest

One-liner:

config = retrico.FalkorDBConfig(
    host="localhost",
    port=6379,
    graph="my_graph",
)

result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)

Builder API:

builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379, graph="my_graph"))

YAML:

stores:
  graph:
    store_type: falkordb
    host: localhost
    port: 6379
    graph: my_graph

FalkorDB Config Parameters:

Parameter | Default | Description
----------|---------|------------
host | "localhost" | Redis host
port | 6379 | Redis port
graph | "retrico" | Graph name

Memgraph

High-performance, in-memory graph database compatible with the Bolt protocol.

# Start Memgraph (Docker)
docker run -d --name memgraph -p 7687:7687 memgraph/memgraph:latest

One-liner:

config = retrico.MemgraphConfig(
    uri="bolt://localhost:7687",
    user="",
    password="",
    database="memgraph",
)

result = retrico.build_graph(texts=[...], entity_labels=[...], store_config=config)

Builder API:

builder.graph_store(retrico.MemgraphConfig(uri="bolt://localhost:7687"))

YAML:

stores:
  graph:
    store_type: memgraph
    uri: "bolt://localhost:7687"
    database: memgraph

Memgraph uses the same Neo4j Python driver (Bolt protocol), so no additional dependencies are needed.


Vector Databases

Vector stores hold embeddings for semantic search — used by chunk, entity, and community retrieval strategies.

In-Memory (Default)

Simple, zero-config vector store. Good for development and small datasets.

Builder API:

builder.vector_store(type="in_memory")

YAML:

stores:
  vector:
    store_type: in_memory

FAISS

Meta's high-performance library for vector similarity search. Supports GPU acceleration.

pip install faiss-cpu  # or faiss-gpu

Builder API:

builder.vector_store(retrico.FaissVectorConfig(use_gpu=False))

YAML:

stores:
  vector:
    store_type: faiss
    use_gpu: false

Parameters:

Parameter | Default | Description
----------|---------|------------
use_gpu | False | Enable GPU acceleration

Qdrant

Production-ready vector database with filtering and payload support.

pip install qdrant-client

Builder API:

builder.vector_store(retrico.QdrantVectorConfig(
    url="http://localhost:6333",
    collection_name="my_embeddings",
))

YAML:

stores:
  vector:
    store_type: qdrant
    url: "http://localhost:6333"
    collection_name: my_embeddings

Parameters:

Parameter | Default | Description
----------|---------|------------
url | "http://localhost:6333" | Qdrant server URL
collection_name | "retrico" | Collection name

Graph DB-Backed

Store embeddings directly in the graph database nodes. Useful when you want a single storage backend.

Builder API:

builder.vector_store(retrico.GraphDBVectorConfig())

YAML:

stores:
  vector:
    store_type: graph_db

Relational Databases

Relational stores hold chunks and documents for full-text search and structured queries.

SQLite

Zero-config local storage with FTS5 full-text search. Built into Python — no external dependencies.

Builder API:

builder.chunk_store(type="sqlite", path="chunks.db")

YAML:

stores:
  relational:
    store_type: sqlite
    path: chunks.db

PostgreSQL

Production-grade storage with tsvector full-text search and optional pgvector support for embeddings.

Builder API:

builder.chunk_store(type="postgres", host="localhost", dbname="retrico")

YAML:

stores:
  relational:
    store_type: postgres
    host: localhost
    dbname: retrico

Elasticsearch

Full-text search with advanced relevance scoring.

Builder API:

builder.chunk_store(type="elasticsearch", url="http://localhost:9200")

YAML:

stores:
  relational:
    store_type: elasticsearch
    url: "http://localhost:9200"

Store Pool

The store pool manages shared, named database connections across the pipeline. Configure stores once at the builder level, and all components inherit them automatically. Connections are created lazily (on first access) and shared — calling the same named store from multiple processors returns the same instance.

Basic Usage

builder = retrico.RetriCoBuilder(name="my_pipeline")

# Register named stores (shared across all processors)
builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687", password="pass"), name="main")
builder.vector_store(retrico.FaissVectorConfig(use_gpu=True), name="embeddings")
builder.chunk_store(retrico.SqliteRelationalConfig(path="chunks.db"), name="chunks")

# Pipeline nodes — no need to repeat connection details
builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "org"])
builder.graph_writer() # uses "main" graph store + "chunks" relational store
builder.chunk_embedder() # uses "main" graph store + "embeddings" vector store

# Context manager auto-closes all connections
with builder.build() as executor:
    result = executor.run(texts=[...])

Multiple Stores of the Same Type

You can register multiple stores of the same category (e.g. two graph databases) and reference them by name from individual processors using the graph_store_name config parameter:

builder = retrico.RetriCoBuilder(name="multi_graph")

# Two graph databases
builder.graph_store(retrico.Neo4jConfig(uri="bolt://prod:7687", password="pass"), name="production")
builder.graph_store(retrico.FalkorDBConfig(host="localhost", port=6379), name="staging")

builder.chunker(method="sentence")
builder.ner_gliner(labels=["person", "location"])

# Write to staging
builder.graph_writer(graph_store_name="staging")

with builder.build() as executor:
    result = executor.run(texts=[...])

The graph_store_name parameter tells a processor which named store to use. Without it, the first (or only) registered store is used. The same pattern works for vector stores (vector_store_name) and relational stores (relational_store_name).

Vector stores can also reference a named graph store — useful when storing embeddings directly in graph database nodes:

builder.graph_store(retrico.Neo4jConfig(uri="bolt://localhost:7687"), name="main")
builder.vector_store(retrico.GraphDBVectorConfig(graph_store_name="main"))

YAML Config

In YAML, the stores section uses a dict of named stores per category:

name: multi_store_pipeline
stores:
  graph:
    production:
      store_type: neo4j
      neo4j_uri: bolt://prod:7687
      neo4j_password: password
    staging:
      store_type: falkordb
      falkordb_host: localhost
      falkordb_port: 6379
  vector:
    default:
      vector_store_type: faiss
      use_gpu: true

nodes:
  - id: writer
    processor: graph_writer
    config:
      graph_store_name: staging
  - id: embedder
    processor: chunk_embedder
    config:
      graph_store_name: production
      vector_store_name: default

The ProcessorFactory detects the stores key, creates a shared pool, and injects it into all processor configs. Processors without an explicit graph_store_name use the first registered store.

StorePool Directly

For advanced use cases, create and manage a StorePool directly:

from retrico import StorePool

pool = StorePool()
pool.register_graph("main", {"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
pool.register_graph("backup", {"store_type": "falkordb", "falkordb_host": "localhost"})
pool.register_vector("embeddings", {"vector_store_type": "faiss", "use_gpu": True})

# Lazy creation — connection is opened on first access
store = pool.get_graph("main") # creates Neo4j connection
store2 = pool.get_graph("main") # returns the same instance
backup = pool.get_graph("backup") # creates FalkorDB connection

pool.close() # closes all instantiated connections
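The lazy, shared semantics can be sketched in a few lines of plain Python. This is a hypothetical illustration of the mechanism, not RetriCo's actual implementation:

```python
# Hypothetical sketch of lazy, shared connections (not RetriCo internals).
class LazyPool:
    def __init__(self):
        self._factories = {}   # name -> factory callable
        self._instances = {}   # name -> created store

    def register(self, name, factory):
        self._factories[name] = factory

    def get(self, name):
        # Created on first access, then cached: every later call
        # for the same name returns the same instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def close(self):
        # Only connections that were actually instantiated get closed.
        for store in self._instances.values():
            store.close()
        self._instances.clear()
```

Because instances are cached by name, two processors asking for the same named store share one connection, and registering a store costs nothing until something uses it.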

Backward Compatibility

All existing configs and calling patterns continue to work unchanged:

  • Configs without a stores section — processors fall back to creating their own connections from flat parameters (e.g. neo4j_uri, store_type)
  • A single unnamed store — equivalent to registering it with the name "default"
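The fallback rule can be illustrated with a small resolver. This is a hypothetical sketch of the decision described above; the actual ProcessorFactory logic may differ:

```python
def resolve_graph_store(config: dict):
    """Illustrative only: choose between the store pool and flat parameters."""
    stores = config.get("stores", {}).get("graph")
    if stores:
        # Named stores exist; an explicit graph_store_name wins,
        # otherwise the first registered store is used.
        name = config.get("graph_store_name") or next(iter(stores))
        return ("pool", name)
    # No stores section: fall back to flat parameters such as neo4j_uri.
    flat = {k: v for k, v in config.items() if k.startswith(("store_", "neo4j_"))}
    return ("flat", flat)
```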

Direct Store Queries

All graph stores support read queries outside of a pipeline:

from retrico import Neo4jGraphStore, FalkorDBGraphStore, MemgraphGraphStore

store = Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

entity = store.get_entity_by_label("Albert Einstein")
entity = store.get_entity_by_id("Q937") # by ID (useful for linked entities)
relations = store.get_entity_relations(entity["id"])
neighbors = store.get_entity_neighbors(entity["id"], max_hops=2)
chunks = store.get_chunks_for_entity(entity["id"])
subgraph = store.get_subgraph(entity_ids=[entity["id"]], max_hops=1)
all_entities = store.get_all_entities()

store.close()

Creating Stores Programmatically

Use factory functions to create stores from flat dicts or config objects:

from retrico import create_graph_store, create_vector_store, create_relational_store

graph = create_graph_store({"store_type": "neo4j", "neo4j_uri": "bolt://localhost:7687"})
vector = create_vector_store({"vector_store_type": "faiss", "use_gpu": True})
relational = create_relational_store({"relational_store_type": "sqlite", "sqlite_path": "chunks.db"})

Graph Mutations

All graph stores support CRUD operations for surgical changes without raw Cypher:

store = retrico.Neo4jGraphStore(uri="bolt://localhost:7687", password="password")

# Add entities
einstein_id = store.add_entity("Albert Einstein", "person", properties={"birth_year": 1879})
ulm_id = store.add_entity("Ulm", "location")

# Add a relation (validates both entities exist)
rel_id = store.add_relation(einstein_id, ulm_id, "born in")

# Add a relation with temporal properties
rel_id = store.add_relation(
    einstein_id, ulm_id, "lived in",
    start_date="1879-03-14", end_date="1880-06-01",
)

# Update an entity — only provided fields change, properties are merged
store.update_entity(einstein_id, properties={"death_year": 1955})

# Delete
store.delete_relation(rel_id)
store.delete_entity(ulm_id)
store.delete_chunk("chunk-123")

# Merge two entities — moves all relationships to target, deletes source
store.merge_entities(source_id="e-duplicate", target_id="e-canonical")

store.close()

Mutation methods:

Method | Signature | Returns
-------|-----------|--------
add_entity | (label, entity_type, *, properties, id) | str (UUID)
add_relation | (head_id, tail_id, relation_type, *, properties, id, start_date, end_date) | str (UUID)
update_entity | (entity_id, *, label, entity_type, properties) | bool
delete_entity | (entity_id) | bool
delete_relation | (relation_id) | bool
delete_chunk | (chunk_id) | bool
merge_entities | (source_id, target_id) | bool

Relation Properties and Temporal Filtering

Relations support two first-class temporal fields — start_date and end_date — plus an arbitrary properties dict. These are stored directly on the relation edge in the graph database.

Adding temporal properties

# Via data ingest
retrico.ingest_data(data=[{
    "entities": [
        {"text": "Einstein", "label": "person"},
        {"text": "ETH Zurich", "label": "organization"},
    ],
    "relations": [{
        "head": "Einstein", "tail": "ETH Zurich", "type": "worked_at",
        "start_date": "1912-01-01", "end_date": "1914-03-01",
        "properties": {"role": "professor"},
    }],
}])

# Via graph store API
rel_id = store.add_relation(
    einstein_id, eth_id, "worked_at",
    start_date="1912-01-01", end_date="1914-03-01",
    properties={"role": "professor"},
)

Dates should be ISO 8601 strings (e.g. "1879-03-14", "2024-01"). Use None/null when unknown.

Temporal filtering at query time

Retrievers can filter relations by date range using active_after and active_before:

builder = retrico.RetriCoSearch(name="temporal_query")
builder.query_parser(method="gliner", labels=["person", "organization"])
builder.retriever(
    max_hops=2,
    active_after="2020-01-01",
    active_before="2020-12-31",
)
builder.reasoner(api_key="...", model="gpt-4o-mini")

YAML config equivalent:

- id: retriever
  processor: retriever
  config:
    max_hops: 2
    active_after: "2020-01-01"
    active_before: "2020-12-31"

The filtering logic:

  • active_after: keeps relations where end_date IS NULL OR end_date >= active_after
  • active_before: keeps relations where start_date IS NULL OR start_date <= active_before
  • Relations without dates are always included (treated as always active)
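These rules can be sketched as a small predicate. This is illustrative Python, not the retriever's actual implementation; it relies on the fact that ISO 8601 date strings compare correctly as plain strings:

```python
def relation_is_active(relation, active_after=None, active_before=None):
    """Sketch of the filter rules above (hypothetical helper)."""
    start = relation.get("start_date")
    end = relation.get("end_date")
    # active_after: drop relations that ended before the window opens
    if active_after is not None and end is not None and end < active_after:
        return False
    # active_before: drop relations that start after the window closes
    if active_before is not None and start is not None and start > active_before:
        return False
    # Relations without dates fall through: always active
    return True
```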

The tool_retriever also supports temporal filtering — the LLM can pass start_date/end_date to tools like get_entity_relations dynamically.


Custom Stores

Register your own graph, vector, or relational store backends. Once registered, they work everywhere — builders, YAML configs, convenience functions, and the store pool.

Custom Graph Store

Implement BaseGraphStore and register it:

from retrico.store.graph.base import BaseGraphStore

class TigerGraphStore(BaseGraphStore):
    def __init__(self, host="localhost", port=9000, graph="MyGraph", token=None):
        self._host = host
        self._port = port
        self._graph = graph
        self._token = token
        self._conn = None

    def setup_indexes(self): ...
    def close(self): ...
    def write_entity(self, entity): ...
    def write_relation(self, relation, head_entity_id, tail_entity_id): ...
    def get_entity_by_label(self, label): ...
    def get_entity_by_id(self, entity_id): ...
    def get_entity_neighbors(self, entity_id, max_hops=1): ...
    def get_entity_relations(self, entity_id): ...
    def get_chunks_for_entity(self, entity_id): ...
    def get_subgraph(self, entity_ids, max_hops=1): ...
    # ... implement remaining abstract methods

Register it:

import retrico

def create_tigergraph(config):
    from my_store import TigerGraphStore
    return TigerGraphStore(
        host=config.get("tigergraph_host", "localhost"),
        port=config.get("tigergraph_port", 9000),
        graph=config.get("tigergraph_graph", "MyGraph"),
        token=config.get("tigergraph_token"),
    )

retrico.register_graph_store("tigergraph", create_tigergraph)

Now store_type="tigergraph" works across all APIs:

result = retrico.build_graph(
    texts=[...], entity_labels=[...],
    store_type="tigergraph",
    tigergraph_host="localhost",
)

Custom Vector Store

from retrico.store.vector.base import BaseVectorStore

class PineconeVectorStore(BaseVectorStore):
    def create_index(self, name, dimension): ...
    def store_embeddings(self, index_name, items): ...
    def search_similar(self, index_name, query_vector, top_k=10): ...

retrico.register_vector_store("pinecone", lambda config: PineconeVectorStore(...))

Custom Processor

Register custom pipeline processors using category-specific registries:

import retrico
from retrico.core.base import BaseProcessor

class SpacyNERProcessor(BaseProcessor):
    def __call__(self, chunks, **kwargs):
        # Your NER logic here
        return {"entities": all_entities, "chunks": chunks}

retrico.register_construct_processor("ner_spacy", lambda config, pipeline=None: SpacyNERProcessor(config, pipeline))

Once registered, use it in builders and YAML:

builder.add_node(
    id="ner", processor="ner_spacy",
    config={"model": "en_core_web_sm", "labels": ["person", "org"]},
    inputs={"chunks": "chunker_result.chunks"},
    output="ner_result",
)

Available registries:

Registry | Category | Convenience function
---------|----------|---------------------
construct_registry | Build pipeline (chunker, NER, relex, ...) | retrico.register_construct_processor()
query_registry | Query pipeline (parser, retrievers, ...) | retrico.register_query_processor()
modeling_registry | KG modeling (community, KG training, ...) | retrico.register_modeling_processor()
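All three registries follow the same name-to-factory pattern, sketched here in plain Python. This is a simplified, hypothetical illustration of the mechanism, not RetriCo's registry code:

```python
# Hypothetical sketch of a processor registry: a name -> factory mapping
# that is consulted at pipeline-build time.
_registry = {}

def register_processor(name, factory):
    _registry[name] = factory

def create_processor(name, config, pipeline=None):
    try:
        factory = _registry[name]
    except KeyError:
        raise ValueError(f"unknown processor: {name!r}") from None
    # Factories receive the node config plus the enclosing pipeline.
    return factory(config, pipeline=pipeline)
```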