Entity Linking
GLiNER-Linker is a family of bi-encoder models for entity disambiguation, developed as the neural component of the GLiNKER framework. These models resolve extracted entity mentions by linking them to the correct entries in a knowledge base, handling ambiguity such as distinguishing "Apple" (the company) from "Apple" (the fruit).
Overview
- Architecture: Bi-encoder (separate text encoder and label encoder sharing the same base model).
- Task: Entity disambiguation / Entity linking.
- Languages Supported: English.
- License: Apache 2.0.
Available Models
Linking Models
Linking models perform entity disambiguation by computing similarity between mention contexts and candidate entity descriptions.
| Model | Base Encoder | Use Case |
|---|---|---|
| gliner-linker-base-v1.0 | deberta-base | Balanced performance |
| gliner-linker-large-v1.0 | deberta-large | Maximum accuracy |
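The scoring step that linking models perform can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses hand-written toy vectors in place of real encoder outputs; `cosine` and `rank_candidates` are hypothetical helper names, not part of the GLiNKER API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(mention_vec, candidates):
    """Score every candidate against the mention and sort best-first.

    candidates maps entity_id -> embedding of the entity description.
    """
    scored = [(entity_id, cosine(mention_vec, vec)) for entity_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (assumption): a real bi-encoder would produce these embeddings.
mention = [0.9, 0.1, 0.0]            # context around the mention "Apple"
candidates = {
    "Q312": [1.0, 0.0, 0.0],         # "American technology company"
    "Q89": [0.0, 1.0, 0.0],          # "Edible fruit of apple tree"
}
ranked = rank_candidates(mention, candidates)
best_id, best_score = ranked[0]
```

Because the two sides are encoded independently, the candidate vectors do not depend on the mention text, which is the property that makes them cacheable.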
Reranking Model
When the candidate set is large, the reranker splits candidates into chunks and runs inference on each chunk, then merges and deduplicates results for improved accuracy.
| Model | Base Encoder | Use Case |
|---|---|---|
| gliner-linker-rerank-v1.0 | ettin-encoder-68m | Reranking |
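The chunk, score, merge, and deduplicate flow described for the reranker can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `score_chunk` is a hypothetical callable standing in for one model inference call, `chunk_size` is a hypothetical parameter, and keeping the highest score per duplicate is an assumed merge rule.

```python
from typing import Callable, Dict, Iterator, List, Tuple

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def rerank_in_chunks(
    candidates: List[str],
    score_chunk: Callable[[List[str]], Dict[str, float]],
    chunk_size: int,
) -> List[Tuple[str, float]]:
    """Score candidates chunk by chunk, then merge into one deduplicated ranking."""
    best: Dict[str, float] = {}
    for chunk in chunked(candidates, chunk_size):
        for entity_id, score in score_chunk(chunk).items():
            # Deduplicate: keep the highest score seen for each entity (assumed rule).
            if score > best.get(entity_id, float("-inf")):
                best[entity_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)

# Toy scorer (assumption): a lookup table instead of a real reranker model.
# "X1" is a made-up identifier used only for this example.
TOY_SCORES = {"Q312": 0.92, "Q89": 0.15, "X1": 0.40}

def toy_score_chunk(chunk: List[str]) -> Dict[str, float]:
    return {entity_id: TOY_SCORES[entity_id] for entity_id in chunk}

reranked = rerank_in_chunks(["Q312", "Q89", "X1", "Q312"], toy_score_chunk, chunk_size=2)
```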
Usage
Installation
pip install git+https://github.com/Knowledgator/GLinker.git
Entity Input Format
Entities are provided as JSONL files with the following structure:
{"entity_id": "Q312", "label": "Apple Inc.", "description": "American technology company", "entity_type": "organization"}
{"entity_id": "Q89", "label": "Apple", "description": "Edible fruit of apple tree", "entity_type": "food"}
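Before loading a file into the pipeline, it can help to check that every line parses and carries the expected keys. The sketch below uses only the standard library; treating all four fields as required is an assumption based on the example records above, and `parse_entities` is a hypothetical helper, not a GLiNKER function.

```python
import json

# Assumption: all four fields shown in the example records are required.
REQUIRED_FIELDS = ("entity_id", "label", "description", "entity_type")

def parse_entities(lines):
    """Parse JSONL lines into dicts, raising ValueError on a missing field."""
    entities = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        entities.append(record)
    return entities

sample = [
    '{"entity_id": "Q312", "label": "Apple Inc.", "description": "American technology company", "entity_type": "organization"}',
    '{"entity_id": "Q89", "label": "Apple", "description": "Edible fruit of apple tree", "entity_type": "food"}',
]
entities = parse_entities(sample)
```

For a file on disk, the same function can be called as `parse_entities(open("entities.jsonl", encoding="utf-8"))`, since a file object iterates line by line.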
Basic Entity Linking Pipeline
from glinker import ConfigBuilder, DAGExecutor
# Build pipeline
builder = ConfigBuilder(name="entity_linking")
# L1: Extract mentions
builder.l1.gliner(
    model="knowledgator/gliner-bi-base-v2.0",
    labels=["person", "organization", "location"]
)
# L2: Candidate retrieval
builder.l2.add("dict", priority=0)
# L3: Disambiguation with GLiNER-Linker
builder.l3.configure(
    model="knowledgator/gliner-linker-large-v1.0",
    use_precomputed_embeddings=True
)
# Execute
executor = DAGExecutor(builder.get_config())
executor.load_entities("entities.jsonl", target_layers=["dict"])
result = executor.execute({
    "texts": ["Apple announced new iPhone"]
})
# Get linked entities
l0_result = result.get("l0_result")
for entity in l0_result.entities:
    if entity.linked_entity:
        print(f"{entity.mention_text} -> {entity.linked_entity.label}")
        print(f" Score: {entity.linked_entity.score:.3f}")
Precomputed Embeddings
Precomputing entity embeddings encodes each entity description once, ahead of time, instead of on every query, which can provide 10-100x speedups for large-scale linking:
builder.l2.embeddings(
    enabled=True,
    model_name="knowledgator/gliner-linker-large-v1.0"
)
executor.load_entities("entities.jsonl", target_layers=["dict"])
executor.precompute_embeddings(target_layers=["dict"], batch_size=8)
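To see where the saving comes from, the sketch below counts encoder calls with and without a cache. It is a toy model of the idea only: `CountingEncoder`, `precompute`, and `link` are hypothetical names, and the vowel-count "embedding" is a placeholder that carries no linguistic meaning.

```python
class CountingEncoder:
    """Toy stand-in for an encoder that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def encode(self, text):
        self.calls += 1
        # Placeholder embedding (assumption): counts of each vowel in the text.
        return [text.lower().count(vowel) for vowel in "aeiou"]

def precompute(encoder, entities):
    """Encode every entity description once and keep the vectors in a dict."""
    return {entity_id: encoder.encode(desc) for entity_id, desc in entities.items()}

def link(encoder, mention_context, cache):
    """Encode only the mention context, then compare against cached vectors."""
    query = encoder.encode(mention_context)

    def score(entity_id):
        return sum(q * e for q, e in zip(query, cache[entity_id]))

    return max(cache, key=score)

entities = {
    "Q312": "American technology company",
    "Q89": "Edible fruit of apple tree",
}
encoder = CountingEncoder()
cache = precompute(encoder, entities)
calls_after_precompute = encoder.calls   # one call per entity

for text in ["Apple announced new iPhone", "She ate an apple", "Apple stock rose"]:
    link(encoder, text, cache)
calls_after_queries = encoder.calls      # plus one call per query, none per entity
```

Without the cache, each query would re-encode every entity description, so the per-query cost would grow with the size of the knowledge base.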
Pipeline with Reranker
Add the reranker as an L4 stage for improved disambiguation when the candidate set is large:
builder = ConfigBuilder(name="reranked")
builder.l1.gliner(
    model="knowledgator/gliner-bi-base-v2.0",
    labels=["gene", "disease"]
)
builder.l3.configure(model="knowledgator/gliner-linker-base-v1.0")
builder.l4.configure(
    model="knowledgator/gliner-linker-rerank-v1.0",
    threshold=0.3,
    max_labels=5,
)
Model Selection Guide
| Use Case | Linker Model | Reranker |
|---|---|---|
| Balanced performance | gliner-linker-base-v1.0 | --- |
| Maximum accuracy | gliner-linker-large-v1.0 | Optional |
| Large candidate sets | gliner-linker-large-v1.0 | gliner-linker-rerank-v1.0 |
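The table can also be encoded as a small lookup for scripts that build pipelines programmatically. This is a convenience sketch only: `select_models`, the use-case keys, and the `with_optional_reranker` flag are hypothetical names, and the `knowledgator/` prefix follows the model identifiers used in the pipeline examples.

```python
from typing import NamedTuple, Optional

BASE_LINKER = "knowledgator/gliner-linker-base-v1.0"
LARGE_LINKER = "knowledgator/gliner-linker-large-v1.0"
RERANKER = "knowledgator/gliner-linker-rerank-v1.0"

class ModelChoice(NamedTuple):
    linker: str
    reranker: Optional[str]  # None means no reranking stage

def select_models(use_case: str, with_optional_reranker: bool = False) -> ModelChoice:
    """Map a use case from the selection guide to model identifiers."""
    if use_case == "balanced":
        return ModelChoice(BASE_LINKER, None)
    if use_case == "max_accuracy":
        # The guide lists the reranker as optional for this use case.
        return ModelChoice(LARGE_LINKER, RERANKER if with_optional_reranker else None)
    if use_case == "large_candidate_sets":
        return ModelChoice(LARGE_LINKER, RERANKER)
    raise ValueError(f"unknown use case: {use_case!r}")

choice = select_models("large_candidate_sets")
```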
For detailed pipeline configuration and advanced usage, see the GLiNKER framework documentation.