Entity Linking

GLiNER-Linker is a family of bi-encoder models for entity disambiguation, developed as the neural component of the GLiNKER framework. These models resolve extracted entity mentions by linking them to the correct entries in a knowledge base, handling ambiguity such as distinguishing "Apple" (the company) from "Apple" (the fruit).


Overview

  • Architecture: Bi-encoder (separate text encoder and label encoder sharing the same base model).
  • Task: Entity disambiguation / Entity linking.
  • Languages Supported: English.
  • License: Apache 2.0.

Available Models

Linking Models

Linking models perform entity disambiguation by computing similarity between mention contexts and candidate entity descriptions.

Model | Base Encoder | Use Case
gliner-linker-base-v1.0 | deberta-base | Balanced performance
gliner-linker-large-v1.0 | deberta-large | Maximum accuracy
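
The similarity step can be pictured with toy vectors. The sketch below is not the GLiNER-Linker implementation: the embedding values are made up, and `cosine` and `rank_candidates` are hypothetical helpers. It only illustrates how a bi-encoder compares one mention-context vector against candidate descriptions that were encoded independently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(mention_vec, candidate_vecs):
    """Return (entity_id, score) pairs sorted by descending similarity."""
    scored = [(eid, cosine(mention_vec, vec)) for eid, vec in candidate_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up 3-dimensional embeddings, for illustration only.
mention = [0.9, 0.1, 0.2]        # context: "Apple announced new iPhone"
candidates = {
    "Q312": [0.8, 0.2, 0.1],     # Apple Inc., American technology company
    "Q89": [0.1, 0.9, 0.3],      # Apple, edible fruit of apple tree
}

ranking = rank_candidates(mention, candidates)
print(ranking[0][0])  # id of the best-scoring candidate
```

Because the two sides are encoded separately, the candidate vectors do not depend on the input text and can be computed ahead of time.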

Reranking Model

When the candidate set is large, the reranker splits candidates into chunks and runs inference on each chunk, then merges and deduplicates results for improved accuracy.

Model | Base Encoder | Use Case
gliner-linker-rerank-v1.0 | ettin-encoder-68m | Reranking
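
The chunk, merge, and deduplicate flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reranker's actual code: `rerank_in_chunks` and `toy_scorer` are hypothetical, the scores are invented, `X1` is a placeholder entity id, and keeping the highest score per id is only one plausible way to deduplicate.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def rerank_in_chunks(candidates, score_chunk, chunk_size):
    """Score candidates chunk by chunk, then merge and deduplicate by id.

    `score_chunk` stands in for one model call: it takes a list of
    candidate ids and returns a list of (id, score) pairs.
    """
    best = {}
    for chunk in chunked(candidates, chunk_size):
        for entity_id, score in score_chunk(chunk):
            # Deduplicate: keep the highest score seen for each id (assumption).
            if entity_id not in best or score > best[entity_id]:
                best[entity_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)

# Toy scorer with fixed, made-up scores (not real model output).
FAKE_SCORES = {"Q312": 0.91, "Q89": 0.12, "X1": 0.40}

def toy_scorer(chunk):
    return [(eid, FAKE_SCORES[eid]) for eid in chunk]

# "Q312" appears twice to show deduplication across chunks.
merged = rerank_in_chunks(["Q312", "Q89", "X1", "Q312"], toy_scorer, chunk_size=2)
print(merged)
```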

Usage

Installation

pip install git+https://github.com/Knowledgator/GLinker.git

Entity Input Format

Entities are provided as JSONL files with the following structure:

{"entity_id": "Q312", "label": "Apple Inc.", "description": "American technology company", "entity_type": "organization"}
{"entity_id": "Q89", "label": "Apple", "description": "Edible fruit of apple tree", "entity_type": "food"}
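
Before loading a file, it can help to check that every line parses and carries the expected keys. The following is a small sketch using only the standard library; `parse_entities` is a hypothetical helper, and treating all four fields as required is an assumption based on the example above rather than a documented rule.

```python
import json

# Assumption: all four fields from the example are treated as required.
REQUIRED_FIELDS = {"entity_id", "label", "description", "entity_type"}

def parse_entities(lines):
    """Parse JSONL lines into dicts, raising if a required field is missing."""
    entities = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {number}: missing fields {sorted(missing)}")
        entities.append(record)
    return entities

sample = [
    '{"entity_id": "Q312", "label": "Apple Inc.", "description": "American technology company", "entity_type": "organization"}',
    '{"entity_id": "Q89", "label": "Apple", "description": "Edible fruit of apple tree", "entity_type": "food"}',
]
entities = parse_entities(sample)
print([e["entity_id"] for e in entities])
```

To check a file on disk, pass an open file object instead of the list, for example `parse_entities(open("entities.jsonl", encoding="utf-8"))`.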

Basic Entity Linking Pipeline

from glinker import ConfigBuilder, DAGExecutor

# Build pipeline
builder = ConfigBuilder(name="entity_linking")

# L1: Extract mentions
builder.l1.gliner(
    model="knowledgator/gliner-bi-base-v2.0",
    labels=["person", "organization", "location"]
)

# L2: Candidate retrieval
builder.l2.add("dict", priority=0)

# L3: Disambiguation with GLiNER-Linker
builder.l3.configure(
    model="knowledgator/gliner-linker-large-v1.0",
    use_precomputed_embeddings=True
)

# Execute
executor = DAGExecutor(builder.get_config())
executor.load_entities("entities.jsonl", target_layers=["dict"])

result = executor.execute({
    "texts": ["Apple announced new iPhone"]
})

# Get linked entities
l0_result = result.get("l0_result")
for entity in l0_result.entities:
    if entity.linked_entity:
        print(f"{entity.mention_text} -> {entity.linked_entity.label}")
        print(f"  Score: {entity.linked_entity.score:.3f}")

Precomputed Embeddings

Precomputing entity embeddings provides 10-100x speedups for large-scale linking:

builder.l2.embeddings(
    enabled=True,
    model_name="knowledgator/gliner-linker-large-v1.0"
)

executor.load_entities("entities.jsonl", target_layers=["dict"])
executor.precompute_embeddings(target_layers=["postgres"], batch_size=8)
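
The reason precomputation helps can be shown by counting encoder calls. This sketch does not use GLiNKER: `CountingEncoder` is a hypothetical stand-in whose embeddings are meaningless, and the three query texts are invented. It only contrasts re-encoding every description per query with encoding each description once and reusing it.

```python
class CountingEncoder:
    """Stand-in encoder that counts how many texts it has encoded."""

    def __init__(self):
        self.calls = 0

    def encode(self, text):
        self.calls += 1
        # Fake embedding: not meaningful, only here so the code runs.
        return [float(len(text)), float(text.count(" "))]

descriptions = {
    "Q312": "American technology company",
    "Q89": "Edible fruit of apple tree",
}
queries = ["Apple announced new iPhone", "She ate an apple", "Apple stock rose"]

# Without precomputation: every query re-encodes every description.
naive = CountingEncoder()
for query in queries:
    naive.encode(query)
    for text in descriptions.values():
        naive.encode(text)

# With precomputation: descriptions are encoded once and then reused.
cached = CountingEncoder()
entity_cache = {eid: cached.encode(text) for eid, text in descriptions.items()}
for query in queries:
    cached.encode(query)  # only the query side is encoded at link time

print(naive.calls, cached.calls)  # prints: 9 5
```

With two entities the difference is small, but the naive count grows with the product of queries and entities, while the cached count grows with their sum.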

Pipeline with Reranker

Add the reranker as an L4 stage for improved disambiguation when the candidate set is large:

builder = ConfigBuilder(name="reranked")
builder.l1.gliner(
    model="knowledgator/gliner-bi-base-v2.0",
    labels=["gene", "disease"]
)
builder.l3.configure(model="knowledgator/gliner-linker-base-v1.0")
builder.l4.configure(
    model="knowledgator/gliner-linker-rerank-v1.0",
    threshold=0.3,
    max_labels=5,
)

Model Selection Guide

Use Case | Linker Model | Reranker
Balanced performance | gliner-linker-base-v1.0 | ---
Maximum accuracy | gliner-linker-large-v1.0 | Optional
Large candidate sets | gliner-linker-large-v1.0 | gliner-linker-rerank-v1.0

For detailed pipeline configuration and advanced usage, see the GLiNKER framework documentation.