Pretrained Models
This page describes the pretrained models used in GLiNKER entity linking pipelines. GLiNKER uses a separate model for each pipeline layer: NER (L1), linking/disambiguation (L3), and reranking (L4).
NER Models (L1)
NER models detect entity mentions in the input text. GLiNKER uses GLiNER bi-encoder models for zero-shot NER with support for arbitrary entity types. Because a bi-encoder encodes text and labels separately, label embeddings can be computed once and reused across millions of documents, which yields significant speedups for large-scale linking.
| Name | Parameters | Text Encoder | Label Encoder | Avg. CrossNER | Speed, H100 (ex/s) | Speed, pre-computed labels (ex/s) |
|---|---|---|---|---|---|---|
| gliner-bi-large-v2.0 | 530M | ettin-encoder-400m | bge-base-en-v1.5 | 61.5% | 2.68 | 3.6 |
| gliner-bi-base-v2.0 | 194M | ettin-encoder-150m | bge-small-en-v1.5 | 60.3% | 5.91 | 9.51 |
| gliner-bi-small-v2.0 | 108M | ettin-encoder-68m | all-MiniLM-L12-v2 | 57.2% | 7.99 | 15.22 |
| gliner-bi-edge-v2.0 | 60M | ettin-encoder-32m | all-MiniLM-L6-v2 | 54% | 13.64 | 24.62 |
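To make the benefit of pre-computed label embeddings concrete, the short sketch below divides the two speed columns of the table for each model. It assumes the first speed column measures inference with labels encoded on the fly; the `SPEEDS` dictionary and the `speedup` helper are illustrative names, not part of GLiNER or GLiNKER.

```python
# Throughput in examples/second, copied from the table above as
# (labels encoded on the fly, pre-computed label embeddings).
SPEEDS = {
    "gliner-bi-large-v2.0": (2.68, 3.6),
    "gliner-bi-base-v2.0": (5.91, 9.51),
    "gliner-bi-small-v2.0": (7.99, 15.22),
    "gliner-bi-edge-v2.0": (13.64, 24.62),
}


def speedup(model_name: str) -> float:
    """Return the throughput ratio pre-computed / on-the-fly for one model."""
    on_the_fly, precomputed = SPEEDS[model_name]
    return precomputed / on_the_fly


for name in SPEEDS:
    print(f"{name}: {speedup(name):.2f}x")
```

By these figures the relative gain ranges from roughly 1.3x for the large model to roughly 1.9x for the small model.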
Basic Usage
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")
text = """
Cristiano Ronaldo dos Santos Aveiro was born on 5 February 1985 in Portugal.
He plays for Al Nassr and has won five Ballon d'Or awards.
"""
labels = ["person", "award", "date", "teams", "location"]
entities = model.predict_entities(text, labels, threshold=0.3)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Expected Output
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Portugal => location
Al Nassr => teams
Ballon d'Or => award
Pre-compute Label Embeddings
labels = ["person", "award", "date", "teams", "location"]
entity_embeddings = model.encode_labels(labels, batch_size=8)
output = model.batch_predict_with_embeds([text], entity_embeddings, labels)
for entities in output:
    for entity in entities:
        print(entity["text"], "=>", entity["label"])
Expected Output
Encoding labels: 100%|██████████| 1/1 [00:00<00:00, 2.51it/s]
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Portugal => location
Al Nassr => teams
Ballon d'Or => award
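The reason pre-computing pays off at scale is that the label encoder runs once instead of once per document. The sketch below illustrates that amortization with a toy stand-in: `ToyBiEncoder`, its letter-frequency embedding, and the `link_naive` / `link_cached` helpers are all hypothetical and only mirror the pattern, not the GLiNER implementation.

```python
from typing import Dict, List


class ToyBiEncoder:
    """Toy stand-in for a bi-encoder. NOT the GLiNER implementation."""

    def __init__(self) -> None:
        self.label_encoder_calls = 0  # counts how often labels are encoded

    @staticmethod
    def _embed(text: str) -> List[float]:
        # Toy embedding (assumption): a 26-dimensional letter-frequency vector.
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec

    def encode_labels(self, labels: List[str]) -> Dict[str, List[float]]:
        self.label_encoder_calls += 1
        return {label: self._embed(label) for label in labels}

    def predict(self, text: str, label_embeds: Dict[str, List[float]]) -> str:
        # Score each label by dot product with the text vector; return the best.
        text_vec = self._embed(text)
        scores = {
            label: sum(a * b for a, b in zip(text_vec, vec))
            for label, vec in label_embeds.items()
        }
        return max(scores, key=scores.get)


def link_naive(model: ToyBiEncoder, docs: List[str], labels: List[str]) -> List[str]:
    # Re-encodes the labels for every document.
    return [model.predict(doc, model.encode_labels(labels)) for doc in docs]


def link_cached(model: ToyBiEncoder, docs: List[str], labels: List[str]) -> List[str]:
    # Encodes the labels once and reuses the embeddings for every document.
    label_embeds = model.encode_labels(labels)
    return [model.predict(doc, label_embeds) for doc in docs]


docs = ["Portugal is in Europe", "Ronaldo plays football", "Ballon d'Or"]
labels = ["person", "location", "award"]
naive_model, cached_model = ToyBiEncoder(), ToyBiEncoder()
assert link_naive(naive_model, docs, labels) == link_cached(cached_model, docs, labels)
print(naive_model.label_encoder_calls, cached_model.label_encoder_calls)  # 3 1
```

Both paths return identical predictions; only the number of label-encoder calls differs, and that difference grows linearly with the number of documents.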
Linking Models (L3)
Linking models perform entity disambiguation by computing similarity between mention contexts and candidate entity descriptions. These models use a cross-encoder architecture based on DeBERTa.
| Model | Base Encoder | Use Case |
|---|---|---|
| gliner-linker-base-v1.0 | deberta-base | Balanced performance |
| gliner-linker-large-v1.0 | deberta-large | Maximum accuracy |
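As a rough illustration of the disambiguation step, the sketch below scores each (mention context, candidate description) pair jointly and keeps the best candidate. The token-overlap scorer is only a stand-in for the DeBERTa-based model, and `Candidate`, `toy_pair_score`, `disambiguate`, and the two example candidates are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    label: str
    description: str


def toy_pair_score(context: str, description: str) -> float:
    # Stand-in for the learned model (assumption): Jaccard overlap of
    # lowercase whitespace tokens from the context and the description.
    a = set(context.lower().split())
    b = set(description.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(context: str, candidates: List[Candidate]) -> Tuple[Candidate, float]:
    """Score every (context, description) pair and return the best candidate."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    scored = [(toy_pair_score(context, c.description), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


context = "Farnese Palace is one of the most important palaces in the city of Rome."
candidates = [
    Candidate("Palazzo Farnese", "Renaissance palace in the city of Rome, Italy"),
    Candidate("Farnese (town)", "comune in the province of Viterbo"),
]
best, score = disambiguate(context, candidates)
print(best.label, round(score, 3))
```

A real linker replaces `toy_pair_score` with a learned model; the surrounding select-the-highest-score logic is the part this sketch is meant to show.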
Usage in Pipeline
from glinker import ConfigBuilder, DAGExecutor
builder = ConfigBuilder(name="demo")
builder.l1.spacy(model="en_core_web_sm")
builder.l3.configure(model="knowledgator/gliner-linker-large-v1.0")
executor = DAGExecutor(builder.get_config())
executor.load_entities("data/entities.jsonl", target_layers=["dict"])
result = executor.execute({
    "texts": ["Farnese Palace is one of the most important palaces in the city of Rome."]
})
l0_result = result.get("l0_result")
for entity in l0_result.entities:
    if entity.linked_entity:
        print(f"{entity.mention_text} → {entity.linked_entity.label}")
        print(f" Confidence: {entity.linked_entity.score:.3f}")
Reranking Models (L4)
When L2 returns a large candidate set, scoring every candidate in a single GLiNER call may be impractical. The L4 reranker splits the candidates into chunks of at most max_labels labels (reranker_max_labels in the example below), runs inference on each chunk, and then merges and deduplicates the results.
| Model | Base Encoder | Use Case |
|---|---|---|
| gliner-linker-rerank-v1.0 | ettin-encoder-68m | Reranking |
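The chunk-and-merge strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GLiNKER's actual implementation: `rerank_in_chunks`, the `Scorer` type, and `toy_scorer` are hypothetical, and deduplication is assumed to keep the highest score seen for each label.

```python
from typing import Callable, Dict, List, Tuple

# A scorer takes the text and one chunk of labels and returns label -> score.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def rerank_in_chunks(
    text: str,
    candidates: List[str],
    score_chunk: Scorer,
    max_labels: int = 20,
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Score candidates chunk by chunk, then merge and deduplicate."""
    if max_labels < 1:
        raise ValueError("max_labels must be at least 1")
    best: Dict[str, float] = {}
    for start in range(0, len(candidates), max_labels):
        chunk = candidates[start:start + max_labels]
        for label, score in score_chunk(text, chunk).items():
            # Drop low scores; for duplicates keep the highest score (assumption).
            if score >= threshold and score > best.get(label, float("-inf")):
                best[label] = score
    # Highest-scoring candidates first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def toy_scorer(text: str, labels: List[str]) -> Dict[str, float]:
    # Stand-in for a model call: high score if the label occurs in the text.
    return {label: 0.9 if label.lower() in text.lower() else 0.1 for label in labels}


text = "BRCA1 mutations are linked to breast cancer susceptibility."
candidates = ["BRCA1", "breast cancer", "BRCA2", "BRCA1", "lung cancer"]
ranked = rerank_in_chunks(text, candidates, toy_scorer, max_labels=2)
print(ranked)
```

With `max_labels=2` the five candidates are scored in three chunks, and the repeated "BRCA1" appears only once in the merged result.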
Usage in Pipeline
from glinker import ProcessorFactory
executor = ProcessorFactory.create_simple(
    model_name="knowledgator/gliner-bi-base-v2.0",
    threshold=0.5,
    reranker_model="knowledgator/gliner-linker-rerank-v1.0",
    reranker_max_labels=20,
    reranker_threshold=0.3,
    entities="data/entities.jsonl",
    precompute_embeddings=True,
)
result = executor.execute({
    "texts": ["BRCA1 mutations are linked to breast cancer susceptibility."]
})
Model Selection Guide
| Use Case | NER Model | Linker Model | Reranker |
|---|---|---|---|
| Edge / Mobile | gliner-bi-edge-v2.0 | gliner-linker-base-v1.0 | — |
| Balanced | gliner-bi-small-v2.0 | gliner-linker-base-v1.0 | — |
| High Accuracy | gliner-bi-base-v2.0 | gliner-linker-large-v1.0 | Optional |
| Maximum Accuracy | gliner-bi-large-v2.0 | gliner-linker-large-v1.0 | gliner-linker-rerank-v1.0 |
For most use cases, the balanced configuration (gliner-bi-small-v2.0 + gliner-linker-base-v1.0) provides the best trade-off between speed and accuracy. Add the reranker only when disambiguation accuracy is critical and the candidate set is large.
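For scripts that need to pick a configuration programmatically, the selection table can be encoded as a small lookup. The sketch below is one possible encoding; `ModelChoice`, `GUIDE`, `choose_models`, and the use-case keys are hypothetical names, and the model names are copied from the table without an organization prefix.

```python
from typing import Dict, NamedTuple, Optional


class ModelChoice(NamedTuple):
    ner: str
    linker: str
    reranker: Optional[str]  # None means no reranker


RERANKER = "gliner-linker-rerank-v1.0"

# One entry per row of the selection table above.
GUIDE: Dict[str, ModelChoice] = {
    "edge": ModelChoice("gliner-bi-edge-v2.0", "gliner-linker-base-v1.0", None),
    "balanced": ModelChoice("gliner-bi-small-v2.0", "gliner-linker-base-v1.0", None),
    # The table lists the reranker as optional for this row.
    "high_accuracy": ModelChoice("gliner-bi-base-v2.0", "gliner-linker-large-v1.0", None),
    "maximum_accuracy": ModelChoice("gliner-bi-large-v2.0", "gliner-linker-large-v1.0", RERANKER),
}


def choose_models(use_case: str = "balanced", add_optional_reranker: bool = False) -> ModelChoice:
    """Look up a row of the guide; optionally enable the optional reranker."""
    choice = GUIDE[use_case]
    if use_case == "high_accuracy" and add_optional_reranker:
        choice = choice._replace(reranker=RERANKER)
    return choice


print(choose_models())
print(choose_models("high_accuracy", add_optional_reranker=True))
```

Defaulting to the balanced row mirrors the recommendation above; unknown use-case keys raise a `KeyError` rather than silently falling back.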