How to Link Medical Entities to Hospital Databases

Connect extracted medical entities to your hospital's internal databases and terminology systems for consistent coding and record matching.

Overview

This cookbook shows how to use entity linking to disambiguate medical terms extracted from clinical text and map them to entries in your hospital's master data, such as patient records, medication formularies, and procedure catalogs.

What You'll Learn

Set up entity linking with custom medical knowledge bases
Handle medical synonyms and abbreviations
Link to internal hospital databases
Resolve ambiguous mentions using clinical context
Deduplicate records by matching against existing entries

Prerequisites

Python 3.8+
Access to hospital knowledge bases or terminology servers
Sample clinical text with entities to link

Use Cases

EHR data normalization
Medication reconciliation
Clinical record deduplication
Clinical research data harmonization

The GLinker Pipeline Approach

GLinker provides a layered pipeline architecture for entity extraction and linking. Each layer handles a specific task in the entity resolution process:

Layer	Purpose	Description
L1	Mention extraction	Zero-shot NER using GLiNER
L2	Candidate retrieval	Dictionary lookup with exact and fuzzy matching
L3	Entity disambiguation	Entity linking via GLiNER linker model
L0	Aggregation	Filtering, confidence thresholds, and final output

Models Used

L1 NER: knowledgator/gliner-bi-edge-v2.0 — Zero-shot entity extraction
L3 Linking: knowledgator/gliner-linker-base-v1.0 — Entity disambiguation

Setting Up Your Database

First, create a knowledge base file in JSONL format. Each record should include:

A unique identifier
The canonical entity name
Entity type
Aliases for name variation handling

Mock Data Format

Each line in the JSONL file represents a single entity record with the following structure:

Field	Type	Description
`entity_id`	string	Unique identifier for the entity (e.g., `P001` for patients, `D001` for doctors)
`label`	string	Canonical/official name of the entity
`type`	string	Entity category matching your NER labels (e.g., `Patient`, `Doctor`, `Disease`)
`aliases`	array	List of alternative names, abbreviations, and common misspellings

Example mock_db.jsonl:

mock_data = """{"entity_id": "P001", "label": "John Doe", "type": "Patient", "aliases": ["Jon Doe", "John H Doe"]}
{"entity_id": "P002", "label": "Sarah Connor", "type": "Patient", "aliases": ["S. Connor"]}
{"entity_id": "D001", "label": "Dr. Gregory House", "type": "Doctor", "aliases": ["Dr. House", "Gregory House"]}
{"entity_id": "D002", "label": "Dr. Stephen Strange", "type": "Doctor", "aliases": ["Dr. Strange"]}
{"entity_id": "C001", "label": "Diabetes Mellitus", "type": "Disease", "aliases": ["diabetes", "DM"]}
{"entity_id": "C002", "label": "Hypertension", "type": "Disease", "aliases": ["high blood pressure", "HTN"]}
"""

with open("mock_db.jsonl", "w", encoding="utf-8") as f:
    f.write(mock_data)

print("mock_db.jsonl created successfully!")
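Before loading the file into the pipeline, it can help to check that every line matches the record structure described above. The following is a minimal sketch using only the standard library; the helper name `validate_kb_lines` and the duplicate-ID rule are assumptions for illustration, not part of GLinker.

```python
import json

# Field names and types taken from the Mock Data Format table
REQUIRED_FIELDS = {"entity_id": str, "label": str, "type": str, "aliases": list}

def validate_kb_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    seen_ids = set()
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines such as a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected):
                problems.append((number, f"missing or mistyped field '{field}'"))
        entity_id = record.get("entity_id")
        if isinstance(entity_id, str):
            if entity_id in seen_ids:
                problems.append((number, f"duplicate entity_id '{entity_id}'"))
            seen_ids.add(entity_id)
    return problems

# Two deliberately broken records: a reused ID and a missing aliases field
sample = [
    '{"entity_id": "P001", "label": "John Doe", "type": "Patient", "aliases": ["Jon Doe"]}',
    '{"entity_id": "P001", "label": "Sarah Connor", "type": "Patient", "aliases": []}',
    '{"entity_id": "C001", "label": "Diabetes Mellitus", "type": "Disease"}',
]
for line_number, problem in validate_kb_lines(sample):
    print(line_number, problem)
```

To check the file written above, pass an open file handle instead of the sample list, for example `validate_kb_lines(open("mock_db.jsonl", encoding="utf-8"))`.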

Database Contents by Entity Type

Type	ID	Canonical Name	Aliases
Patient	P001	John Doe	Jon Doe, John H Doe
Patient	P002	Sarah Connor	S. Connor
Doctor	D001	Dr. Gregory House	Dr. House, Gregory House
Doctor	D002	Dr. Stephen Strange	Dr. Strange
Disease	C001	Diabetes Mellitus	diabetes, DM
Disease	C002	Hypertension	high blood pressure, HTN

Alias Best Practices

Include these variations in your aliases:

Typos: Common misspellings (Jon for John)
Abbreviations: Standard medical abbreviations (DM for Diabetes Mellitus, HTN for Hypertension)
Informal names: Colloquial terms (high blood pressure for Hypertension)
Name variations: Middle initials, titles, shortened forms (Dr. House for Dr. Gregory House)
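Short abbreviations are the aliases most likely to point at more than one record. The sketch below builds a case-insensitive alias index and reports collisions so they can be reviewed before loading. The helper names and the `C003` record are hypothetical and not part of the mock database above.

```python
from collections import defaultdict

def build_alias_index(records):
    """Map each case-folded label or alias to the set of entity IDs that use it."""
    index = defaultdict(set)
    for record in records:
        for name in [record["label"], *record["aliases"]]:
            index[name.casefold()].add(record["entity_id"])
    return index

def find_collisions(index):
    """Return only the names that are claimed by more than one entity."""
    return {name: sorted(ids) for name, ids in index.items() if len(ids) > 1}

records = [
    {"entity_id": "C001", "label": "Diabetes Mellitus", "aliases": ["diabetes", "DM"]},
    # Hypothetical extra record: "DM" is also a common abbreviation for dermatomyositis
    {"entity_id": "C003", "label": "Dermatomyositis", "aliases": ["DM"]},
]

collisions = find_collisions(build_alias_index(records))
print(collisions)
```

A collision is not necessarily an error. It marks a mention that L2 alone cannot resolve and that depends on L3 and the surrounding clinical context.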

Installation

pip install glinker

Building the Pipeline

Step 1: Import and Initialize

from glinker import ConfigBuilder, DAGExecutor, DAGPipeline

Step 2: Configure the Pipeline

Use ConfigBuilder to define all four layers in a single configuration:

builder = ConfigBuilder(name="clinical_db_pipeline")

# Set schema template to use only labels (not descriptions) for L3 matching
builder.set_schema_template("{label}")

# L1: Zero-Shot NER — extract mentions with custom medical entity types
builder.l1.gliner(
    model="knowledgator/gliner-bi-edge-v2.0",
    labels=["Patient", "Doctor", "Disease", "Symptom"],
    threshold=0.2
)

# L2: Dictionary Lookup — candidate generation with exact and fuzzy matching
builder.l2.add(
    "dict",
    priority=0,
    search_mode=["exact", "fuzzy"],
    fuzzy={"max_distance": 2, "min_similarity": 0.6}
)

# L3: Entity Linking — disambiguate candidates using context
builder.l3.configure(
    model="knowledgator/gliner-linker-base-v1.0",
    threshold=0.3,
    device="cpu",
    max_length=512
)

# L0: Aggregation — filter results and include unlinked entities
builder.l0.configure(
    min_confidence=0.4,
    include_unlinked=True  # Include unlinked entities to detect new records
)
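The fuzzy settings in L2 are easier to tune with the arithmetic in view. The sketch below assumes `max_distance` caps a Levenshtein edit distance and `min_similarity` is a normalized score of the form 1 - distance / longest length. How GLinker actually computes and combines these values is not stated here, so treat both definitions and the helper names as assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def passes_fuzzy(mention, candidate, max_distance=2, min_similarity=0.6):
    """Assumed rule: a candidate must satisfy both limits to be retrieved."""
    a, b = mention.casefold(), candidate.casefold()
    return levenshtein(a, b) <= max_distance and similarity(a, b) >= min_similarity

for mention, candidate in [("Jon Doe", "John Doe"), ("HTN", "Hypertension")]:
    distance = levenshtein(mention.casefold(), candidate.casefold())
    print(mention, candidate, distance, passes_fuzzy(mention, candidate))
```

Under these assumptions, "Jon Doe" is one edit away from "John Doe" and passes, while "HTN" is nine edits away from "Hypertension" and fails. This is why abbreviations belong in the `aliases` field rather than being left to fuzzy matching.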

Step 3: Build and Load the Database

config = builder.get_config()
pipeline = DAGPipeline(**config)
executor = DAGExecutor(pipeline)

# Load existing hospital records into the pipeline
MOCK_DB_PATH = "mock_db.jsonl"
executor.load_entities(MOCK_DB_PATH, target_layers=['dict'])

Step 4: Process Clinical Notes

note = "Dr. House checked patient Jon Doe who complained of high blood pressure."

context = executor.execute({"texts": [note]})
results = context.data.get('l0_result')

if results and results.entities:
    entities = results.entities[0]

    for ent in entities:
        entity_text = ent.mention_text
        entity_type = ent.label

        if ent.is_linked:
            link = ent.linked_entity
            eid = link.entity_id
            print(f"[EXISTING] Matched '{entity_text}' ({entity_type}) -> ID: {eid} ({link.label})")
            print("    -> Action: SKIP INSERTION (Record exists)")
        else:
            print(f"[NEW RECORD] '{entity_text}' ({entity_type}) -> Insert into Database?")
            # Pick the target table by entity type; any type other than Doctor or Patient falls through to DISEASES
            table = {"Doctor": "DOCTORS", "Patient": "PATIENTS"}.get(entity_type, "DISEASES")
            print(f"    -> Action: INSERT into {table} table")

Expected Output:

[EXISTING] Matched 'Dr. House' (Doctor) -> ID: D001 (Dr. Gregory House)
    -> Action: SKIP INSERTION (Record exists)
[EXISTING] Matched 'Jon Doe' (Patient) -> ID: P001 (John Doe)
    -> Action: SKIP INSERTION (Record exists)
[EXISTING] Matched 'high blood pressure' (Disease) -> ID: C002 (Hypertension)
    -> Action: SKIP INSERTION (Record exists)
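The Step 4 loop sends any type other than Doctor or Patient to the DISEASES table, which would include an unlinked Symptom. One way to make the routing explicit is sketched below. It uses `types.SimpleNamespace` stand-ins that mirror the attributes read in the loop (`mention_text`, `label`, `is_linked`, `linked_entity`) so it runs without loading any models. `TABLE_BY_TYPE` and `plan_action` are hypothetical names.

```python
from types import SimpleNamespace

# Explicit routing table; types without an entry are held for manual review
TABLE_BY_TYPE = {"Doctor": "DOCTORS", "Patient": "PATIENTS", "Disease": "DISEASES"}

def plan_action(ent):
    """Return (action, target) for one entity produced by the pipeline."""
    if ent.is_linked:
        return ("SKIP", ent.linked_entity.entity_id)
    table = TABLE_BY_TYPE.get(ent.label)
    if table is None:
        return ("REVIEW", None)  # e.g. Symptom has no table in this example
    return ("INSERT", table)

# Stand-in entities; real ones come from results.entities[0]
linked = SimpleNamespace(
    mention_text="Dr. House", label="Doctor", is_linked=True,
    linked_entity=SimpleNamespace(entity_id="D001", label="Dr. Gregory House"),
)
new_doctor = SimpleNamespace(
    mention_text="Dr. Meredith Grey", label="Doctor", is_linked=False, linked_entity=None,
)
symptom = SimpleNamespace(
    mention_text="chest pain", label="Symptom", is_linked=False, linked_entity=None,
)

for ent in (linked, new_doctor, symptom):
    print(ent.mention_text, plan_action(ent))
```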

Step 5: Detect New Entities

Process a note containing entities not in the database:

note = "Referral: Dr. Meredith Grey examining new patient Jane Smith for possible Arrhythmia."

context = executor.execute({"texts": [note]})
results = context.data.get('l0_result')

if results and results.entities:
    entities = results.entities[0]

    for ent in entities:
        entity_text = ent.mention_text
        entity_type = ent.label

        if ent.is_linked:
            link = ent.linked_entity
            eid = link.entity_id
            print(f"[EXISTING] Matched '{entity_text}' ({entity_type}) -> ID: {eid} ({link.label})")
        else:
            print(f"[NEW RECORD] '{entity_text}' ({entity_type}) -> Insert into Database?")

Expected Output:

[NEW RECORD] 'Dr. Meredith Grey' (Doctor) -> Insert into Database?
[NEW RECORD] 'Jane Smith' (Patient) -> Insert into Database?
[NEW RECORD] 'Arrhythmia' (Disease) -> Insert into Database?
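Once a new entity has been approved, it needs a record in the same JSONL shape as the rest of the knowledge base. The following sketch drafts one, assuming IDs follow the prefix-plus-three-digits pattern of the mock data (`P001`, `D001`, `C001`). The prefix mapping and both helper names are assumptions for illustration.

```python
import re

# Prefixes inferred from the mock database
PREFIX_BY_TYPE = {"Patient": "P", "Doctor": "D", "Disease": "C"}

def next_entity_id(entity_type, existing_ids):
    """Return the next free ID for the given type, e.g. P003 after P001 and P002."""
    prefix = PREFIX_BY_TYPE[entity_type]
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for entity_id in existing_ids:
        match = pattern.match(entity_id)
        if match:
            numbers.append(int(match.group(1)))
    number = max(numbers, default=0) + 1
    return f"{prefix}{number:03d}"

def draft_record(mention_text, entity_type, existing_ids):
    """Build a knowledge base record for a reviewed, unlinked mention."""
    return {
        "entity_id": next_entity_id(entity_type, existing_ids),
        "label": mention_text,
        "type": entity_type,
        "aliases": [],  # fill in after review
    }

existing = ["P001", "P002", "D001", "D002", "C001", "C002"]
print(draft_record("Jane Smith", "Patient", existing))
```

The drafted record can be appended to `mock_db.jsonl` with `json.dumps` once a reviewer has confirmed it is not a duplicate.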

Step 6: Handle Name Variations

The pipeline handles name variations through aliases, fuzzy matching, and contextual disambiguation:

note = "Patient John H Doe returned for follow-up appointment."

context = executor.execute({"texts": [note]})
results = context.data.get('l0_result')

if results and results.entities:
    entities = results.entities[0]

    for ent in entities:
        entity_text = ent.mention_text
        entity_type = ent.label

        # Skip generic entity type names extracted as mentions
        if entity_text.lower() == entity_type.lower():
            continue

        if ent.is_linked:
            link = ent.linked_entity
            print(f"[EXISTING] Matched '{entity_text}' -> ID: {link.entity_id} ({link.label})")
        else:
            print(f"[NEW RECORD] '{entity_text}' ({entity_type})")

Expected Output:

[EXISTING] Matched 'John H Doe' -> ID: P001 (John Doe)

Key Features

Zero-Shot Learning

GLinker uses GLiNER for zero-shot entity extraction, meaning you can define custom entity types without training data:

# Define any entity types relevant to your use case
builder.l1.gliner(
    model="knowledgator/gliner-bi-edge-v2.0",
    labels=["Patient", "Doctor", "Disease", "Symptom", "Medication", "Procedure"],
    threshold=0.2
)

Name Variation Handling

The pipeline automatically handles common name variations through multiple mechanisms:

Input Mention	Resolved Entity	Method
"Jon Doe"	John Doe	Database alias
"John H Doe"	John Doe	Database alias
"Dr. House"	Dr. Gregory House	Database alias
"high blood pressure"	Hypertension	Database alias

Deduplication Logic

The pipeline distinguishes between existing and new entities using is_linked:

for ent in entities:
    if ent.is_linked:
        # Entity found in database — skip insertion, use existing ID
        db_id = ent.linked_entity.entity_id
        print(f"Using existing record: {db_id}")
    else:
        # New entity — flag for insertion
        print(f"New entity detected: {ent.mention_text}")
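When several notes are processed, the same check can be used to collapse repeated mentions onto one record. The sketch below groups mention strings by linked ID. It uses `types.SimpleNamespace` stand-ins that mirror the attributes used above so it runs without the pipeline. `group_mentions` and `make_entity` are hypothetical helper names.

```python
from collections import defaultdict
from types import SimpleNamespace

def group_mentions(entities):
    """Return ({entity_id: [mentions]}, [unlinked mentions]) for a list of entities."""
    linked = defaultdict(list)
    unlinked = []
    for ent in entities:
        if ent.is_linked:
            linked[ent.linked_entity.entity_id].append(ent.mention_text)
        else:
            unlinked.append(ent.mention_text)
    return dict(linked), unlinked

def make_entity(mention_text, entity_id=None):
    """Stand-in for a pipeline entity; linked when an entity_id is given."""
    link = SimpleNamespace(entity_id=entity_id) if entity_id else None
    return SimpleNamespace(mention_text=mention_text, is_linked=link is not None, linked_entity=link)

entities = [
    make_entity("Jon Doe", "P001"),
    make_entity("John H Doe", "P001"),
    make_entity("Jane Smith"),
]
by_id, new_mentions = group_mentions(entities)
print(by_id)
print(new_mentions)
```

Mentions that recur under one ID but are not yet stored as aliases are natural candidates for the `aliases` field.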

Best Practices

Maintain comprehensive aliases: Add common misspellings, abbreviations, and variations to your database aliases
Set appropriate confidence thresholds: Lower values for threshold (L1 and L3) and min_confidence (L0) catch more matches but may introduce false positives
Review new entities regularly: Entities flagged as "new" should be reviewed before database insertion
Use context for disambiguation: When multiple candidates match, L3 uses surrounding text to disambiguate
Enable include_unlinked: Set this to True in L0 to detect new records that need to be added to your database
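The threshold trade-off can be checked against a small hand-labeled sample before changing any setting. The sketch below sweeps a cutoff over scored links and counts how many survive and how many of those are wrong. The scores and correctness flags are made-up illustration data, not pipeline output, and `sweep` is a hypothetical helper.

```python
# (mention, score, link_is_correct): hypothetical values for illustration only
scored = [
    ("Dr. House", 0.92, True),
    ("Jon Doe", 0.81, True),
    ("HTN", 0.45, True),
    ("DM", 0.35, False),
    ("patient", 0.25, False),
]

def sweep(scored_links, thresholds):
    """Return (threshold, kept, false_positives) for each candidate threshold."""
    rows = []
    for threshold in thresholds:
        kept = [item for item in scored_links if item[1] >= threshold]
        correct = sum(1 for _, _, is_correct in kept if is_correct)
        rows.append((threshold, len(kept), len(kept) - correct))
    return rows

for threshold, kept, false_positives in sweep(scored, [0.2, 0.4, 0.6]):
    print(f"threshold={threshold}: kept={kept}, false positives={false_positives}")
```

In this made-up sample, raising the cutoff from 0.2 to 0.4 removes both wrong links without losing a correct one, while 0.6 starts dropping correct links. A labeled sample from your own notes is needed to find the equivalent point for real data.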

Next Steps

Biomedical Entity Extraction — Extract biomedical entities from clinical text with GLiNER
Adverse Drug Event Detection — Detect ADEs in medical text with GLiClass
PII Detection and Redaction — Protect patient privacy with GLiNER
