NLICollator

API reference for NLICollator

class liqfit.collators.NLICollator(tokenizer: AutoTokenizer, max_length: int, padding: Union[bool, str], truncation: bool)

Parameters:

  • tokenizer (AutoTokenizer or Callable): The tokenizer used to convert input texts into input IDs.

  • max_length (int): Maximum sequence length used when tokenizing the input sequences.

  • padding (Union[bool, str]): Whether to pad sequences during tokenization, or the padding strategy to use.

  • truncation (bool): Whether to truncate sequences during tokenization.
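
For concreteness, here is a minimal instantiation sketch. The checkpoint name is only illustrative; any Hugging Face tokenizer compatible with your backbone model works:

```python
from transformers import AutoTokenizer
from liqfit.collators import NLICollator

# Illustrative checkpoint; substitute the tokenizer of your backbone model.
tokenizer = AutoTokenizer.from_pretrained("knowledgator/comprehend_it-base")

collator = NLICollator(
    tokenizer,
    max_length=128,   # sequences longer than this are truncated
    padding=True,     # pad each batch to its longest sequence
    truncation=True,
)
```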

Using NLICollator

```python
from liqfit.collators import NLICollator
from liqfit.datasets import NLIDataset
from torch.utils.data import DataLoader

dataset = NLIDataset(...)    # see the NLIDataset reference for constructor arguments
collator = NLICollator(...)  # constructor parameters are documented above

# Use the collator with a plain PyTorch DataLoader
dataloader = DataLoader(dataset, collate_fn=collator)

# OR pass it to the Hugging Face Trainer as its data collator
from transformers import Trainer

trainer = Trainer(train_dataset=dataset, data_collator=collator)
```
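
As a quick sanity check, you can pull a single batch from the dataloader. Assuming the collator follows the usual Hugging Face convention of returning a dict of padded tensors (exact key names may differ):

```python
batch = next(iter(dataloader))
for key, value in batch.items():
    print(key, tuple(value.shape))  # e.g. input_ids -> (batch_size, max_length)
```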