Knowledgator Docs

NLIDataset

class liqfit.datasets.NLIDataset

(hypothesis: List[str], premises: List[str], labels: List[int])

Parameters:

  • hypothesis (List[str]): List of string sequences.

  • premises (List[str]): List of string sequences.

  • labels (List[int]): List of labels as integers.
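
The constructor takes three parallel lists, so raw NLI rows with string labels need to be split and integer-encoded first. The following is a minimal plain-Python sketch of that preparation step. The sample rows, the `label2id` mapping, and the helper `to_parallel_lists` are hypothetical and not part of LiqFit; the integer ids your model and loss expect may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical raw rows: (hypothesis, premise, string label).
rows: List[Tuple[str, str, str]] = [
    ("This example is happiness.", "I got the job today!", "entailment"),
    ("This example is sadness.", "I got the job today!", "contradiction"),
]

# Assumed encoding for illustration only; check which ids your setup expects.
label2id: Dict[str, int] = {"entailment": 1, "contradiction": 0}


def to_parallel_lists(
    rows: List[Tuple[str, str, str]], label2id: Dict[str, int]
) -> Tuple[List[str], List[str], List[int]]:
    """Split rows into three index-aligned lists and encode the labels."""
    hypothesis = [h for h, _, _ in rows]
    premises = [p for _, p, _ in rows]
    labels = [label2id[name] for _, _, name in rows]
    return hypothesis, premises, labels


hypothesis, premises, labels = to_parallel_lists(rows, label2id)

# The lists line up index by index, matching the constructor signature above:
# dataset = NLIDataset(hypothesis=hypothesis, premises=premises, labels=labels)
```

Keeping the three lists index-aligned matters because each position is read as one (hypothesis, premise, label) example.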

liqfit.datasets.NLIDataset.load_dataset

(dataset: Optional[Dataset] = None, 
dataset_name: Optional[str] = None,
classes: Optional[List[str]] = None, 
text_column: Optional[str] = "text",
label_column: Optional[str] = "label",
template: Optional[str] = "This example is {}.",
normalize_negatives: bool = False, 
positives: int = 1, 
negatives: int = -1,
multi_label: bool = False)

Parameters:

  • dataset (Optional[Dataset]): Dataset object to use when dataset_name is not passed. (Defaults to None).

  • dataset_name (Optional[str]): Name of the dataset to load from Hugging Face datasets, used when dataset is not passed. (Defaults to None).

  • classes (Optional[List[str]]): List of classes available in the dataset. (Defaults to None).

  • text_column (Optional[str]): Column name that contains the text. (Defaults to "text").

  • label_column (Optional[str]): Column name that contains the labels. (Defaults to "label").

  • template (Optional[str]): Template string into which each label is inserted in place of {}. (Defaults to "This example is {}.").

  • normalize_negatives (bool): Whether to normalize negative examples or not. (Defaults to False).

  • positives (int): Positives label ID. (Defaults to 1).

  • negatives (int): Negatives label ID. (Defaults to -1).

  • multi_label (bool): Whether each example has more than one label or not. (Defaults to False).
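
To make the template and classes parameters concrete, here is a minimal plain-Python sketch of how a label template can turn one classification example into NLI-style pairs. It does not call LiqFit and is not its implementation: the helper `build_pairs` and the sample text are hypothetical, and the boolean flag only marks which class matches, leaving aside how LiqFit actually encodes labels or selects negatives.

```python
from typing import List, Tuple


def build_pairs(
    text: str,
    classes: List[str],
    true_class: str,
    template: str = "This example is {}.",
) -> List[Tuple[str, str, bool]]:
    """Return one (hypothesis, premise, matches) triple per class name."""
    pairs = []
    for name in classes:
        hypothesis = template.format(name)  # e.g. "This example is happiness."
        pairs.append((hypothesis, text, name == true_class))
    return pairs


pairs = build_pairs(
    text="I got the job today!",
    classes=["happiness", "sadness"],
    true_class="happiness",
)

for hypothesis, premise, matches in pairs:
    print(f"{premise!r} -> {hypothesis!r} (matches={matches})")
```

A template that reads as a natural sentence once the class name is inserted tends to suit NLI-style models, which is presumably why the default is a full sentence rather than a bare label.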

Using NLIDataset

from liqfit.datasets import NLIDataset
from datasets import load_dataset

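# Load a single split so that columns can be indexed by name
# ("train" is an assumed split name; use the one your dataset provides).
nli_dataset = load_dataset("your/nli_dataset", split="train")

dataset = NLIDataset(
    hypothesis=nli_dataset["hypothesis"],
    premises=nli_dataset["premises"],
    labels=nli_dataset["labels"],  # labels are expected to be integer-encoded
)

# OR

dataset = NLIDataset.load_dataset(nli_dataset, classes=["happiness", "sadness", ...])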

Last updated 1 year ago
