NLIDataset

class liqfit.datasets.NLIDataset

(hypothesis: List[str], premises: List[str], labels: List[int])

Parameters:

  • hypothesis (List[str]): List of hypothesis strings, one per example.

  • premises (List[str]): List of premise strings, one per example.

  • labels (List[int]): List of integer-encoded labels, one per example.

liqfit.datasets.NLIDataset.load_dataset

(dataset: Optional[Dataset] = None, 
dataset_name: Optional[str] = None,
classes: Optional[List[str]] = None, 
text_column: Optional[str] = "text",
label_column: Optional[str] = "label",
template: Optional[str] = "This example is {}.",
normalize_negatives: bool = False, 
positives: int = 1, 
negatives: int = -1,
multi_label: bool = False)

Parameters:

  • dataset (Optional[Dataset]): Dataset object (if dataset_name is not passed). (Defaults to None).

  • dataset_name (Optional[str]): Name of the dataset to load from Hugging Face datasets (if dataset is not passed). (Defaults to None).

  • classes (Optional[List[str]]): List of classes available in the dataset. (Defaults to None).

  • text_column (Optional[str]): Column name that contains the text. (Defaults to "text").

  • label_column (Optional[str]): Column name that contains the labels. (Defaults to "label").

  • template (Optional[str]): Template string into which the label is inserted at the {} placeholder. (Defaults to "This example is {}.").

  • normalize_negatives (bool): Whether to normalize negative examples. (Defaults to False).

  • positives (int): Label ID for positive examples. (Defaults to 1).

  • negatives (int): Label ID for negative examples. (Defaults to -1).

  • multi_label (bool): Whether each example has more than one label or not. (Defaults to False).
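
The interaction of classes, template, positives and negatives is easier to see on a single example. The sketch below is an illustration only, not the library's implementation: build_nli_pairs is a hypothetical helper, and the sketch assumes that each class name is inserted into the template to form a hypothesis, that the original text is used as the premise, and that the true class receives the positives ID while every other class receives the negatives ID. It does not model normalize_negatives or multi_label.

```python
from typing import List, Tuple


def build_nli_pairs(
    text: str,
    true_label: str,
    classes: List[str],
    template: str = "This example is {}.",
    positives: int = 1,
    negatives: int = -1,
) -> List[Tuple[str, str, int]]:
    """Hypothetical helper (not part of liqfit).

    Returns one (hypothesis, premise, label) triple per class.
    """
    pairs = []
    for cls in classes:
        # Insert the class name at the {} placeholder of the template.
        hypothesis = template.format(cls)
        # Assumption: the true class is a positive, all others are negatives.
        label = positives if cls == true_label else negatives
        pairs.append((hypothesis, text, label))
    return pairs


pairs = build_nli_pairs(
    text="I passed my exam!",
    true_label="happiness",
    classes=["happiness", "sadness"],
)
for hypothesis, premise, label in pairs:
    print(label, "|", hypothesis, "|", premise)
```

Under these assumptions a single classification example with two candidate classes yields two hypothesis/premise pairs, one labeled with the positives ID and one with the negatives ID.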

Using NLIDataset

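from datasets import load_dataset
from liqfit.datasets import NLIDataset

# "your/nli_dataset" is a placeholder. split="train" is an assumption:
# without a split, load_dataset returns a DatasetDict keyed by split name,
# so column lookups such as nli_dataset["hypothesis"] would fail.
nli_dataset = load_dataset("your/nli_dataset", split="train")

# Build the dataset directly from existing columns.
dataset = NLIDataset(
    hypothesis=nli_dataset["hypothesis"],
    premises=nli_dataset["premises"],
    labels=nli_dataset["labels"],  # labels expected to be encoded.
)

# OR

dataset = NLIDataset.load_dataset(nli_dataset, classes=["happiness", "sadness", ...])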
