Creating a custom collator
API reference for the Collator base class.
class liqfit.collators.Collator
(tokenizer: Union[Callable, AutoTokenizer], max_length: int, padding: Union[bool, str], truncation: bool)
Parameters:
tokenizer (AutoTokenizer, Callable): The tokenizer used to convert input texts to input IDs.
max_length (int): Maximum length used when tokenizing the input sequences.
padding (Union[bool, str]): Whether to pad sequences during tokenization.
truncation (bool): Whether to truncate sequences during tokenization.
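To make the three tokenization options concrete, here is a rough sketch of how a callable tokenizer might honor them. `toy_tokenizer` is a hypothetical whitespace tokenizer written only for illustration. It borrows the `max_length`, `padding`, and `truncation` keyword names that Hugging Face tokenizers accept, but it is not part of liqfit or transformers, and how liqfit forwards these options internally is an assumption here.

```python
from typing import Dict, List, Union


def toy_tokenizer(
    texts: List[str],
    max_length: int,
    padding: Union[bool, str],
    truncation: bool,
) -> Dict[str, List[List[int]]]:
    """Hypothetical tokenizer: each ID is a word's length, 0 is the pad ID."""
    ids = [[len(word) for word in text.split()] for text in texts]

    # Truncation cuts every sequence down to at most max_length tokens.
    if truncation:
        ids = [seq[:max_length] for seq in ids]

    # Padding: "max_length" pads to max_length, True/"longest" pads to the
    # longest sequence in the batch, anything else leaves lengths untouched.
    if padding == "max_length":
        target = max_length
    elif padding is True or padding == "longest":
        target = max((len(seq) for seq in ids), default=0)
    else:
        target = 0

    ids = [seq + [0] * max(0, target - len(seq)) for seq in ids]
    return {"input_ids": ids}


out = toy_tokenizer(
    ["a bb ccc dddd", "ee"], max_length=3, padding=True, truncation=True
)
print(out["input_ids"])  # [[1, 2, 3], [2, 0, 0]]
```

Any callable with a compatible signature could presumably stand in for an AutoTokenizer in the same way, which is what the `Union[Callable, AutoTokenizer]` annotation suggests.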
Using a custom Collator
The Collator base class only groups your batch, turning a list of dictionaries into a single dictionary. To customize batching, subclass it and override collate.
from liqfit.collators import Collator


class MyCollator(Collator):
    def __init__(self, tokenizer, max_length, padding, truncation):
        super().__init__(tokenizer, max_length, padding, truncation)

    def collate(self, batch):
        # Your collate implementation goes here.
        raise NotImplementedError
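The grouping that the base class is described as performing can be illustrated in plain Python. This is a minimal sketch of the idea, not liqfit's actual implementation. `group_batch` is a hypothetical helper name, and the `text` and `label` keys are made-up example fields.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_batch(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-example dicts into one dict of lists."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for example in batch:
        for key, value in example.items():
            grouped[key].append(value)
    return dict(grouped)


# A batch as a data loader would typically hand it over: one dict per example.
batch = [
    {"text": "first example", "label": 0},
    {"text": "second example", "label": 1},
]

grouped = group_batch(batch)
print(grouped)
# {'text': ['first example', 'second example'], 'label': [0, 1]}
```

A custom collate method would usually start from a structure like `grouped`, for instance by tokenizing the list of texts and converting the labels to tensors.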