Creating a custom collator

API reference for the Collator base class.

class liqfit.collators.Collator

(tokenizer: Union[Callable, AutoTokenizer], max_length: int, padding: Union[bool, str], truncation: bool)

Parameters:

  • tokenizer (Union[Callable, AutoTokenizer]): The tokenizer used to convert the input texts to input IDs.

  • max_length (int): Maximum sequence length used when tokenizing the input sequences.

  • padding (Union[bool, str]): Whether, and how, to pad sequences during tokenization.

  • truncation (bool): Whether to truncate sequences during tokenization.
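To make the effect of max_length, padding, and truncation concrete, here is a minimal sketch built around a toy tokenizer. The function toy_tokenizer is hypothetical and is not part of liqfit. It only mimics the keyword convention used by Hugging Face tokenizers, where padding=True (or "longest") pads to the longest sequence in the batch and padding="max_length" pads to max_length. How the Collator forwards these values to the tokenizer is an assumption here.

```python
from typing import Dict, List, Union


def toy_tokenizer(
    texts: List[str],
    max_length: int,
    padding: Union[bool, str] = False,
    truncation: bool = False,
) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in tokenizer: each word becomes its character count."""
    input_ids = [[len(word) for word in text.split()] for text in texts]

    if truncation:
        # Keep at most max_length tokens per sequence.
        input_ids = [ids[:max_length] for ids in input_ids]

    if padding == "max_length":
        target = max_length
    elif padding is True or padding == "longest":
        target = max(len(ids) for ids in input_ids)
    else:
        target = 0  # no padding

    # Right-pad with 0 up to the target length (no-op if already long enough).
    input_ids = [ids + [0] * (target - len(ids)) for ids in input_ids]
    return {"input_ids": input_ids}


texts = ["a bb ccc dddd", "ee f"]

# Truncate to 3 tokens, then pad every sequence to exactly 3 tokens.
print(toy_tokenizer(texts, max_length=3, padding="max_length", truncation=True))
# {'input_ids': [[1, 2, 3], [2, 1, 0]]}

# No truncation; pad to the longest sequence in the batch.
print(toy_tokenizer(texts, max_length=3, padding=True, truncation=False))
# {'input_ids': [[1, 2, 3, 4], [2, 1, 0, 0]]}
```

Any callable with this shape could in principle be passed as the tokenizer argument, since the signature accepts a Callable as well as an AutoTokenizer.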

Using a custom Collator

The Collator base class only groups your batch into a single dictionary instead of a list of dictionaries. To customize how a batch is built, subclass it and provide your own collate method.

from liqfit.collators import Collator

class MyCollator(Collator):
    def __init__(self, tokenizer, max_length, padding, truncation):
        super().__init__(tokenizer, max_length, padding, truncation)

    def collate(self, batch):
        # Your collate implementation goes here.
        raise NotImplementedError
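
The grouping described above, turning a list of dictionaries into one dictionary, can be sketched in plain Python. This is only an illustration of the idea, not liqfit's actual implementation. The helper group_batch and the field names "text" and "label" are hypothetical.

```python
from typing import Any, Dict, List


def group_batch(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-example dicts into one dict of per-field lists."""
    grouped: Dict[str, List[Any]] = {}
    for example in batch:
        for key, value in example.items():
            # Create the list for this field on first sight, then append.
            grouped.setdefault(key, []).append(value)
    return grouped


batch = [
    {"text": "first example", "label": 0},
    {"text": "second example", "label": 1},
]

print(group_batch(batch))
# {'text': ['first example', 'second example'], 'label': [0, 1]}
```

A custom collate method would typically start from a grouped dictionary like this and then tokenize the text field or convert the values to tensors, depending on what the model expects.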
