ZeroShotClassificationPipeline
The pipeline makes it easy to run zero-shot text classification with fine-tuned cross-encoders.
class ZeroShotClassificationPipeline
Parameters:
model (AutoModelForSequenceClassification | CrossFitModel | torch.nn.Module): a fine-tuned model to be used in the processing pipeline.
tokenizer (AutoTokenizer): the tokenizer is responsible for breaking down input text into individual tokens, which are the basic units of language.
hypothesis_template (str, default='{}'): this optional argument specifies a template for generating hypotheses. The template is a string with a placeholder that is filled with a candidate label during inference. The default value '{}' means each label is used as the hypothesis as-is. Users can customize this template to match the desired output format, e.g. 'This example is about {}.'.
hypothesis_first (bool, default=False): this argument specifies whether to put the hypothesis before the premise. It can be beneficial for models with a block attention mechanism, where each token interacts with tokens within some window and with the N first tokens.
encoder_decoder (bool, default=True): this boolean flag determines whether the model operates as an encoder-decoder architecture. When set to True, the model is configured as an encoder-decoder; in this case, the text is processed by the encoder, and the labels are processed with the decoder.
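To illustrate how hypothesis_template is typically applied, here is a minimal sketch; the concrete template string and labels below are illustrative assumptions, not values prescribed by the pipeline:

```python
# Sketch of hypothesis generation: each candidate label is substituted into
# the template's placeholder to form a hypothesis that is paired with the
# input text (the premise) for the cross-encoder.
# The template and labels are illustrative assumptions.
template = "This example is about {}."
labels = ["politics", "sports", "technology"]

hypotheses = [template.format(label) for label in labels]
print(hypotheses)
# With the default template '{}', each label would be used as the hypothesis as-is.
```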
Using ZeroShotClassificationPipeline:
The pipeline supports classical cross-encoder models with 3 output neurons, corresponding to the entailment, contradiction, and neutral classes.
The pipeline also supports binary reranking models for both single-label and multi-label scenarios.
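As a sketch of how per-label entailment scores are commonly turned into class probabilities in the two scenarios (an assumption about typical post-processing, not necessarily this pipeline's exact implementation): in the single-label case the entailment logits for all labels are normalized with a softmax, while in the multi-label case each label's logit is passed through a sigmoid independently.

```python
import math

# Hypothetical entailment logits, one per candidate label.
logits = [2.0, 0.5, -1.0]

# Single-label: softmax across labels, so the probabilities sum to 1
# and exactly one label is expected to apply.
exps = [math.exp(x) for x in logits]
single_label = [e / sum(exps) for e in exps]

# Multi-label: independent sigmoid per label; scores need not sum to 1,
# so any number of labels can apply simultaneously.
multi_label = [1.0 / (1.0 + math.exp(-x)) for x in logits]

print(single_label)
print(multi_label)
```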
Encoder-decoder models are more flexible because they independently process the text with an encoder and then, with smaller decoders, calculate the probabilities of each class. Moreover, they distinguish better between text and labels because the two are processed with different parts of the model.
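The efficiency argument above can be made concrete with a toy sketch (the functions here are trivial stand-ins, not the real encoder or decoder): the text is encoded once, and that cached representation is reused when scoring every label with the decoder.

```python
def encode_text(text):
    # Stand-in for an expensive encoder pass over the input text.
    return [float(len(word)) for word in text.split()]

def decode_score(text_repr, label):
    # Stand-in for a small decoder that scores one label against the
    # cached text representation.
    return sum(text_repr) * len(label)

# The encoder runs once per text...
text_repr = encode_text("the match ended in a draw")

# ...while the decoder runs once per label, reusing text_repr each time.
scores = {label: decode_score(text_repr, label) for label in ["sports", "politics"]}
print(scores)
```

With a real model the same structure means adding a new candidate label costs only one cheap decoder pass, not a full re-encoding of the text.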