Biotech news dataset
This dataset is curated to address the limitations of existing benchmarks by incorporating rich and complex content from the biotech news domain. It contains diverse biotech news articles covering a range of events, offering a more nuanced perspective on information extraction challenges.
A distinctive feature of this dataset is its emphasis on not only identifying the overarching theme but also extracting information about the target companies associated with the news. This dual-layered approach enhances the dataset's utility for applications that require a deeper understanding of the relationships between events, companies, and the biotech industry as a whole.
Classes
The dataset consists of 31 classes, including None values.
Benchmark
We trained various models with binary cross-entropy loss and evaluated them on the test set.
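To illustrate what a binary cross-entropy objective looks like for a multi-label problem of this size, here is a minimal, framework-free sketch. It is not the training code used for the benchmark: the helper names `multi_hot` and `bce_with_logits` and the example class indices are hypothetical, and only the class count of 31 comes from this card. Each class is treated as an independent yes/no decision, and the per-class losses are averaged.

```python
import math

NUM_CLASSES = 31  # class count stated in this dataset card


def multi_hot(label_ids, num_classes=NUM_CLASSES):
    """Turn a list of class indices into a 0/1 target vector."""
    target = [0.0] * num_classes
    for idx in label_ids:
        target[idx] = 1.0
    return target


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits.

    Uses the numerically stable form
        max(z, 0) - z * y + log(1 + exp(-|z|))
    which equals -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


# Hypothetical article tagged with two classes (the indices are made up).
targets = multi_hot([2, 7])

# Stand-in for an untrained model that outputs a logit of 0 for every class.
logits = [0.0] * NUM_CLASSES

loss = bce_with_logits(logits, targets)
print(f"mean BCE loss: {loss:.4f}")
```

Because every class gets its own sigmoid rather than sharing a softmax, an article can be assigned several classes at once, which is why this loss is a common choice for multi-label classification.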
Recommended reading:
Read the general overview of the dataset on Medium: "Finally, a decent multi-label classification benchmark is created: a prominent zero-shot dataset."
Try training your own model on the dataset: "Multi-Label Classification Model From Scratch: Step-by-Step Tutorial"