Biotech news dataset
This dataset is curated to address the limitations of existing benchmarks with rich, complex content drawn from the biotech news domain. It contains diverse biotech news articles describing a wide range of events, which gives a more nuanced view of information extraction challenges.
A distinctive feature of this dataset is that it covers not only the overarching theme of each article but also the target companies the news is about. This dual-layered approach makes the dataset more useful for applications that need a deeper understanding of the relationships between events, companies, and the biotech industry as a whole.
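To make the "labels plus target company" structure concrete, here is a minimal Python sketch that groups event labels by company. The NewsRecord type, its field names, and the sample companies are hypothetical illustrations, not the dataset's actual schema; only the label names come from the class list.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class NewsRecord:
    """Hypothetical shape of one example (field names are assumptions):
    article text, the company it targets, and its event labels."""
    text: str
    target_company: str
    labels: list[str] = field(default_factory=list)


def events_by_company(records: list[NewsRecord]) -> dict[str, set[str]]:
    """Collect the set of event labels observed for each target company."""
    index: dict[str, set[str]] = defaultdict(set)
    for record in records:
        index[record.target_company].update(record.labels)
    return dict(index)


# Invented sample records for illustration only.
records = [
    NewsRecord("Acme Bio closes its Series B round.", "Acme Bio", ["Funding Round"]),
    NewsRecord("Acme Bio names a new CEO.", "Acme Bio",
               ["Executive Appointment", "Executive Statement"]),
    NewsRecord("Helix Labs receives FDA clearance.", "Helix Labs", ["Regulatory Approval"]),
]

print(events_by_company(records))
```

Keeping the company as its own field, rather than folding it into the label, is what lets the same event type be tracked separately for each organisation.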
Classes
The dataset defines 31 classes, including a None label.
Alliance & Partnership: Forming an alliance or partnership.
Article Publication: Publishing an article.
Clinical Trial Sponsorship: Sponsoring or participating in a clinical trial.
Closing: Shutting down a facility/office/division or ceasing an initiative.
Company Description: Describing or profiling the company.
Department Establishment: Establishing a new department or division.
Event Organisation: Organizing or participating in an event.
Event Organization: Organizing or participating in an event like a conference, exhibition, etc.
Executive Appointment: Appointing a new executive.
Executive Statement: A statement or quote from an executive of a company.
Expanding Geography: Expanding into new geographical areas.
Expanding Industry: Expanding into new industries or markets.
Foundation: Establishing a new charitable foundation.
Funding Round: Raising a new round of funding.
Hiring: Announcing new hires or appointments at the company.
Investment in Public Company: Making an investment in a public company.
IPO Exit: Having an initial public offering or acquisition exit.
M&A: Mergers, acquisitions, or divestitures.
New Initiatives & Programs: Announcing new initiatives or programs.
New Initiatives or Programs: Announcing new initiatives, programs, or campaigns.
None: No label.
Participation in an Event: Participating in an industry event, conference, etc.
Partnerships & Alliances: Forming partnerships or strategic alliances with other companies.
Patent Publication: Publication of a new patent filing.
Product Launching & Presentation: Launching or unveiling a new product.
Product Updates: Announcing updates or new versions of existing products.
Regulatory Approval: Getting approval from regulatory bodies for products, services, trials, etc.
Service & Product Providing: Launching or expanding products or services.
Subsidiary Establishment: Establishing a new subsidiary company.
Support & Philanthropy: Philanthropic activities or donations.
Other: Other events that don't fit into defined categories.
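Because an article can carry several of these labels at once, a common way to feed them to a classifier is a multi-hot vector with one position per class. The sketch below assumes the class order listed above; the dataset's own label ids may be ordered differently, and the helper names are hypothetical.

```python
# The 31 class names as listed above. The order (and therefore the index of
# each class) is an assumption, not the dataset's official label mapping.
CLASSES = [
    "Alliance & Partnership", "Article Publication", "Clinical Trial Sponsorship",
    "Closing", "Company Description", "Department Establishment",
    "Event Organisation", "Event Organization", "Executive Appointment",
    "Executive Statement", "Expanding Geography", "Expanding Industry",
    "Foundation", "Funding Round", "Hiring", "Investment in Public Company",
    "IPO Exit", "M&A", "New Initiatives & Programs", "New Initiatives or Programs",
    "None",  # "None" is an ordinary class name here, not Python's None
    "Participation in an Event", "Partnerships & Alliances",
    "Patent Publication", "Product Launching & Presentation", "Product Updates",
    "Regulatory Approval", "Service & Product Providing", "Subsidiary Establishment",
    "Support & Philanthropy", "Other",
]
assert len(CLASSES) == 31  # matches the class count stated above

CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}


def to_multi_hot(labels: list[str]) -> list[int]:
    """Encode a list of label names as a 0/1 vector of length 31."""
    vector = [0] * len(CLASSES)
    for label in labels:
        vector[CLASS_TO_INDEX[label]] = 1  # raises KeyError for unknown names
    return vector


def from_multi_hot(vector: list[int]) -> list[str]:
    """Decode a 0/1 vector back to label names, in class-list order."""
    return [CLASSES[i] for i, bit in enumerate(vector) if bit]


vec = to_multi_hot(["Funding Round", "Executive Statement"])
print(from_multi_hot(vec))
```

Each position is then predicted independently, which is what allows several events to be active for the same article.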
Benchmark
We trained several models with binary cross-entropy loss and evaluated them on the test set.
Model | Accuracy | F1 | Precision | Recall
DeBERTa-small | 96.58 | 67.69 | 74.18 | 62.19
DeBERTa-base | 96.60 | 67.55 | 74.81 | 61.58
DeBERTa-large | 96.99 | 74.07 | 73.46 | 74.69
SciBERT-uncased | 96.57 | 68.07 | 73.07 | 63.71
Flan-T5-base | 96.85 | 71.10 | 75.71 | 67.07
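The training objective treats each class as an independent yes/no decision scored with binary cross-entropy. The following is a minimal pure-Python sketch of that loss together with the kind of metrics used to evaluate multi-label models. The averaging scheme is not specified here, so the sketch assumes element-wise accuracy and micro-averaged precision, recall, and F1 over all (article, class) pairs, with a 0.5 decision threshold; the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (article, class) pair."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def micro_scores(logits, targets, threshold=0.5):
    """Element-wise accuracy plus micro-averaged precision, recall and F1.

    The threshold and the averaging scheme are assumptions.
    """
    tp = fp = fn = correct = total = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            pred = 1 if sigmoid(z) >= threshold else 0
            tp += int(pred == 1 and y == 1)
            fp += int(pred == 1 and y == 0)
            fn += int(pred == 0 and y == 1)
            correct += int(pred == y)
            total += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / total, "precision": precision,
            "recall": recall, "f1": f1}


# Toy batch: 2 articles x 3 classes (real data would have 31 columns).
logits = [[2.0, -1.0, -3.0], [-2.0, 0.5, -0.5]]
targets = [[1, 0, 0], [0, 1, 1]]
print(round(bce_with_logits(logits, targets), 4))
print(micro_scores(logits, targets))
```

In this toy batch, accuracy (5/6) exceeds F1 (0.8) because correctly rejected classes count toward accuracy but not toward F1. With 31 mostly negative classes per article that gap widens, which may explain why scores near 97% sit alongside scores around 70% for every model.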
Recommended reading:
Read a general overview of the dataset on Medium: "Finally, a decent multi-label classification benchmark is created: a prominent zero-shot dataset."
Train your own model on the dataset: "Multi-Label Classification Model From Scratch: Step-by-Step Tutorial"