Knowledgator Docs

Biotech news dataset

This dataset is curated to address the limitations of existing benchmarks by incorporating rich, complex content from the biotech news domain. It comprises diverse biotech news articles covering a variety of events, offering a more nuanced perspective on information extraction challenges.

A distinctive feature of this dataset is its emphasis on not only identifying the overarching theme but also extracting information about the target companies associated with the news. This dual-layered approach enhances the dataset's utility for applications that require a deeper understanding of the relationships between events, companies, and the biotech industry as a whole.

Classes

The dataset consists of 31 classes, including None values.

| Category | Description |
| --- | --- |
| Alliance & Partnership | Forming an alliance or partnership. |
| Article Publication | Publishing an article. |
| Clinical Trial Sponsorship | Sponsoring or participating in a clinical trial. |
| Closing | Shutting down a facility/office/division or ceasing an initiative. |
| Company Description | Describing or profiling the company. |
| Department Establishment | Establishing a new department or division. |
| Event Organisation | Organizing or participating in an event. |
| Event Organization | Organizing or participating in an event like a conference, exhibition, etc. |
| Executive Appointment | Appointing a new executive. |
| Executive Statement | A statement or quote from an executive of a company. |
| Expanding Geography | Expanding into new geographical areas. |
| Expanding Industry | Expanding into new industries or markets. |
| Foundation | Establishing a new charitable foundation. |
| Funding Round | Raising a new round of funding. |
| Hiring | Announcing new hires or appointments at the company. |
| Investment in Public Company | Making an investment in a public company. |
| IPO Exit | Having an initial public offering or acquisition exit. |
| M&A | Mergers, acquisitions, or divestitures. |
| New Initiatives & Programs | Announcing new initiatives or programs. |
| New Initiatives or Programs | Announcing new initiatives, programs, or campaigns. |
| None | No label. |
| Participation in an Event | Participating in an industry event, conference, etc. |
| Partnerships & Alliances | Forming partnerships or strategic alliances with other companies. |
| Patent Publication | Publication of a new patent filing. |
| Product Launching & Presentation | Launching or unveiling a new product. |
| Product Updates | Announcing updates or new versions of existing products. |
| Regulatory Approval | Getting approval from regulatory bodies for products, services, trials, etc. |
| Service & Product Providing | Launching or expanding products or services. |
| Subsidiary Establishment | Establishing a new subsidiary company. |
| Support & Philanthropy | Philanthropic activities or donations. |
| Other | Other events that don't fit into defined categories. |
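
For training, class names like those above are usually converted into numeric targets. The following is a minimal sketch of a multi-hot encoding, assuming a multi-label setup in which one article can carry several classes. The class subset, the `CLASSES` list, and the `to_multi_hot` helper are hypothetical and chosen for illustration; the page does not specify how the dataset stores its labels.

```python
# Hypothetical subset of the class names from the table above;
# the full label set has 31 entries.
CLASSES = ["Funding Round", "M&A", "Product Updates", "Regulatory Approval", "None"]
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}


def to_multi_hot(labels):
    """Turn a list of class names into a 0/1 vector with one slot per class."""
    vector = [0] * len(CLASSES)
    for label in labels:
        vector[CLASS_TO_INDEX[label]] = 1
    return vector


# An article that reports both a funding round and a regulatory approval.
example = to_multi_hot(["Funding Round", "Regulatory Approval"])
print(example)  # [1, 0, 0, 1, 0]
```

Each position in the vector is an independent yes/no target, which is the shape a binary cross-entropy loss expects.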

Benchmark

We trained various models with binary cross-entropy loss and evaluated them on the test set.

| Model | Accuracy | F1 | Precision | Recall |
| --- | --- | --- | --- | --- |
| DeBERTa-small | 96.58 | 67.69 | 74.18 | 62.19 |
| DeBERTa-base | 96.60 | 67.55 | 74.81 | 61.58 |
| DeBERTa-large | 96.99 | 74.07 | 73.46 | 74.69 |
| SciBERT-uncased | 96.57 | 68.07 | 73.07 | 63.71 |
| Flan-T5-base | 96.85 | 71.10 | 75.71 | 67.07 |
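
To make the loss and the reported metrics concrete, here is a minimal pure-Python sketch of binary cross-entropy over per-class logits, together with element-wise accuracy and micro-averaged precision, recall, and F1. The page does not state which averaging scheme was used for the table, so micro averaging is an assumption here, and the function names and toy data are illustrative only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over the labels of one example."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def micro_metrics(y_true, y_pred):
    """Element-wise accuracy and micro-averaged precision, recall, F1."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy multi-hot targets and thresholded predictions (3 examples, 4 classes).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
print(micro_metrics(y_true, y_pred))
print(binary_cross_entropy([0.0, 0.0], [1, 0]))  # ln(2), about 0.693
```

Under this element-wise definition, the many correctly predicted absent labels count towards accuracy, which is one possible reading of why accuracy in the table sits near 97 while F1 is considerably lower.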

Recommended reading:

Check the general overview of the dataset on Medium - Finally, a decent multi-label classification benchmark is created: a prominent zero-shot dataset.

Try to train your own model on the dataset - Multi-Label Classification Model From Scratch: Step-by-Step Tutorial