BENT-PubMedBERT-NER-Gene

Maintained By
pruas

  • License: Apache 2.0
  • Architecture: PubMedBERT (fine-tuned)
  • Task: Token Classification (NER)
  • Language: English

What is BENT-PubMedBERT-NER-Gene?

BENT-PubMedBERT-NER-Gene is a specialized Named Entity Recognition model designed for identifying gene and protein entities in biomedical text. Built upon Microsoft's PubMedBERT architecture, this model has been extensively fine-tuned on a comprehensive collection of 19 biomedical datasets, making it particularly robust for gene/protein entity detection.

Implementation Details

The model is implemented in PyTorch with the Transformers library, building on the PubMedBERT base architecture. It was fine-tuned on a range of high-quality biomedical corpora, including the miRNA-Test-Corpus, CellFinder, CRAFT, and several BioNLP shared task datasets.

  • Based on PubMedBERT's uncased abstract/fulltext model
  • Supports token classification for multiple gene/protein entity types
  • Trained on diverse annotation schemas (Gene, Protein, Protein_Complex, Enzyme)
  • Optimized for biomedical text analysis
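Token classification models like this one emit one BIO label per token (e.g. B-Gene, I-Gene, O), which downstream code must group into entity spans. The sketch below illustrates that decoding step; the label names mirror the annotation schemas listed above (Gene, Protein, Protein_Complex, Enzyme), but the exact label set of this model should be checked against its config.

```python
def bio_to_spans(tokens, labels):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Illustrative helper: assumes labels of the form "B-<type>",
    "I-<type>", or "O", one per token.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag starts a new span, closing any open one first.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["The", "BRCA1", "protein", "binds", "RAD51", "."]
labels = ["O", "B-Gene", "O", "O", "B-Gene", "O"]
print(bio_to_spans(tokens, labels))  # [('Gene', 'BRCA1'), ('Gene', 'RAD51')]
```

In practice the Transformers pipeline API can perform this aggregation for you (via `aggregation_strategy`), but the logic above is what happens under the hood.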

Core Capabilities

  • Recognition of gene and protein mentions in scientific text
  • Identification of protein complexes and families
  • Detection of DNA and RNA entities
  • Support for enzyme and gene product annotation

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its comprehensive training on 19 different biomedical datasets, specifically focused on gene and protein entity recognition. Its foundation on PubMedBERT ensures domain-specific knowledge of biomedical terminology.

Q: What are the recommended use cases?

The model is ideal for biomedical text mining, automated literature review, gene/protein mention detection in scientific papers, and supporting biological database curation tasks.
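A typical curation workflow runs the model through the Transformers token-classification pipeline and keeps only high-confidence mentions. The sketch below assumes the Hugging Face Hub ID `pruas/bent-pubmedbert-ner-gene` (verify against the actual repository) and the standard output keys of the aggregated pipeline (`word`, `score`); the threshold value is illustrative.

```python
def keep_confident_entities(entities, threshold=0.9):
    """Keep pipeline predictions whose confidence clears the threshold --
    a common filter before feeding mentions into a curation queue.

    `entities` is a list of dicts with "word" and "score" keys, as
    produced by the Transformers token-classification pipeline.
    """
    return [e["word"] for e in entities if e["score"] >= threshold]

if __name__ == "__main__":
    # Assumed Hub ID -- check the model repository on Hugging Face.
    from transformers import pipeline
    ner = pipeline(
        "token-classification",
        model="pruas/bent-pubmedbert-ner-gene",
        aggregation_strategy="simple",  # merge B-/I- word pieces into spans
    )
    preds = ner("Overexpression of HER2 activates the PI3K/AKT pathway.")
    print(keep_confident_entities(preds))
```

Adjusting the threshold trades recall for precision: lower values surface more candidate mentions for human review, higher values suit fully automated extraction.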
