BENT-PubMedBERT-NER-Gene
Property | Value |
---|---|
License | Apache 2.0 |
Architecture | PubMedBERT Fine-tuned |
Task | Token Classification (NER) |
Language | English |
What is BENT-PubMedBERT-NER-Gene?
BENT-PubMedBERT-NER-Gene is a Named Entity Recognition (NER) model for identifying gene and protein entities in biomedical text. It is built on Microsoft's PubMedBERT and fine-tuned on 19 biomedical datasets, which exposes it to a range of annotation styles for gene/protein mentions.
Implementation Details
The model is implemented with PyTorch and the Transformers library on top of the PubMedBERT base model. The fine-tuning corpora include miRNA-Test-Corpus, CellFinder, CRAFT, and multiple BioNLP shared task datasets.
- Based on PubMedBERT's uncased abstract/fulltext model
- Supports token classification for multiple gene/protein entity types
- Trained on diverse annotation schemas (Gene, Protein, Protein_Complex, Enzyme)
- Optimized for biomedical text analysis
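Since the model is a Transformers token classifier, inference can be sketched roughly as follows. This is a minimal sketch, not the authors' reference code: the Hub identifier `pruas/BENT-PubMedBERT-NER-Gene` is an assumption not stated in this card, and `load_ner_pipeline`, `extract_mentions`, and the `min_score` threshold are hypothetical helpers introduced for illustration.

```python
from typing import Callable, Dict, List

# Assumption: this Hub ID is not given in the card; replace it with the real one.
MODEL_ID = "pruas/BENT-PubMedBERT-NER-Gene"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so extract_mentions can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def extract_mentions(
    text: str,
    ner: Callable[[str], List[Dict]],
    min_score: float = 0.5,
) -> List[Dict]:
    """Run `ner` on `text` and keep spans scoring at least `min_score`."""
    mentions = []
    for span in ner(text):
        if span["score"] < min_score:
            continue
        mentions.append(
            {
                # Slice the original text so casing and spacing are preserved.
                "text": text[span["start"]:span["end"]],
                "label": span["entity_group"],
                "start": span["start"],
                "end": span["end"],
                "score": float(span["score"]),
            }
        )
    return mentions
```

Calling `extract_mentions("BRCA1 interacts with RAD51 during DNA repair.", load_ner_pipeline())` would return one dictionary per detected span. The label names depend on the model's own configuration, so they are passed through unchanged rather than mapped to fixed strings.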
Core Capabilities
- Recognition of gene and protein mentions in scientific text
- Identification of protein complexes and families
- Detection of DNA and RNA entities
- Support for enzyme and gene product annotation
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned on 19 biomedical datasets focused on gene and protein entity recognition. Because it starts from PubMedBERT, which is pretrained on biomedical literature, it is already familiar with domain terminology before fine-tuning.
Q: What are the recommended use cases?
The model is well suited to biomedical text mining, automated literature review, gene/protein mention detection in scientific papers, and support for biological database curation.
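For literature review or curation, one common pattern is to tally detected mentions across many abstracts. The sketch below assumes `ner` is any callable that returns Transformers-style aggregated spans (dictionaries with `start` and `end` character offsets); `count_mentions` is a hypothetical helper, not part of the model or the library.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def count_mentions(
    abstracts: Iterable[str],
    ner: Callable[[str], List[Dict]],
) -> Counter:
    """Count how often each entity surface form appears across `abstracts`."""
    counts: Counter = Counter()
    for abstract in abstracts:
        for span in ner(abstract):
            # Use character offsets instead of span["word"], which the
            # tokenizer may have lowercased or re-spaced.
            surface = abstract[span["start"]:span["end"]].strip()
            if surface:
                counts[surface] += 1
    return counts
```

`count_mentions(abstracts, ner).most_common(20)` then lists the most frequently mentioned entities, which can serve as a starting point for manual curation. Surface forms are counted as written, so synonyms such as a gene symbol and its full name are not merged.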