BENT-PubMedBERT-NER-Gene

Property	Value
License	Apache 2.0
Architecture	PubMedBERT Fine-tuned
Task	Token Classification (NER)
Language	English

What is BENT-PubMedBERT-NER-Gene?

BENT-PubMedBERT-NER-Gene is a Named Entity Recognition (NER) model that identifies gene and protein mentions in biomedical text. It is built on Microsoft's PubMedBERT and fine-tuned on 19 biomedical datasets annotated with gene and protein entities.

Implementation Details

The model is implemented with PyTorch and the Transformers library on top of the PubMedBERT base model. The fine-tuning corpora include miRNA-Test-Corpus, CellFinder, CRAFT, and several BioNLP shared task datasets.

Based on PubMedBERT's uncased abstract/fulltext model
Supports token classification for multiple gene/protein entity types
Trained on diverse annotation schemas (Gene, Protein, Protein_Complex, Enzyme)
Optimized for biomedical text analysis
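
As a rough illustration of the points above, the model could be loaded through the Transformers pipeline API. This is a minimal sketch, not the authors' documented usage. The Hub identifier pruas/BENT-PubMedBERT-NER-Gene is an assumption (this card does not state it), and load_tagger and confident_tokens are hypothetical helper names. The dictionary keys (entity, score, word, start, end) are those returned by a token-classification pipeline when no aggregation strategy is set.

```python
from typing import Dict, List

# Assumed Hugging Face Hub identifier; it is not stated in this card.
MODEL_ID = "pruas/BENT-PubMedBERT-NER-Gene"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first call)."""
    # Imported lazily so confident_tokens can be used without transformers installed.
    from transformers import pipeline

    # No aggregation strategy: the pipeline returns one prediction per token
    # and, by default, omits tokens labelled "O".
    return pipeline("token-classification", model=model_id)


def confident_tokens(predictions: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep per-token predictions whose score is at least min_score."""
    return [
        {"text": p["word"], "start": p["start"], "end": p["end"], "label": p["entity"]}
        for p in predictions
        if p["score"] >= min_score
    ]
```

With the model available, confident_tokens(load_tagger()("Mutations in BRCA1 are frequent in breast cancer.")) would return the tagged tokens that pass the score threshold. The exact label strings depend on the model's configuration.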

Core Capabilities

Recognition of gene and protein mentions in scientific text
Identification of protein complexes and families
Detection of DNA and RNA entities
Support for enzyme and gene product annotation
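
Recognizing multi-token mentions such as protein complexes means merging per-token tags into contiguous spans. The sketch below shows one common way to do that, assuming a BIO tagging scheme in which tags look like B-Gene and I-Gene, or bare B and I when no type is attached. The tag format actually emitted by this model is not stated in this card, so treat the scheme, the function name merge_bio, and the fallback label Entity as assumptions.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (start offset, end offset, BIO tag)
Span = Tuple[int, int, str]   # (start offset, end offset, entity type)


def merge_bio(tokens: List[Token]) -> List[Span]:
    """Merge consecutive B/I tagged tokens into entity spans."""
    spans: List[Span] = []
    current = None  # [start, end, label] of the span being built
    for start, end, tag in tokens:
        prefix, _, label = tag.partition("-")
        label = label or "Entity"  # bare "B"/"I" tags carry no type
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != label)
        )
        if starts_new:
            # Close any open span, then open a new one at this token.
            if current is not None:
                spans.append(tuple(current))
            current = [start, end, label]
        elif prefix == "I":
            # Continuation of the open span: extend its end offset.
            current[1] = end
        else:
            # "O" (outside) closes the open span, if any.
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans
```

An I tag that does not continue a span of the same type is treated as the start of a new span, which is a lenient choice that tolerates inconsistent tagger output.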

Frequently Asked Questions

Q: What makes this model unique?

The model is trained on 19 biomedical datasets, all selected for gene and protein entity recognition. Because it starts from PubMedBERT, which was pretrained on biomedical literature, it already encodes domain-specific terminology before fine-tuning.

Q: What are the recommended use cases?

The model is suited to biomedical text mining, automated literature review, gene/protein mention detection in scientific papers, and support for biological database curation.
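
For literature review or curation, a typical first step is to tally which genes are mentioned across a set of documents. The following is a minimal sketch of that step. The function names count_mentions and toy_tagger are hypothetical, and toy_tagger is only a lexicon-based stand-in so the example runs without the model; in practice it would be replaced by a function that calls the NER model and returns the mention strings.

```python
from collections import Counter
from typing import Callable, Dict, List


def count_mentions(
    documents: Dict[str, str], tag: Callable[[str], List[str]]
) -> Counter:
    """Count, for each mention, the number of documents it appears in."""
    counts: Counter = Counter()
    for text in documents.values():
        # set() so repeated mentions within one document count once.
        counts.update(set(tag(text)))
    return counts


def toy_tagger(text: str) -> List[str]:
    """Stand-in tagger: returns words found in a tiny fixed lexicon."""
    lexicon = {"BRCA1", "BARD1", "TP53"}
    words = (w.strip(".,;") for w in text.split())
    return [w for w in words if w in lexicon]


abstracts = {
    "doc1": "BRCA1 interacts with BARD1. BRCA1 loss impairs repair.",
    "doc2": "TP53 and BRCA1 are frequently mutated.",
}
mention_counts = count_mentions(abstracts, toy_tagger)
```

Counting document frequency rather than raw occurrences keeps a single long paper from dominating the tally, though either choice may be appropriate depending on the curation task.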