bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract
| Property | Value |
|---|---|
| Author | OpenAlex |
| Base Model | bert-base-multilingual-cased |
| Training Framework | Transformers 4.35.2, TensorFlow 2.13.0 |
| Model URL | Hugging Face |
What is bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract?
This is a specialized model fine-tuned on a CWTS-labeled dataset for academic topic classification. It analyzes research paper titles and abstracts and assigns relevant topics from a predefined set of categories, reaching 48.46% accuracy after 8 epochs of training.
Implementation Details
The model is a fine-tuned version of BERT multilingual cased, trained with the Adam optimizer at a learning rate of 6e-05 with 500 warmup steps. Input must follow a structured format, with specific tags marking the title and abstract.
- Supports both title-only and abstract-only classification
- Returns confidence scores for the top 10 topic predictions
- Uses a specialized input format with `<TITLE>` and `<ABSTRACT>` tags
- Truncates input at 512 tokens
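The tagged input convention above can be sketched as a small helper. This is an illustrative assumption, not the official preprocessing code: the tag strings come from the model card, but the function name and the exact way title and abstract are concatenated are guesses.

```python
def build_input(title="", abstract=""):
    """Combine a title and/or abstract into the tagged format the model expects.

    Hypothetical helper: the <TITLE>/<ABSTRACT> tags are documented by the
    model card, but the joining scheme here is an assumption.
    """
    parts = []
    if title:
        parts.append(f"<TITLE> {title.strip()}")
    if abstract:
        parts.append(f"<ABSTRACT> {abstract.strip()}")
    if not parts:
        raise ValueError("Provide a title, an abstract, or both.")
    return " ".join(parts)

# Title-only, abstract-only, or combined inputs are all valid:
text = build_input(
    title="Deep Learning for Protein Folding",
    abstract="We study neural approaches to structure prediction.",
)
```

The resulting string would then be tokenized with truncation at 512 tokens before being passed to the model.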
Core Capabilities
- Multilingual topic classification for academic papers
- Confidence scoring for topic predictions
- Flexible input handling for titles and abstracts
- Integration with larger classification systems
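Extracting the top-10 confidence scores from the model's output logits can be sketched in pure NumPy. The label list and logits below are placeholders for illustration, not the model's real topic vocabulary:

```python
import numpy as np

def top_k_topics(logits, labels, k=10):
    """Softmax the logits and return the k highest-confidence (label, score) pairs."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:k]   # indices of the k largest probabilities
    return [(labels[i], float(probs[i])) for i in order]

# Placeholder labels and random logits, for illustration only:
labels = [f"topic_{i}" for i in range(20)]
logits = np.random.default_rng(0).normal(size=20)
for label, score in top_k_topics(logits, labels, k=10):
    print(f"{label}: {score:.4f}")
```

In practice the logits would come from the fine-tuned classification head, and `labels` from the model's `id2label` mapping.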
Frequently Asked Questions
Q: What makes this model unique?
This model specializes in academic topic classification using a multilingual approach, making it valuable for international research classification. It's part of OpenAlex's larger classification system but can work independently for quick topic generation.
Q: What are the recommended use cases?
The model is ideal for rapid topic classification of research papers, preliminary content categorization, and integration into larger academic content management systems. It's particularly useful when dealing with multilingual academic content.