bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract

Property	Value
Author	OpenAlex
Base Model	bert-base-multilingual-cased
Training Framework	Transformers 4.35.2, TensorFlow 2.13.0
Model URL	Hugging Face

What is bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract?

This is a specialized model fine-tuned on a CWTS-labeled dataset for academic topic classification. It analyzes research paper titles and abstracts and assigns relevant topics from a predefined set of categories. Accuracy increased over the course of training, reaching 48.46% after 8 epochs.

Implementation Details

The model is a fine-tuned version of BERT multilingual cased, trained with the Adam optimizer at a learning rate of 6e-05 with 500 warmup steps. It expects input in a structured format, with specific tags marking the title and the abstract.
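
To make the warmup setting concrete, here is a minimal sketch of how the learning rate could ramp up over the first 500 steps. The linear ramp shape and the function name `warmup_lr` are assumptions; the description above gives only the peak rate (6e-05) and the number of warmup steps, and this sketch does not model any decay that may follow warmup.

```python
PEAK_LR = 6e-05      # learning rate stated above
WARMUP_STEPS = 500   # warmup steps stated above


def warmup_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes a linear ramp from 0 to PEAK_LR. After warmup the rate is
    simply held at PEAK_LR, because the post-warmup schedule is not
    described in the source.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 125, 250, 500, 1000):
        print(step, warmup_lr(step))
```

Starting from a small rate and ramping up is a common way to keep early updates from disturbing the pretrained weights too much during fine-tuning.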

Supports both title-only and abstract-only classification
Returns confidence scores for top 10 topic predictions
Uses specialized input format with <TITLE> and <ABSTRACT> tags
Truncates input at 512 tokens
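
The input format and output shape listed above can be sketched as follows. This is an illustration under stated assumptions, not the OpenAlex implementation. The helper names `format_input`, `top_k`, and `classify`, the exact whitespace around the tags, the choice to omit a missing part, and the Hub id in `MODEL_ID` are all assumptions; only the `<TITLE>` and `<ABSTRACT>` tags, the 512-token limit, and the top 10 predictions come from the description above. Check the model card for the exact format used in training, especially for title-only and abstract-only input.

```python
from typing import Dict, List, Optional, Tuple

TOP_K = 10        # number of topic predictions returned, per the list above
MAX_TOKENS = 512  # truncation limit, per the list above

# Assumption: Hub id composed from the author and model name shown above.
MODEL_ID = (
    "OpenAlex/bert-base-multilingual-cased-finetuned-"
    "openalex-topic-classification-title-abstract"
)


def format_input(title: Optional[str] = None, abstract: Optional[str] = None) -> str:
    """Build the tagged input string (hypothetical helper).

    Assumption: each tag is followed by a space and its text, and the two
    parts are separated by a newline. A missing part is simply omitted.
    """
    parts = []
    if title:
        parts.append(f"<TITLE> {title.strip()}")
    if abstract:
        parts.append(f"<ABSTRACT> {abstract.strip()}")
    if not parts:
        raise ValueError("provide a title, an abstract, or both")
    return "\n".join(parts)


def top_k(scores: Dict[str, float], k: int = TOP_K) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def classify(text: str):
    """Run the model through the Transformers text-classification pipeline.

    Assumes the `transformers` package and a backend that can load the
    checkpoint are installed. Calling this downloads the model.
    """
    from transformers import pipeline  # imported lazily on purpose

    classifier = pipeline("text-classification", model=MODEL_ID, top_k=TOP_K)
    return classifier(text, truncation=True, max_length=MAX_TOKENS)
```

A typical call would be `classify(format_input(title, abstract))`. The `top_k` helper is only needed when working with raw label scores from some other source, since the pipeline already limits its output when given `top_k`.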

Core Capabilities

Multilingual topic classification for academic papers
Confidence scoring for topic predictions
Flexible input handling for titles and abstracts
Integration with larger classification systems

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in academic topic classification across multiple languages, which makes it useful for classifying international research. It is part of OpenAlex's larger classification system but can also be used on its own for quick topic assignment.

Q: What are the recommended use cases?

The model is suited to rapid topic classification of research papers, preliminary content categorization, and integration into larger academic content management systems. It is particularly useful for multilingual academic content.