bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract

Maintained By
OpenAlex

Property            Value
Author              OpenAlex
Base Model          bert-base-multilingual-cased
Training Framework  Transformers 4.35.2, TensorFlow 2.13.0
Model URL           Hugging Face

What is bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract?

This is a specialized model fine-tuned on a CWTS-labeled dataset for academic topic classification. It analyzes research paper titles and abstracts and assigns relevant topics from a predefined set of categories. Accuracy increased over the course of training, reaching 48.46% after 8 epochs.

Implementation Details

The model is a fine-tuned version of BERT multilingual cased, trained with the Adam optimizer at a learning rate of 6e-05 with 500 warmup steps. It expects input in a structured format in which the title and the abstract are marked with specific tags.
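As a rough illustration of what a learning rate of 6e-05 with 500 warmup steps can look like, here is a minimal sketch. The linear shape of the warmup and the constant rate afterwards are assumptions; the source only gives the two numbers, and the function name `learning_rate` is hypothetical.

```python
PEAK_LR = 6e-5       # learning rate stated for this model
WARMUP_STEPS = 500   # warmup steps stated for this model


def learning_rate(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Return the learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then a constant rate.
    The actual schedule used in training is not described in the source.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Ramp up proportionally to the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    return peak_lr


if __name__ == "__main__":
    for s in (0, 250, 500, 5000):
        print(s, learning_rate(s))
```

Warmup of this kind is commonly used when fine-tuning BERT-style models so that early updates with a freshly initialized classification head stay small.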

  • Supports both title-only and abstract-only classification
  • Returns confidence scores for the top 10 topic predictions
  • Uses a specialized input format with <TITLE> and <ABSTRACT> tags
  • Truncates input at 512 tokens
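The points above could be put together roughly as follows. This is a sketch under stated assumptions, not the maintainers' reference code: the exact tag layout (tag, space, text, newline between the two fields), the Hub ID in `MODEL_ID`, and the helper names `build_input` and `classify` are all illustrative. Check the model page on Hugging Face for the authoritative input format.

```python
# Assumed Hub ID: the author name from this page plus the model name.
MODEL_ID = (
    "OpenAlex/bert-base-multilingual-cased-finetuned-"
    "openalex-topic-classification-title-abstract"
)


def build_input(title="", abstract=""):
    """Wrap a title and/or abstract in <TITLE> and <ABSTRACT> tags.

    Assumption: a missing field is simply left empty after its tag.
    """
    title = (title or "").strip()
    abstract = (abstract or "").strip()
    if not title and not abstract:
        raise ValueError("provide a title, an abstract, or both")
    return f"<TITLE> {title}\n<ABSTRACT> {abstract}"


def classify(text, top_k=10):
    """Run the classifier and return label/score entries for the top_k topics.

    Requires the transformers package and a backend that can load the
    published weights; the model is downloaded on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    classifier = pipeline("text-classification", model=MODEL_ID, top_k=top_k)
    # Truncate to the 512-token limit mentioned above.
    return classifier(text, truncation=True, max_length=512)


if __name__ == "__main__":
    text = build_input(
        title="Multilingual topic classification of scholarly works",
        abstract="We fine-tune a multilingual encoder to assign topics.",
    )
    print(text)
    # print(classify(text))  # uncomment to download and run the model
```

Keeping the formatting step separate from the model call makes it possible to test title-only and abstract-only inputs without loading the model.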

Core Capabilities

  • Multilingual topic classification for academic papers
  • Confidence scoring for topic predictions
  • Flexible input handling for titles and abstracts
  • Integration with larger classification systems
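To illustrate how confidence scores for topic predictions are typically derived, here is a minimal sketch that turns raw classifier outputs (logits) into probabilities and keeps the highest-scoring topics. It assumes a single-label softmax classification head, which the source does not state explicitly; the function names and the topic labels in the example are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_predictions(logits, labels, k=10):
    """Return the k (label, confidence) pairs with the highest confidence."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical topic labels and logits, for illustration only.
    labels = ["Topic A", "Topic B", "Topic C"]
    logits = [2.0, 1.0, 0.1]
    for label, score in top_predictions(logits, labels, k=2):
        print(f"{label}: {score:.3f}")
```

A downstream system can then apply its own cutoff to these confidences, for example keeping only topics above a chosen threshold.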

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in academic topic classification using a multilingual approach, making it valuable for international research classification. It's part of OpenAlex's larger classification system but can work independently for quick topic generation.

Q: What are the recommended use cases?

The model is ideal for rapid topic classification of research papers, preliminary content categorization, and integration into larger academic content management systems. It's particularly useful when dealing with multilingual academic content.
