indonesian-roberta-base-posp-tagger

w11wo

Indonesian RoBERTa-based POS tagger achieving 96.25% accuracy on IndoNLU dataset. 124M params, MIT licensed, optimized for Indonesian text.

| Property | Value |
| --- | --- |
| Parameter Count | 124M |
| License | MIT |
| Framework | PyTorch, Transformers |
| Base Model | flax-community/indonesian-roberta-base |

What is indonesian-roberta-base-posp-tagger?

This is a specialized Part-of-Speech (POS) tagger built on the RoBERTa architecture and fine-tuned specifically for Indonesian. The model scores 96.25% on accuracy, precision, recall, and F1 on the IndoNLU dataset.
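A minimal usage sketch with the Transformers `pipeline` API. The model id `w11wo/indonesian-roberta-base-posp-tagger` is inferred from this card's title and author handle; the `format_tags` helper and the sample sentence are illustrative additions, not part of the card.

```python
def format_tags(entities):
    """Reduce Hugging Face token-classification output (a list of dicts
    with "word" and "entity_group" keys) to plain (word, tag) pairs."""
    return [(e["word"], e["entity_group"]) for e in entities]


def load_tagger():
    """Build the POS-tagging pipeline (downloads model weights on first use).

    Model id assumed from the card's title and author handle."""
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model="w11wo/indonesian-roberta-base-posp-tagger",
        aggregation_strategy="simple",  # merge sub-word pieces back into words
    )
```

Calling `format_tags(load_tagger()("Budi sedang belajar di perpustakaan."))` would then yield one (word, POS tag) pair per word of the input sentence.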

Implementation Details

The model is implemented with the Transformers library and PyTorch, fine-tuned from the indonesian-roberta-base model. Training ran for 10 epochs using the Adam optimizer with a learning rate of 2e-05 and a linear learning-rate scheduler.

  • Batch size: 16 for both training and evaluation
  • Training optimization: Adam (β1=0.9, β2=0.999, ε=1e-08)
  • Final validation loss: 0.1668
  • Best performance achieved at epoch 10
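The linear scheduler mentioned above decays the learning rate from its initial value toward zero over the course of training. A small sketch of that decay rule, using the card's 2e-05 base rate (the step counts are illustrative, not from the card):

```python
def linear_lr(step, total_steps, base_lr=2e-5):
    """Linearly decay the learning rate from base_lr to 0 over total_steps,
    mirroring the linear scheduler used during fine-tuning."""
    if step >= total_steps:
        return 0.0
    return base_lr * (1 - step / total_steps)


# Full learning rate at step 0, half of it midway, zero at the end.
print(linear_lr(0, 1000))    # 2e-05
print(linear_lr(500, 1000))  # 1e-05
print(linear_lr(1000, 1000)) # 0.0
```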

Core Capabilities

  • High-accuracy POS tagging for Indonesian text
  • Token classification with 96.25% precision and recall
  • Optimized for Indonesian language understanding
  • Suitable for integration into larger NLP pipelines

Frequently Asked Questions

Q: What makes this model unique?

This model combines the RoBERTa architecture with fine-tuning targeted at the Indonesian language, achieving state-of-the-art POS-tagging performance with a consistent 96.25% across accuracy, precision, recall, and F1.

Q: What are the recommended use cases?

The model is ideal for Indonesian text analysis tasks requiring part-of-speech tagging, including syntactic parsing, grammatical analysis, and text preprocessing for downstream NLP tasks.
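One common preprocessing pattern is keeping only content words (nouns, verbs, adjectives) before keyword extraction or indexing. A sketch of that step over (word, tag) pairs; note the tag names below are generic placeholders, and the model's actual POSP label set should be read from its config:

```python
def extract_content_words(tagged, keep=("NOUN", "VERB", "ADJ")):
    """Keep only words whose POS tag is in `keep` -- a typical text
    preprocessing step before downstream tasks such as keyword extraction."""
    return [word for word, tag in tagged if tag in keep]


# Illustrative tagger output; the real POSP tagset may use different labels.
tagged = [("Budi", "NOUN"), ("sedang", "ADV"),
          ("belajar", "VERB"), ("di", "ADP"),
          ("perpustakaan", "NOUN")]
print(extract_content_words(tagged))  # ['Budi', 'belajar', 'perpustakaan']
```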
