Indobert-QA
| Property | Value |
|---|---|
| Model Size | 420MB |
| Training Dataset | Translated SQuAD v2.0 (130k train, 12.3k eval) |
| Performance Metrics | EM: 51.61, F1: 69.09 |
| Model Hub | Hugging Face |
What is Indobert-QA?
Indobert-QA is a question-answering model designed specifically for the Indonesian language. It is based on IndoBERT, which was pre-trained on roughly 220M words drawn from Indonesian Wikipedia, news articles, and a web corpus. The model was then fine-tuned on a translated version of the SQuAD v2.0 dataset to perform extractive question answering in Indonesian.
Implementation Details
The model was fine-tuned on a Tesla T4 GPU with 12GB RAM. The base IndoBERT model was pre-trained for 2.4M steps (180 epochs), reaching a final perplexity of 3.97, on diverse Indonesian text sources: Wikipedia (74M words), news articles (55M words), and a web corpus (90M words).
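The reported perplexity is simply the exponential of the mean per-token cross-entropy loss, so the two quantities are interchangeable. A minimal sketch (the loss value below is illustrative, chosen to match the reported figure; it is not taken from the model card):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

# Illustrative only: a final loss of about 1.379 nats/token corresponds
# to the reported perplexity of roughly 3.97.
print(round(perplexity(1.379), 2))  # → 3.97
```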
- Trained on translated SQuAD 2.0 dataset with 130k training samples
- Capable of handling unanswerable questions
- Simple integration using Hugging Face transformers library
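Integration via the Hugging Face transformers library can be sketched as below. The repository id is an assumption for illustration; check the model's page on the Hugging Face Hub for the exact name. `handle_impossible_answer=True` enables the SQuAD v2-style null answer, which is how the pipeline signals that the context does not support an answer:

```python
# Assumed repository id; verify against the model card on Hugging Face.
MODEL_ID = "Rifky/Indobert-QA"

def extract_answer(result: dict, threshold: float = 0.3):
    """Return the predicted answer text, or None when the model returns
    the SQuAD v2-style null answer (empty span) or a low-confidence one."""
    if not result.get("answer") or result.get("score", 0.0) < threshold:
        return None
    return result["answer"]

def ask(question: str, context: str):
    """Answer a question over a context passage (requires
    `pip install transformers`; downloads the model on first use)."""
    from transformers import pipeline
    qa = pipeline("question-answering", model=MODEL_ID)
    result = qa(question=question, context=context,
                handle_impossible_answer=True)  # permit the null answer
    return extract_answer(result)

# The post-processing alone, shown with mock pipeline results:
print(extract_answer({"answer": "17 Agustus 1945", "score": 0.92}))  # → 17 Agustus 1945
print(extract_answer({"answer": "", "score": 0.88}))                 # → None
```

A call like `ask("Kapan Indonesia merdeka?", "Indonesia memproklamasikan kemerdekaannya pada 17 Agustus 1945.")` would then return the extracted date span, or `None` for an unanswerable question.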
Core Capabilities
- Extractive question answering in Indonesian
- Handles both answerable and unanswerable questions, declining to answer when the context does not support one
- 51.61 EM and 69.09 F1 on the translated SQuAD v2.0 evaluation set
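The EM and F1 figures follow the standard SQuAD evaluation: EM is the fraction of predictions that exactly match a gold answer after normalization, and F1 measures token overlap between prediction and gold answer. A simplified sketch of both metrics (the official SQuAD script additionally strips English articles and takes the maximum over multiple gold answers):

```python
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (simplified
    SQuAD-style normalization)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def f1_score(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("17 Agustus 1945", "17 agustus 1945."))                 # → True
print(round(f1_score("tanggal 17 Agustus 1945", "17 Agustus 1945"), 2))   # → 0.86
```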
Frequently Asked Questions
Q: What makes this model unique?
This model is optimized specifically for Indonesian-language question answering: it builds on IndoBERT and is fine-tuned on a translated SQuAD v2.0 dataset. It is one of relatively few models designed for Indonesian question-answering tasks.
Q: What are the recommended use cases?
The model is ideal for Indonesian text comprehension tasks, educational applications, and automated Q&A systems. It's particularly useful for applications requiring understanding of Indonesian text and extracting specific information based on questions.