# hindi-bert
| Property | Value |
|---|---|
| Parameter Count | 14.7M |
| Model Type | ELECTRA |
| Framework | PyTorch, TensorFlow |
| Downloads | 3,233 |
| Tensor Type | F32 |
## What is hindi-bert?
hindi-bert is an ELECTRA-based language model trained specifically for Hindi natural language processing tasks. Developed by monsoon-nlp, it is one of the first dedicated Hindi language models built on Google Research's ELECTRA architecture. The model was trained on Hindi CommonCrawl text (deduplicated via OSCAR) combined with a recent Hindi Wikipedia dump.
## Implementation Details
The model uses the ELECTRA architecture with 14.7M parameters and is available in both PyTorch and TensorFlow. It features a custom vocabulary built with HuggingFace Tokenizers and includes both the discriminator and generator components typical of ELECTRA models. Training data was organized into pretraining TFRecords, and the training pipeline supports both GPU and TPU setups.
- Custom vocabulary implementation with adjustable size
- Supports model conversion between PyTorch and TensorFlow
- Flexible training configuration through configure_pretraining.py
- Compatible with SimpleTransformers and ktrain frameworks
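Since the checkpoint is published on the Hugging Face Hub, loading it follows the standard Transformers pattern. A minimal sketch, assuming the hub ID `monsoon-nlp/hindi-bert` and a PyTorch install (the example sentence and its use here are illustrative):

```python
from transformers import AutoTokenizer, AutoModel

# Load the custom Hindi vocabulary and the ELECTRA discriminator weights.
tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/hindi-bert")
model = AutoModel.from_pretrained("monsoon-nlp/hindi-bert")

# Encode a Hindi sentence ("This is a test sentence.") and run a forward pass.
inputs = tokenizer("यह एक परीक्षण वाक्य है।", return_tensors="pt")
outputs = model(**inputs)

# Contextual token embeddings: shape (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```

The same `from_pretrained` call works from TensorFlow via the `TFAutoModel` class, reflecting the dual-framework support noted above.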
## Core Capabilities
- News Classification: Comparable performance to Multilingual BERT on BBC Hindi news classification
- Sentiment Analysis: Effective on Hindi movie review datasets
- Question-Answering: Supports MLQA dataset tasks
- Feature Extraction: Suitable for various Hindi NLP tasks
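For the feature-extraction use case, one common approach is to mean-pool the discriminator's token embeddings into a fixed-size sentence vector. A hedged sketch (the `sentence_embedding` helper is hypothetical, not part of the model's API; attention-mask-weighted pooling is one reasonable choice among several):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/hindi-bert")
model = AutoModel.from_pretrained("monsoon-nlp/hindi-bert")

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool token embeddings into one sentence vector, ignoring padding."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    # Sum only real tokens, then divide by the number of real tokens.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# "The movie was very good." -> one vector of shape (1, hidden_size)
emb = sentence_embedding("फिल्म बहुत अच्छी थी।")
print(emb.shape)
```

Vectors produced this way can feed a downstream classifier for the news-classification or sentiment tasks listed above.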
## Frequently Asked Questions
Q: What makes this model unique?
This model is one of the first ELECTRA-based models specifically trained for Hindi language processing, offering a lighter alternative to larger models while maintaining competitive performance on various NLP tasks.
Q: What are the recommended use cases?
The model is particularly effective for Hindi text classification, sentiment analysis, and question-answering tasks. For more demanding tasks, the author recommends considering Google's MuRIL model, or sberbank-ai/mGPT for causal language modeling.