LINE DistilBERT Japanese
| Property | Value |
|---|---|
| Parameters | 66M |
| License | Apache-2.0 |
| Architecture | DistilBERT (6 layers, 768 hidden) |
| Training Data | 131GB Japanese web text |
What is line-distilbert-base-japanese?
LINE DistilBERT Japanese is a compressed BERT model for Japanese language processing, developed by LINE Corporation. Through knowledge distillation from a larger teacher model, it retains strong accuracy while substantially reducing computational requirements, making it a lightweight yet capable alternative to full-size Japanese BERT models.
Implementation Details
The model uses a two-stage tokenization pipeline: MeCab (with the UniDic dictionary) for morphological pre-tokenization, followed by SentencePiece for subword segmentation (see the loading sketch after the list below). The vocabulary size is 32,768 tokens, and the architecture consists of 6 transformer layers with 12 attention heads.
- Efficient architecture with 66M parameters
- 768-dimensional hidden states
- Hybrid tokenization system (MeCab + SentencePiece)
- Pre-trained on a 131GB Japanese web text corpus
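
The model can be loaded through Hugging Face transformers. The sketch below is not an official recipe: it assumes the checkpoint is published on the Hub as line-corporation/line-distilbert-base-japanese, that the MeCab bindings (fugashi with unidic-lite) and sentencepiece are installed, and that the custom Japanese tokenizer requires trust_remote_code=True.

```python
# Minimal loading sketch -- Hub ID, dependencies, and trust_remote_code are assumptions.
# pip install transformers fugashi unidic-lite sentencepiece torch
from transformers import AutoTokenizer, AutoModel

model_id = "line-corporation/line-distilbert-base-japanese"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id)

text = "自然言語処理のモデルを試しています。"  # "Trying out an NLP model."
inputs = tokenizer(text, return_tensors="pt")

# The token list reflects the MeCab + SentencePiece hybrid segmentation,
# and the encoder returns 768-dimensional hidden states per token.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
print(model(**inputs).last_hidden_state.shape)  # (1, seq_len, 768)
```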
Core Capabilities
- Strong performance on JGLUE benchmarks (95.6% accuracy on MARC-ja)
- Excels in Japanese text understanding tasks
- Efficient masked language modeling (see the fill-mask sketch after this list)
- Supports various downstream NLP tasks
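
Because it is a masked-language model, it can be queried directly with the transformers fill-mask pipeline. The snippet below is a hedged sketch under the same assumptions as above (Hub ID and trust_remote_code); the [MASK] placement follows standard BERT-style usage.

```python
# Hypothetical fill-mask usage; model ID and remote-code flag are assumptions.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="line-corporation/line-distilbert-base-japanese",  # assumed Hub ID
    trust_remote_code=True,
)

# Predict the masked token in a Japanese sentence.
for candidate in fill_mask("今日はとても[MASK]天気です。"):  # "The weather is very [MASK] today."
    print(candidate["token_str"], round(candidate["score"], 3))
```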
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its optimal balance between efficiency and performance, achieving superior results on JGLUE benchmarks compared to other Japanese DistilBERT variants while maintaining a compact architecture. It outperforms competitors like Laboro-DistilBERT and BandaiNamco-DistilBERT across multiple metrics.
Q: What are the recommended use cases?
The model is particularly well-suited for Japanese language processing tasks including text classification, sentiment analysis, and question answering. Its efficient architecture makes it ideal for production environments where computational resources are constrained but high performance is required.
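
For downstream tasks such as sentiment analysis, the encoder can be wrapped with a standard classification head and fine-tuned. The following is a minimal sketch rather than an official recipe; the Hub ID, label count, and example sentences are assumptions for illustration.

```python
# Sketch of a classification setup for fine-tuning (e.g. sentiment analysis).
# Hub ID, num_labels, and the example texts are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "line-corporation/line-distilbert-base-japanese"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(
    ["この映画は最高だった。", "二度と買いません。"],  # positive / negative examples
    padding=True,
    return_tensors="pt",
)

# Untrained head: logits stay random until fine-tuned (e.g. with transformers.Trainer).
with torch.no_grad():
    print(model(**batch).logits)  # shape: (2, num_labels)
```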