# LINE DistilBERT Japanese
| Property | Value |
|---|---|
| Parameter Count | 68M |
| Architecture | DistilBERT (6 layers, 768 hidden, 12 heads) |
| Training Data | 131GB Japanese web text |
| License | Apache 2.0 |
| Vocabulary Size | 32,768 tokens |
## What is line-distilbert-base-japanese?
LINE DistilBERT Japanese is a compressed BERT model for Japanese language processing, developed by LINE Corporation and trained on 131GB of Japanese web text. It aims to retain strong accuracy while being substantially smaller and faster to run than a full-size BERT.
## Implementation Details
The model uses a distilled BERT architecture with 6 layers, 768-dimensional hidden states, and 12 attention heads, for a total of 68M parameters. Tokenization is a two-stage pipeline: MeCab with the UniDic dictionary performs initial word segmentation, followed by SentencePiece subword tokenization.
- Optimized architecture with 6 transformer layers
- 768-dimensional hidden states
- 12 attention heads per layer
- 32,768 vocabulary size using combined MeCab and SentencePiece tokenization
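The dimensions above account for the stated 68M parameter figure. As a sanity check, here is a rough parameter count for a DistilBERT-style encoder; the 512-token position table and the 3072-dimensional feed-forward layer (4× hidden) are assumptions taken from DistilBERT's defaults, not from this model card:

```python
# Rough parameter count for a DistilBERT-style encoder.
# Assumed (not stated in the card): 512 max positions, 3072-dim FFN.
vocab, hidden, layers, ffn, max_pos = 32768, 768, 6, 3072, 512

# Token embeddings + position embeddings + embedding LayerNorm
embeddings = vocab * hidden + max_pos * hidden + 2 * hidden

per_layer = (
    4 * (hidden * hidden + hidden)   # Q, K, V, and output projections
    + (hidden * ffn + ffn)           # FFN up-projection
    + (ffn * hidden + hidden)        # FFN down-projection
    + 2 * 2 * hidden                 # two LayerNorms (weight + bias)
)

total = embeddings + layers * per_layer
print(f"{total / 1e6:.1f}M parameters")  # ~68.1M
```

The result lands within rounding distance of the advertised 68M, with the vocabulary embeddings alone contributing roughly 25M of it.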
## Core Capabilities
- Strong performance on JGLUE benchmarks (95.6% accuracy on MARC-ja)
- Strong Japanese natural language inference results (88.9% accuracy on JNLI)
- Competitive question answering (87.3% EM, 93.3% F1 on JSQuAD)
- Accurate semantic similarity scoring (89.2% Pearson correlation on JSTS)
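JSTS evaluates models by the Pearson correlation between predicted and gold similarity scores. A minimal, framework-free sketch of that metric (the gold/predicted scores below are made up for illustration; JSTS uses a 0–5 similarity scale):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical gold vs. predicted similarity scores on a 0-5 scale
gold = [4.8, 1.2, 3.5, 0.4, 2.9]
pred = [4.5, 1.0, 3.8, 0.9, 2.5]
print(f"Pearson r = {pearson(gold, pred):.3f}")
```

A correlation of 1.0 means the predicted ranking and spread match the gold scores exactly; the benchmark's reported 89.2 corresponds to r ≈ 0.892.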
## Frequently Asked Questions
Q: What makes this model unique?
The model balances performance and efficiency, outperforming other Japanese DistilBERT variants while keeping a compact architecture. Its strong JGLUE results make it a practical choice for Japanese NLP tasks where a full-size BERT would be too slow or too large.
Q: What are the recommended use cases?
The model is well-suited for various Japanese language processing tasks, including text classification, question answering, and semantic similarity analysis. It's particularly effective for applications requiring efficient inference while maintaining high accuracy.
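For the semantic similarity use case, a common pattern is to mean-pool the encoder's per-token hidden states into one sentence vector and compare vectors by cosine similarity. A framework-free sketch of that pooling and scoring step; the toy 4-dimensional "embeddings" below are stand-ins for the model's 768-dimensional hidden states:

```python
import math

def mean_pool(token_embeddings):
    """Average per-token vectors into a single sentence vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins for the per-token hidden states of two sentences
sent_a = [[0.2, 0.1, 0.9, 0.3], [0.4, 0.0, 0.8, 0.2]]
sent_b = [[0.3, 0.1, 0.7, 0.4], [0.2, 0.2, 0.9, 0.1]]

score = cosine(mean_pool(sent_a), mean_pool(sent_b))
print(f"similarity = {score:.3f}")
```

In real use the token vectors would come from the model's last hidden state (e.g. via the Hugging Face `transformers` library), but the pooling and scoring logic is the same.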