LINE DistilBERT Japanese
| Property | Value |
|---|---|
| Parameters | 66M |
| License | Apache-2.0 |
| Architecture | DistilBERT (6 layers, 768 hidden) |
| Training Data | 131GB Japanese web text |
What is line-distilbert-base-japanese?
LINE DistilBERT Japanese is a compressed BERT model for Japanese language processing, developed by LINE Corporation. Through knowledge distillation from a larger teacher model, it retains strong accuracy while substantially reducing computational requirements, making it a lightweight yet capable alternative to full-size Japanese BERT models.
Implementation Details
The model uses a two-stage tokenization pipeline: MeCab (with the UniDic dictionary) for morphological pre-tokenization, followed by SentencePiece for subword segmentation (see the loading sketch after the list below). The vocabulary size is 32,768 tokens, and the architecture consists of 6 transformer layers with 12 attention heads.
- Efficient architecture with 66M parameters
- 768-dimensional hidden states
- Hybrid tokenization system (MeCab + SentencePiece)
- Pre-trained on a 131GB Japanese web text corpus
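
The model can be loaded through Hugging Face transformers. The sketch below is not an official recipe: it assumes the checkpoint is published on the Hub as line-corporation/line-distilbert-base-japanese, that the MeCab bindings (fugashi with unidic-lite) and sentencepiece are installed, and that the custom Japanese tokenizer requires trust_remote_code=True.

```python
# Minimal loading sketch -- Hub ID, dependencies, and trust_remote_code are assumptions.
# pip install transformers fugashi unidic-lite sentencepiece torch
from transformers import AutoTokenizer, AutoModel

model_id = "line-corporation/line-distilbert-base-japanese"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id)

text = "自然言語処理のモデルを試しています。"  # "Trying out an NLP model."
inputs = tokenizer(text, return_tensors="pt")

# The token list reflects the MeCab + SentencePiece hybrid segmentation,
# and the encoder returns 768-dimensional hidden states per token.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
print(model(**inputs).last_hidden_state.shape)  # (1, seq_len, 768)
```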
Core Capabilities
- Strong performance on JGLUE benchmarks (95.6% accuracy on MARC-ja)
- Excels in Japanese text understanding tasks
- Efficient masked language modeling (see the fill-mask sketch after this list)
- Supports various downstream NLP tasks
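
Because it is a masked-language model, it can be queried directly with the transformers fill-mask pipeline. The snippet below is a hedged sketch under the same assumptions as above (Hub ID and trust_remote_code); the [MASK] placement follows standard BERT-style usage.

```python
# Hypothetical fill-mask usage; model ID and remote-code flag are assumptions.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="line-corporation/line-distilbert-base-japanese",  # assumed Hub ID
    trust_remote_code=True,
)

# Predict the masked token in a Japanese sentence.
for candidate in fill_mask("今日はとても[MASK]天気です。"):  # "The weather is very [MASK] today."
    print(candidate["token_str"], round(candidate["score"], 3))
```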
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its optimal balance between efficiency and performance, achieving superior results on JGLUE benchmarks compared to other Japanese DistilBERT variants while maintaining a compact architecture. It outperforms competitors like Laboro-DistilBERT and BandaiNamco-DistilBERT across multiple metrics.
Q: What are the recommended use cases?
The model is particularly well-suited for Japanese language processing tasks including text classification, sentiment analysis, and question answering. Its efficient architecture makes it ideal for production environments where computational resources are constrained but high performance is required.
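
For downstream tasks such as sentiment analysis, the encoder can be wrapped with a standard classification head and fine-tuned. The following is a minimal sketch rather than an official recipe; the Hub ID, label count, and example sentences are assumptions for illustration.

```python
# Sketch of a classification setup for fine-tuning (e.g. sentiment analysis).
# Hub ID, num_labels, and the example texts are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "line-corporation/line-distilbert-base-japanese"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(
    ["この映画は最高だった。", "二度と買いません。"],  # positive / negative examples
    padding=True,
    return_tensors="pt",
)

# Untrained head: logits stay random until fine-tuned (e.g. with transformers.Trainer).
with torch.no_grad():
    print(model(**batch).logits)  # shape: (2, num_labels)
```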