t5-base-japanese
| Property | Value |
|---|---|
| Parameters | 222M |
| License | CC BY-SA 4.0 |
| Training Data | Wikipedia, OSCAR, CC-100 |
| Framework | PyTorch |
What is t5-base-japanese?
t5-base-japanese is a Text-to-Text Transfer Transformer (T5) model pre-trained specifically for Japanese language tasks. Developed by sonoisa, the model is trained on approximately 100 GB of Japanese text drawn from Wikipedia, the OSCAR corpus, and the CC-100 dataset. It outperforms multilingual alternatives such as Google's mT5, particularly on tasks like news classification.
Implementation Details
The model uses a SentencePiece tokenizer trained on the complete Japanese Wikipedia dataset. With 222M parameters, it is about 25% smaller than Google's mT5-small while achieving better performance. The model requires fine-tuning for specific downstream tasks but provides a strong baseline; a minimal loading sketch follows the list below.
- Pre-trained on 100GB of Japanese text
- Achieves 97% accuracy on livedoor news classification
- JSQuAD performance: EM=0.900, F1=0.945
- Implements the T5 architecture with a Japanese-specific SentencePiece vocabulary
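For orientation, here is a minimal loading sketch using the Hugging Face Transformers library. The Hub checkpoint ID `sonoisa/t5-base-japanese` is an assumption based on the model and author names, and because the raw pre-trained checkpoint is not task-specific, the generated output is illustrative only:

```python
# Minimal loading sketch (assumes the Hugging Face Transformers and sentencepiece
# packages, and that the checkpoint is published as "sonoisa/t5-base-japanese").
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "sonoisa/t5-base-japanese"  # assumed Hub ID for this model

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

# The pre-trained checkpoint still needs task-specific fine-tuning; outputs here
# are only meaningful after that, but the call pattern stays the same.
inputs = tokenizer("こんにちは、世界", return_tensors="pt")
outputs = model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```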
Core Capabilities
- Text classification with high accuracy (97% on news classification; see the text-to-text framing sketch after this list)
- Question answering (JSQuAD benchmark)
- Text generation and sequence-to-sequence tasks
- Feature extraction for Japanese text
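Because T5 casts every task as text-to-text, classification is framed as generating a label string from a prefixed input rather than predicting a class index. The sketch below illustrates that framing; the `classification:` prefix, the field layout, and the category labels are illustrative assumptions, not the exact format behind the benchmark numbers above:

```python
# Illustrative text-to-text framing for Japanese news classification.
# The task prefix and label strings are assumptions for illustration only.
def make_classification_example(title: str, body: str, label: str) -> dict:
    """Convert one news article into an input/target text pair for T5."""
    return {
        "input_text": f"classification: {title} {body}",
        "target_text": label,  # the category name itself, e.g. "スポーツ" or "IT"
    }

example = make_classification_example(
    title="新型スマートフォン発表",
    body="各社が最新モデルを公開した。",
    label="IT",
)
print(example["input_text"])
print(example["target_text"])
```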
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized Japanese language capabilities and improved efficiency, offering better performance than multilingual alternatives with a smaller parameter count. It's particularly notable for achieving 6 percentage points higher accuracy than mT5 on news classification tasks.
Q: What are the recommended use cases?
The model is well-suited for Japanese text classification, question answering, and sequence-to-sequence tasks. However, it requires task-specific fine-tuning before deployment; a minimal fine-tuning sketch follows below. Users should also be aware of potential biases in the training data and ensure ethical usage.
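As a rough orientation for that fine-tuning step, the sketch below trains the checkpoint on a toy text-to-text dataset with PyTorch and Hugging Face Transformers. The Hub ID, task prefix, hyperparameters, and the two in-memory examples are all assumptions; a real run would use a full dataset (e.g. livedoor news for classification), validation, and checkpointing:

```python
# Minimal fine-tuning sketch (assumptions: PyTorch + Hugging Face Transformers,
# the assumed Hub ID "sonoisa/t5-base-japanese", and a toy in-memory dataset).
import torch
from torch.utils.data import DataLoader
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "sonoisa/t5-base-japanese"  # assumed checkpoint ID
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Toy (input, target) text pairs; replace with real task data.
pairs = [
    ("classification: 新型スマートフォンが発表された", "IT"),
    ("classification: 代表チームが決勝に進出した", "スポーツ"),
]

def collate(batch):
    sources = [s for s, _ in batch]
    targets = [t for _, t in batch]
    enc = tokenizer(sources, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    labels = tokenizer(targets, padding=True, truncation=True,
                       max_length=8, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(pairs, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # seq2seq cross-entropy against the target text
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```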