LUKE Japanese Large
| Property | Value |
|---|---|
| Author | Studio Ousia |
| Paper | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (EMNLP 2020) |
| Model Hub | Hugging Face |
What is luke-japanese-large?
luke-japanese-large is the large variant of LUKE (Language Understanding with Knowledge-based Embeddings) adapted for Japanese. Like the original LUKE architecture, it treats the words and entities in a text as independent tokens and outputs contextualized representations for both, so downstream tasks can reason over entity mentions as well as the surrounding text.
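Since this word/entity split is the defining feature of LUKE-style inputs, the minimal sketch below shows what it looks like at the tokenizer level. It assumes the checkpoint is published on the Hugging Face Hub as studio-ousia/luke-japanese-large and uses a made-up example sentence; neither comes from this page.

```python
# Minimal sketch (assumed repo id): words and entities become separate token sequences.
from transformers import MLukeTokenizer

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/luke-japanese-large")

text = "東京は日本の首都です。"   # "Tokyo is the capital of Japan."
entity_spans = [(0, 2)]           # character span of the mention 東京

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")

print(inputs["input_ids"].shape)   # word token ids
print(inputs["entity_ids"].shape)  # entity token ids, one per span
```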
Implementation Details
The model pairs conventional word embeddings with Wikipedia entity embeddings in a single knowledge-enhanced architecture. This dual representation gives it a richer handle on text that mentions entities, and it translates into strong scores across Japanese language understanding tasks on the JGLUE benchmark (see Core Capabilities below). The sketch after the following list shows how both representation streams are exposed at inference time.
- Incorporates Wikipedia entity embeddings for enhanced knowledge representation
- Utilizes entity-aware self-attention mechanisms
- Achieves state-of-the-art results on multiple JGLUE tasks
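To make the points above concrete, here is a hedged sketch of a forward pass that exposes both representation streams. The repo id studio-ousia/luke-japanese-large, the example sentence, and the assumption that the Wikipedia title 東京 exists in the model's entity vocabulary are all illustrative, not taken from this page.

```python
# Hedged sketch: feed a Wikipedia entity alongside the words and read back
# contextualized representations for both streams.
import torch
from transformers import LukeModel, MLukeTokenizer

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/luke-japanese-large")
model = LukeModel.from_pretrained("studio-ousia/luke-japanese-large")

text = "東京は日本の首都です。"
inputs = tokenizer(
    text,
    entities=["東京"],       # Wikipedia title -> pretrained entity embedding (assumed to be in the vocab)
    entity_spans=[(0, 2)],   # where that entity is mentioned in the text
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

word_states = outputs.last_hidden_state            # one vector per word token
entity_states = outputs.entity_last_hidden_state   # one vector per entity token
print(word_states.shape, entity_states.shape)
```

If the entities argument is omitted, the tokenizer fills each span with the special [MASK] entity instead of a Wikipedia entity embedding, which is the usual setup for span-level prediction tasks.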
Core Capabilities
- MARC-ja accuracy: 96.5%
- JSTS performance: 0.932/0.902 (Pearson/Spearman)
- JNLI accuracy: 92.7%
- JCommonsenseQA accuracy: 89.3%
- Outperforms major baseline models including Tohoku BERT large and XLM RoBERTa large
Frequently Asked Questions
Q: What makes this model unique?
This model's distinctive feature is its ability to process both words and entities as separate tokens while maintaining contextual relationships between them. It incorporates Wikipedia entity embeddings, making it particularly powerful for tasks requiring deep semantic understanding of Japanese text.
Q: What are the recommended use cases?
The model is well suited to Japanese NLP tasks such as text classification, semantic similarity, natural language inference, and question answering. For tasks that don't involve entity inputs, the lite version of the model (which ships without the Wikipedia entity embeddings) is the recommended choice for better efficiency.
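As a rough illustration of the semantic-similarity use case (without any entity inputs), the sketch below mean-pools the word representations and compares two sentences with cosine similarity. It is not the JGLUE/JSTS fine-tuning recipe, and the lite checkpoint id studio-ousia/luke-japanese-large-lite plus the example sentences are assumptions.

```python
# Illustrative only: sentence similarity via mean pooling, no entity inputs.
import torch
import torch.nn.functional as F
from transformers import LukeModel, MLukeTokenizer

name = "studio-ousia/luke-japanese-large-lite"  # assumed id of the lite variant
tokenizer = MLukeTokenizer.from_pretrained(name)
model = LukeModel.from_pretrained(name)

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the last hidden state over non-padding word tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # (1, seq_len, hidden)
    mask = enc["attention_mask"].unsqueeze(-1)         # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

a = embed("今日は良い天気です。")   # "The weather is nice today."
b = embed("本日は晴天です。")       # "It is sunny today."
print(F.cosine_similarity(a, b).item())
```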