mLUKE-large-lite
| Property | Value |
|---|---|
| Parameter Count | 561M |
| Base Architecture | XLM-RoBERTa (large) |
| Languages | 24 |
| Model URL | https://huggingface.co/studio-ousia/mluke-large-lite |
| Author | studio-ousia |
What is mluke-large-lite?
mLUKE-large-lite is a lightweight multilingual language model that extends the original LUKE (Language Understanding with Knowledge-based Embeddings) architecture. It's built upon XLM-RoBERTa large and trained on Wikipedia data from December 2020 covering 24 languages. Unlike its parent model studio-ousia/mluke-large, this lite version excludes Wikipedia entity embeddings while retaining special entities like [MASK].
Implementation Details
The model has 24 hidden layers with a hidden size of 1,024, totaling 561M parameters. It is initialized with XLM-RoBERTa (large) weights and includes an entity-aware attention mechanism that can be enabled as needed (see the loading sketch after the list below). The architecture retains the core LUKE functionality while improving efficiency by removing the extensive Wikipedia entity embeddings.
- 24 hidden layers with a hidden size of 1,024
- Built on XLM-RoBERTa large architecture
- Optional entity-aware attention mechanism
- Supports special entity tokens like [MASK]
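As a rough illustration, the checkpoint can be loaded with the Hugging Face transformers library, which provides an MLukeTokenizer and a LukeModel class. This is a minimal sketch assuming a recent transformers release with sentencepiece installed; the example sentence is purely illustrative.

```python
# Minimal sketch: loading mLUKE-large-lite with Hugging Face transformers.
from transformers import MLukeTokenizer, LukeModel

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-large-lite")
model = LukeModel.from_pretrained("studio-ousia/mluke-large-lite")

# Encode a plain sentence; entity annotations are optional for the lite model.
inputs = tokenizer("Tokyo is the capital of Japan.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```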
Core Capabilities
- Multilingual understanding across 24 languages
- Efficient processing without full entity embedding overhead
- Compatible with standard transformer-based tasks
- Maintains entity-aware attention functionality when needed
Frequently Asked Questions
Q: What makes this model unique?
mLUKE-large-lite offers a more efficient alternative to the full mLUKE model: it drops the Wikipedia entity embeddings while keeping the multilingual backbone. The result is a smaller checkpoint that is easier to deploy, and entity-aware attention can still be enabled when it is useful.
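Entity-aware attention is controlled by the use_entity_aware_attention option in the model's LukeConfig. The sketch below shows one way to enable it when loading the checkpoint; whether it is on or off by default for this particular checkpoint should be checked against the config file on the Hub.

```python
# Sketch: enabling entity-aware attention at load time (assumed off by default here).
from transformers import LukeModel

model = LukeModel.from_pretrained(
    "studio-ousia/mluke-large-lite",
    use_entity_aware_attention=True,  # use separate query projections for word/entity attention
)
print(model.config.use_entity_aware_attention)
```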
Q: What are the recommended use cases?
The model is well suited to multilingual NLP tasks where explicit entity knowledge is not critical, such as text classification, sequence labeling, and general language understanding. It is a good fit for applications that need efficient multilingual processing without the memory overhead of the full entity embedding table.
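For example, the checkpoint can serve as a backbone for multilingual text classification via the standard transformers classification head. This is a hedged sketch: LukeForSequenceClassification adds a randomly initialized head, so the model must be fine-tuned on labeled data before its predictions are meaningful, and the sentence and label count below are hypothetical.

```python
# Sketch: using mLUKE-large-lite as a backbone for text classification (requires fine-tuning).
import torch
from transformers import MLukeTokenizer, LukeForSequenceClassification

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-large-lite")
model = LukeForSequenceClassification.from_pretrained(
    "studio-ousia/mluke-large-lite", num_labels=2  # num_labels is task-specific
)

inputs = tokenizer("Ceci est un exemple multilingue.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class id; meaningful only after fine-tuning
```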