mLUKE-large-lite
| Property | Value |
|---|---|
| Parameter Count | 561M |
| Base Architecture | XLM-RoBERTa (large) |
| Languages | 24 |
| Model URL | https://huggingface.co/studio-ousia/mluke-large-lite |
| Author | studio-ousia |
What is mluke-large-lite?
mLUKE-large-lite is a lightweight multilingual language model that extends the original LUKE (Language Understanding with Knowledge-based Embeddings) architecture. It's built upon XLM-RoBERTa large and trained on Wikipedia data from December 2020 covering 24 languages. Unlike its parent model studio-ousia/mluke-large, this lite version excludes Wikipedia entity embeddings while retaining special entities like [MASK].
Implementation Details
The model has 24 hidden layers with a hidden size of 1,024, totaling 561M parameters. It is initialized with XLM-RoBERTa (large) weights and includes an entity-aware attention mechanism that can be enabled as needed (see the loading sketch after the list below). The architecture retains the core LUKE functionality while improving efficiency by removing the extensive Wikipedia entity embeddings.
- 24 hidden layers with a hidden size of 1,024
- Built on XLM-RoBERTa large architecture
- Optional entity-aware attention mechanism
- Supports special entity tokens like [MASK]
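As a rough illustration, the checkpoint can be loaded with the Hugging Face transformers library, which provides an MLukeTokenizer and a LukeModel class. This is a minimal sketch assuming a recent transformers release with sentencepiece installed; the example sentence is purely illustrative.

```python
# Minimal sketch: loading mLUKE-large-lite with Hugging Face transformers.
from transformers import MLukeTokenizer, LukeModel

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-large-lite")
model = LukeModel.from_pretrained("studio-ousia/mluke-large-lite")

# Encode a plain sentence; entity annotations are optional for the lite model.
inputs = tokenizer("Tokyo is the capital of Japan.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```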
Core Capabilities
- Multilingual understanding across 24 languages
- Efficient processing without full entity embedding overhead
- Compatible with standard transformer-based tasks
- Maintains entity-aware attention functionality when needed
Frequently Asked Questions
Q: What makes this model unique?
mLUKE-large-lite offers a more efficient alternative to the full mLUKE model: it drops the Wikipedia entity embeddings while keeping the multilingual backbone. The result is a smaller checkpoint that is easier to deploy, and entity-aware attention can still be enabled when it is useful.
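Entity-aware attention is controlled by the use_entity_aware_attention option in the model's LukeConfig. The sketch below shows one way to enable it when loading the checkpoint; whether it is on or off by default for this particular checkpoint should be checked against the config file on the Hub.

```python
# Sketch: enabling entity-aware attention at load time (assumed off by default here).
from transformers import LukeModel

model = LukeModel.from_pretrained(
    "studio-ousia/mluke-large-lite",
    use_entity_aware_attention=True,  # use separate query projections for word/entity attention
)
print(model.config.use_entity_aware_attention)
```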
Q: What are the recommended use cases?
The model is well suited to multilingual NLP tasks where explicit entity knowledge is not critical, such as text classification, sequence labeling, and general language understanding. It is a good fit for applications that need efficient multilingual processing without the memory overhead of the full entity embedding table.
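For example, the checkpoint can serve as a backbone for multilingual text classification via the standard transformers classification head. This is a hedged sketch: LukeForSequenceClassification adds a randomly initialized head, so the model must be fine-tuned on labeled data before its predictions are meaningful, and the sentence and label count below are hypothetical.

```python
# Sketch: using mLUKE-large-lite as a backbone for text classification (requires fine-tuning).
import torch
from transformers import MLukeTokenizer, LukeForSequenceClassification

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-large-lite")
model = LukeForSequenceClassification.from_pretrained(
    "studio-ousia/mluke-large-lite", num_labels=2  # num_labels is task-specific
)

inputs = tokenizer("Ceci est un exemple multilingue.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class id; meaningful only after fine-tuning
```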