# xLSTM-7b
| Property | Value |
|---|---|
| Model Size | 7 billion parameters |
| Training Data | 2.3T tokens (DCLM dataset) |
| Framework | xlstm-jax |
| License | NXAI Community License |
| Model URL | https://huggingface.co/NX-AI/xLSTM-7b |
## What is xLSTM-7b?
xLSTM-7b is a 7-billion-parameter language model developed by NX-AI that implements the xLSTM (extended LSTM) architecture, a recurrent alternative to the transformer. Pre-trained on roughly 2.3 trillion tokens from the DCLM dataset plus selected high-quality data, it delivers competitive performance across standard benchmarks while keeping deployment options flexible.
## Implementation Details
The model is implemented in the xlstm-jax framework and integrates with the Hugging Face Transformers library. It ships both high-performance Triton kernels and a native PyTorch implementation, so deployments can trade raw speed against portability.
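A minimal loading and generation sketch is shown below, assuming the model is served through the standard `AutoModelForCausalLM`/`AutoTokenizer` interface; consult the model card for the exact transformers version requirement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub.
# device_map="auto" places the weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Tokenize a prompt and move it to the model's device.
inputs = tokenizer("The xLSTM architecture is", return_tensors="pt").to(model.device)

# Generate a short continuation and decode it back to text.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```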
- Supports multiple kernel implementations (Triton and native PyTorch)
- Includes optimization support for torch.cuda.graph and torch.compile (see the configuration sketch after this list)
- Compatible with NVIDIA hardware, with demonstrated performance on H100 GPUs
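As a sketch of how these deployment options might be combined: `torch.compile` below is standard PyTorch, but the kernel-selection attribute on the config is an illustrative assumption rather than a confirmed API, so check the model card for the actual configuration keys.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
# Hypothetical attribute: selects the Triton vs. native PyTorch kernel
# backend. The real key name may differ -- consult the model card.
config.chunkwise_kernel = "chunkwise--triton_xl_chunk"

model = AutoModelForCausalLM.from_pretrained(
    "NX-AI/xLSTM-7b", config=config, device_map="auto"
)

# torch.compile is standard PyTorch; it traces and optimizes the forward pass.
model = torch.compile(model)
```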
## Core Capabilities
- Strong performance on benchmark tasks (BBH: 0.381, MMLU-Pro: 0.242); an evaluation sketch follows this list
- Impressive results on reasoning tasks (Winogrande: 0.742, PiQA: 0.817)
- Flexible deployment options with configurable kernel implementations
- Efficient text generation capabilities
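In principle, the scores above can be spot-checked with EleutherAI's lm-evaluation-harness. The sketch below assumes its `hf` backend can load this architecture with the installed transformers version, and that the `winogrande` and `piqa` task names correspond to the reported benchmarks:

```python
import lm_eval

# Evaluate two of the reported reasoning tasks. Assumes the hf backend
# supports the xLSTM architecture via the installed transformers version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NX-AI/xLSTM-7b,dtype=bfloat16",
    tasks=["winogrande", "piqa"],
    batch_size=8,
)
print(results["results"])
```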
## Frequently Asked Questions
### Q: What makes this model unique?
xLSTM-7b stands out for its recurrent xLSTM architecture, which offers an alternative to transformer-based models while maintaining competitive performance. Because the architecture is recurrent rather than attention-based, its per-token inference state stays constant regardless of context length. It provides flexible implementation options and posts strong results across benchmarks, particularly on reasoning tasks.
### Q: What are the recommended use cases?
The model is well-suited to general language modeling, with particularly strong performance on tasks that require reasoning. Its configurable kernel implementations make it a good fit for both research and production environments, especially where deployment constraints vary.