# xLSTM-7b
| Property | Value |
|---|---|
| Model Size | 7 billion parameters |
| Training Data | 2.3T tokens (DCLM dataset) |
| Framework | xlstm-jax |
| License | NXAI Community License |
| Model URL | https://huggingface.co/NX-AI/xLSTM-7b |
## What is xLSTM-7b?
xLSTM-7b is a 7-billion-parameter language model developed by NX-AI that implements the xLSTM (extended LSTM) architecture, a recurrent alternative to the transformer. Pre-trained on roughly 2.3 trillion tokens from the DCLM dataset plus selected high-quality data, it delivers competitive performance across standard benchmarks while keeping deployment options flexible.
## Implementation Details
The model is implemented in the xlstm-jax framework and integrates with the Hugging Face Transformers library. It ships both high-performance Triton kernels and a native PyTorch implementation, so deployments can trade raw speed against portability.
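A minimal loading and generation sketch is shown below, assuming the model is served through the standard `AutoModelForCausalLM`/`AutoTokenizer` interface; consult the model card for the exact transformers version requirement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub.
# device_map="auto" places the weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Tokenize a prompt and move it to the model's device.
inputs = tokenizer("The xLSTM architecture is", return_tensors="pt").to(model.device)

# Generate a short continuation and decode it back to text.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```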
- Supports multiple kernel implementations (Triton and native PyTorch)
- Includes optimization support for torch.cuda.graph and torch.compile (see the configuration sketch after this list)
- Compatible with NVIDIA hardware, with demonstrated performance on H100 GPUs
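As a sketch of how these deployment options might be combined: `torch.compile` below is standard PyTorch, but the kernel-selection attribute on the config is an illustrative assumption rather than a confirmed API, so check the model card for the actual configuration keys.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
# Hypothetical attribute: selects the Triton vs. native PyTorch kernel
# backend. The real key name may differ -- consult the model card.
config.chunkwise_kernel = "chunkwise--triton_xl_chunk"

model = AutoModelForCausalLM.from_pretrained(
    "NX-AI/xLSTM-7b", config=config, device_map="auto"
)

# torch.compile is standard PyTorch; it traces and optimizes the forward pass.
model = torch.compile(model)
```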
## Core Capabilities
- Strong performance on benchmark tasks (BBH: 0.381, MMLU-Pro: 0.242); an evaluation sketch follows this list
- Impressive results on reasoning tasks (Winogrande: 0.742, PiQA: 0.817)
- Flexible deployment options with configurable kernel implementations
- Efficient text generation capabilities
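In principle, the scores above can be spot-checked with EleutherAI's lm-evaluation-harness. The sketch below assumes its `hf` backend can load this architecture with the installed transformers version, and that the `winogrande` and `piqa` task names correspond to the reported benchmarks:

```python
import lm_eval

# Evaluate two of the reported reasoning tasks. Assumes the hf backend
# supports the xLSTM architecture via the installed transformers version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NX-AI/xLSTM-7b,dtype=bfloat16",
    tasks=["winogrande", "piqa"],
    batch_size=8,
)
print(results["results"])
```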
## Frequently Asked Questions
### Q: What makes this model unique?
xLSTM-7b stands out for its recurrent xLSTM architecture, which offers an alternative to transformer-based models while maintaining competitive performance. Because the architecture is recurrent rather than attention-based, its per-token inference state stays constant regardless of context length. It provides flexible implementation options and posts strong results across benchmarks, particularly on reasoning tasks.
### Q: What are the recommended use cases?
The model is well-suited to general language modeling, with particularly strong performance on tasks that require reasoning. Its configurable kernel implementations make it a good fit for both research and production environments, especially where deployment constraints vary.