DeciLM-6b-instruct
Property | Value |
---|---|
Parameter Count | 5.72B |
Model Type | Instruction-tuned Language Model |
License | Llama 2 Community License |
Training Data | SlimPajama-627B and OpenOrca |
Language | English |
What is DeciLM-6b-instruct?
DeciLM-6b-instruct is an instruction-tuned language model developed by Deci AI, designed specifically for short-form instruction following. It is built on the DeciLM-6B base model and fine-tuned with LoRA on the OpenOrca dataset. The model implements an optimized transformer decoder architecture with variable Grouped-Query Attention and achieves strong results across multiple benchmarks.
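As a quick orientation, here is a minimal loading sketch using Hugging Face Transformers. It assumes the Hub ID `Deci/DeciLM-6b-instruct` and that the custom DeciLM modeling code is fetched with `trust_remote_code=True`; check the official model card for the exact recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Deci/DeciLM-6b-instruct"  # Hub ID assumed from the model name above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the model's published BF16 tensor type
    device_map="auto",           # place weights on the available GPU(s)
    trust_remote_code=True,      # DeciLM uses custom modeling code hosted in the repo
)
```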
Implementation Details
The model ships in BF16 tensors and delivers high inference throughput: 652.49 tokens/sec on an A10 GPU with plain PyTorch, and up to 2,029.6 tokens/sec with Infery-LLM, Deci's inference SDK. The architecture incorporates proprietary optimizations that enable faster training and inference than similarly sized models; a minimal usage sketch follows the feature list below.
- Optimized transformer decoder architecture
- Variable Grouped-Query Attention implementation
- BF16 precision for efficient computation
- Comprehensive benchmark performance across 9 different tasks
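Continuing the loading sketch above, a short-form instruction can be run as follows. The prompt wording and generation settings are illustrative assumptions, not the exact template used during fine-tuning; consult the official model card for the recommended prompt format.

```python
# Illustrative instruction; the exact prompt template used during fine-tuning
# may differ -- see the official model card.
prompt = "Give three tips for writing a concise technical summary."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,   # short-form responses are the model's target use case
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```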
Core Capabilities
- Strong performance on BoolQ (77.34%) and PIQA (77.52%)
- Effective reasoning capabilities demonstrated by HellaSwag score (74.57%)
- Reliable performance on the LAMBADA OpenAI benchmark (70.1%); see the evaluation sketch after this list
- Suitable for commercial and research applications
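For context, scores like these are commonly measured with EleutherAI's lm-evaluation-harness. The sketch below is an assumption about how such numbers could be reproduced (task names, harness version, and settings are illustrative), not the exact evaluation setup Deci used.

```python
# Illustrative reproduction sketch using lm-evaluation-harness (v0.4+ API assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Deci/DeciLM-6b-instruct,trust_remote_code=True,dtype=bfloat16",
    tasks=["boolq", "piqa", "hellaswag", "lambada_openai"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. accuracy
```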
Frequently Asked Questions
Q: What makes this model unique?
DeciLM-6b-instruct stands out due to its optimized architecture with variable Grouped-Query Attention, making it significantly faster than comparable models while maintaining strong performance across various benchmarks. Its efficient design allows for exceptional inference speeds, particularly when using specialized inference tools.
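To make the attention variant concrete, here is a minimal sketch of grouped-query attention, where several query heads share one key/value head; in the "variable" form, the number of KV heads can differ from layer to layer. This illustrates the general technique only, not Deci's actual implementation.

```python
import torch

def grouped_query_attention(q, k, v):
    """Minimal GQA sketch (not Deci's implementation).

    q:    (batch, n_q_heads, seq, head_dim)
    k, v: (batch, num_kv_heads, seq, head_dim), with num_kv_heads < n_q_heads
    """
    n_q_heads, num_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // num_kv_heads
    # Each KV head is shared by a group of query heads, shrinking the KV cache
    # and memory traffic, which is where the inference speedup comes from.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# "Variable" GQA: the number of KV heads is chosen per decoder layer rather
# than being a single global setting, spending capacity where it helps most.
```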
Q: What are the recommended use cases?
The model is particularly well-suited for short-form instruction following tasks, commercial applications, and research use in English. It can be fine-tuned for other languages and shows strong performance in question-answering, reasoning, and general language understanding tasks.
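Since the original instruction tuning was done with LoRA, further adaptation (for example, to another language) can follow the same pattern with the PEFT library. The configuration below is a hypothetical starting point that reuses the `model` object from the loading sketch above; the `target_modules` names depend on the DeciLM implementation and should be verified against the loaded model.

```python
from peft import LoraConfig, get_peft_model

# Hypothetical LoRA setup for further fine-tuning of the loaded `model`;
# rank, alpha, dropout, and target module names are illustrative assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # verify via model.named_modules()
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```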