DeciLM-6b

Property	Value
Parameter Count	5.7 Billion
Model Type	Decoder-only Language Model
Architecture	Transformer with Variable GQA
Context Length	4096 tokens
License	Llama 2 Community License
Training Data	SlimPajama dataset

What is DeciLM-6b?

DeciLM-6b is a decoder-only language model developed by Deci AI. The 5.7B-parameter model uses variable Grouped-Query Attention (GQA), with the per-layer attention configuration selected by AutoNAC, Deci's proprietary Neural Architecture Search technology. It reports competitive benchmark results while delivering significantly higher throughput than similar-sized models.

Implementation Details

The model has 32 layers, 32 attention heads, and a hidden size of 4096. It uses Dynamic NTK Scaling Rotary Position Embeddings and variable GQA, in which the number of key-value heads is chosen per layer rather than fixed across the model. Reported throughput reaches 2,029.6 tokens/sec on an NVIDIA A10 GPU using Infery LLM.
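As a rough illustration of Dynamic NTK scaling, the sketch below shows how the rotary base can grow once a sequence exceeds the trained context length. It follows the formulation commonly used in open-source implementations; whether DeciLM-6b uses exactly this variant is an assumption. The head dimension of 128 comes from the hidden size and head count above (4096 / 32), while the base of 10000 and the scaling factor of 2.0 are assumed values, and the function names are hypothetical.

```python
HEAD_DIM = 4096 // 32      # 128, from the hidden size and head count above
MAX_POSITIONS = 4096       # trained context length
ROPE_BASE = 10000.0        # assumed default rotary base
SCALING_FACTOR = 2.0       # assumed dynamic-scaling factor


def dynamic_ntk_base(seq_len, max_pos=MAX_POSITIONS, base=ROPE_BASE,
                     head_dim=HEAD_DIM, scaling_factor=SCALING_FACTOR):
    """Return the rotary base to use for a sequence of seq_len tokens."""
    if seq_len <= max_pos:
        # Inside the trained window the base is left unchanged.
        return base
    # Beyond the window, enlarge the base so rotation frequencies are stretched.
    scale = (scaling_factor * seq_len / max_pos) - (scaling_factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


def inverse_frequencies(base, head_dim=HEAD_DIM):
    """Per-pair rotation frequencies for rotary embeddings."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


if __name__ == "__main__":
    for n in (2048, 4096, 8192):
        print(n, round(dynamic_ntk_base(n), 1))
```

Within the 4096-token window the base is untouched, so behavior on in-range inputs is the same as standard rotary embeddings; the adjustment only applies to longer sequences.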

Variable Grouped-Query Attention for optimal computation efficiency
4096 token context window
BF16 precision support
Optimized for both research and commercial applications
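To make the efficiency argument for variable GQA concrete, the following sketch estimates key-value cache size at BF16 precision for a full 4096-token context. The layer count, head count, and hidden size are taken from the figures above. The per-layer key-value head counts in `variable_gqa` are hypothetical values chosen for illustration, not the published DeciLM-6b configuration, and `kv_cache_bytes` is a helper name introduced here.

```python
HIDDEN_SIZE = 4096
NUM_HEADS = 32
NUM_LAYERS = 32
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # 128
BYTES_PER_VALUE = 2                   # BF16 stores each value in 2 bytes


def kv_cache_bytes(kv_heads_per_layer, seq_len, batch_size=1):
    """Estimate KV cache size in bytes for the given per-layer KV head counts."""
    total_kv_heads = sum(kv_heads_per_layer)
    # Factor 2 covers the separate key and value tensors.
    return 2 * total_kv_heads * HEAD_DIM * seq_len * batch_size * BYTES_PER_VALUE


# Baseline: standard multi-head attention, one KV head per query head.
multi_head = [NUM_HEADS] * NUM_LAYERS

# Hypothetical variable-GQA layout (illustrative only, not the real config).
variable_gqa = [4] * 8 + [2] * 16 + [1] * 8
assert len(variable_gqa) == NUM_LAYERS

if __name__ == "__main__":
    seq_len = 4096
    mha = kv_cache_bytes(multi_head, seq_len)
    gqa = kv_cache_bytes(variable_gqa, seq_len)
    print(f"multi-head:   {mha / 2**20:.0f} MiB")
    print(f"variable GQA: {gqa / 2**20:.0f} MiB")
    print(f"reduction:    {mha / gqa:.1f}x")
```

A smaller cache per sequence leaves room for larger batches on the same GPU, which is one plausible contributor to higher throughput; the actual gain depends on the real per-layer configuration and the inference runtime.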

Core Capabilities

Reported results on multiple benchmarks, including ARC, HellaSwag, PIQA, and BoolQ
74.58% accuracy on HellaSwag
77.09% accuracy on PIQA
71.01% accuracy on BoolQ
Up to 15x higher throughput than Llama 2 7B

Frequently Asked Questions

Q: What makes this model unique?

DeciLM-6b is distinguished by its variable Grouped-Query Attention mechanism, whose per-layer configuration was selected with AutoNAC. The model achieves significantly higher throughput than comparable models while keeping its benchmark results competitive.

Q: What are the recommended use cases?

The model is suited to both commercial and research applications involving English-language tasks. It can be fine-tuned for specific use cases and may be adaptable to other languages. Its efficiency makes it particularly useful in production environments where compute is constrained.
