StableBeluga2
| Property | Value |
|---|---|
| Base Model | Llama2 70B |
| Developer | Stability AI |
| License | STABLE BELUGA NON-COMMERCIAL COMMUNITY LICENSE |
| Primary Language | English |
| Training Datasets | Orca-style Dataset |
What is StableBeluga2?
StableBeluga2 is Stability AI's instruction-tuned large language model, built on the Llama2 70B architecture and fine-tuned on an Orca-style dataset. It is designed to follow instructions accurately while maintaining safety and ethical considerations in its responses.
Implementation Details
The model was trained in mixed precision (BF16) with the AdamW optimizer, using a two-phase schedule: Phase 1 uses a batch size of 256 with packed data, while Phase 2 uses a batch size of 512 with unpacked data. Both phases use a learning rate of 3e-5 with cosine decay to 3e-6.
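The cosine decay schedule described above can be sketched as a small helper. This is a minimal illustration only: the `cosine_lr` function and the step counts are hypothetical; just the 3e-5 → 3e-6 endpoints come from the training details above.

```python
import math

def cosine_lr(step, total_steps, lr_max=3e-5, lr_min=3e-6):
    """Cosine decay from lr_max down to lr_min over total_steps.

    Hypothetical helper illustrating the schedule; only the endpoint
    values (3e-5 and 3e-6) are taken from the model card.
    """
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# The rate starts at 3e-5 and smoothly decays to 3e-6 by the final step.
print(cosine_lr(0, 1000))
print(cosine_lr(500, 1000))
print(cosine_lr(1000, 1000))
```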
- Specialized system prompt format for optimal interaction
- Support for text generation with configurable parameters
- Integration with HuggingFace Transformers library
- Optimized for both CPU and GPU deployment
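As a concrete sketch of the system prompt format and Transformers integration, the snippet below builds a prompt in the "### System: / ### User: / ### Assistant:" layout published on the model's Hugging Face card. The `build_prompt` helper and example messages are illustrative assumptions, not part of any official API; the commented-out lines show how the string would feed into a `transformers` generation call.

```python
def build_prompt(system_prompt: str, user_message: str) -> str:
    # StableBeluga2's card shows prompts structured as "### System:",
    # "### User:", and "### Assistant:" sections, in that order.
    return (
        f"### System:\n{system_prompt}\n\n"
        f"### User: {user_message}\n\n"
        f"### Assistant:\n"
    )

prompt = build_prompt(
    "You are Stable Beluga, an AI that follows instructions extremely well.",
    "Write me a haiku about the ocean.",
)
print(prompt)

# The resulting string is what you would tokenize and pass to generate(),
# roughly along these lines (sketch; requires downloading the 70B weights):
#   tokenizer = AutoTokenizer.from_pretrained("stabilityai/StableBeluga2", use_fast=False)
#   model = AutoModelForCausalLM.from_pretrained("stabilityai/StableBeluga2",
#                                                torch_dtype=torch.float16, device_map="auto")
#   inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
#   output = model.generate(**inputs, do_sample=True, top_p=0.95, max_new_tokens=256)
```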
Core Capabilities
- High-quality text generation and completion
- Instruction following with safety considerations
- Context-aware responses with system prompt integration
- Support for various text generation parameters (top_p, top_k)
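To make the `top_p` and `top_k` parameters concrete, here is a minimal, self-contained sketch of the filtering step they perform during sampling. This is an illustration, not the Transformers implementation; `top_k_top_p_filter` and the toy distribution are hypothetical.

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Restrict a token distribution before sampling.

    top_k keeps at most the k most likely tokens (0 means no limit);
    top_p keeps the smallest set of tokens whose cumulative probability
    reaches top_p (nucleus sampling). Returns the kept tokens with
    renormalized probabilities as {token_index: prob}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus now covers top_p of the mass
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# A toy 4-token distribution:
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_top_p_filter(probs, top_p=0.9))  # nucleus: tokens 0, 1, 2
print(top_k_top_p_filter(probs, top_k=2))    # only the 2 most likely tokens
```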
Frequently Asked Questions
Q: What makes this model unique?
A: StableBeluga2 stands out through its combination of Llama2 70B's powerful base architecture and specialized Orca-style dataset training, making it particularly effective at following instructions while maintaining safety guidelines.
Q: What are the recommended use cases?
A: The model is best suited for research and non-commercial applications requiring sophisticated language understanding and generation, including chatbots, text completion, and assisted writing tasks, while adhering to ethical AI principles.