# Dolly-v1-6b
| Property | Value |
|---|---|
| Base Model | GPT-J 6B |
| Parameters | 6 billion |
| Training Data | Stanford Alpaca Dataset (52K records) |
| License | CC-BY-NC-4.0 |
| Training Time | 30 minutes (1 epoch) |
## What is dolly-v1-6b?
Dolly-v1-6b is a 6-billion-parameter large language model developed by Databricks that demonstrates how an older open-source model can achieve impressive instruction-following capabilities with minimal fine-tuning. Built on EleutherAI's GPT-J 6B, the model was fine-tuned on the Stanford Alpaca dataset using DeepSpeed ZeRO 3.
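As a rough usage sketch (not taken from the official model card), the checkpoint can be loaded through the standard `transformers` API. The `databricks/dolly-v1-6b` model ID is the published checkpoint; the Alpaca-style prompt template and the generation settings below are assumptions.

```python
# Minimal inference sketch. Assumes the databricks/dolly-v1-6b checkpoint on the
# Hugging Face Hub and an Alpaca-style prompt template; follow the official model
# card's recommended pipeline if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dolly-v1-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style instruction prompt (assumed format, based on the fine-tuning dataset).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what instruction fine-tuning is in one sentence.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```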
## Implementation Details
The model consists of 28 transformer layers with 16 attention heads each and uses Rotary Position Embedding (RoPE). It was trained on 8x A100 40GB GPUs on the Databricks Machine Learning Platform, completing fine-tuning in just 30 minutes; these architectural values can be confirmed from the published model configuration, as sketched after the list below.
- Architecture: 28 transformer layers with 16 attention heads
- Training Infrastructure: NDasrA100_v4 machine with 8x A100 40GB GPUs
- Fine-tuning Dataset: Stanford Alpaca (52K instruction-following records)
- Base Model: GPT-J 6B trained on The Pile (400B tokens)
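If the checkpoint is available locally or on the Hugging Face Hub, the architecture values above can be read directly from the model configuration. The sketch below assumes the `databricks/dolly-v1-6b` model ID and the standard `transformers` GPT-J config attribute names.

```python
# Sketch: inspect the architecture hyperparameters described above.
# Only the configuration file (a few KB) is downloaded, not the 6B-parameter weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("databricks/dolly-v1-6b")
print(config.n_layer)     # transformer layers (expected: 28)
print(config.n_head)      # attention heads per layer (expected: 16)
print(config.rotary_dim)  # head dimensions covered by Rotary Position Embedding (RoPE)
```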
## Core Capabilities
- Instruction following and task completion (an example prompt format is sketched after this list)
- Brainstorming and creative writing
- Classification and extraction tasks
- Summarization and rephrasing
- Question answering (both closed and open-ended)
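All of these capability categories are driven through natural-language instructions. Below is a hypothetical helper for building Alpaca-style prompts with an optional context block; the exact template is an assumption based on the fine-tuning dataset, not a documented interface.

```python
# Illustrative prompt construction for the capability categories above.
# The instruction/input template follows the Stanford Alpaca format, which is an
# assumption here based on the fine-tuning dataset rather than an official spec.
def build_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt, with an optional input/context block."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

# Example: a summarization task (one of the capability categories above).
print(build_prompt(
    "Summarize the passage in one sentence.",
    "Dolly-v1-6b is a 6B-parameter model fine-tuned on 52K instruction-following records.",
))
```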
## Frequently Asked Questions
Q: What makes this model unique?
Dolly-v1-6b demonstrates that high-quality instruction-following behavior can be achieved with minimal fine-tuning of older open-source models, making AI technology more accessible to the broader community.
Q: What are the recommended use cases?
The model is intended exclusively for research purposes and should not be used in high-risk applications. It's particularly suited for academic research and engineering experimentation in areas like text generation, instruction following, and language understanding.