# Dolly-v1-6b
| Property | Value |
|---|---|
| Base Model | GPT-J 6B |
| Parameters | 6 billion |
| Training Data | Stanford Alpaca Dataset (52K records) |
| License | CC-BY-NC-4.0 |
| Training Time | 30 minutes (1 epoch) |
## What is dolly-v1-6b?
Dolly-v1-6b is a 6-billion-parameter large language model developed by Databricks that demonstrates how an older open-source model can achieve impressive instruction-following capabilities with minimal fine-tuning. Built on EleutherAI's GPT-J 6B, the model was fine-tuned on the Stanford Alpaca dataset using DeepSpeed ZeRO 3.
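As a rough usage sketch (not taken from the official model card), the checkpoint can be loaded through the standard `transformers` API. The `databricks/dolly-v1-6b` model ID is the published checkpoint; the Alpaca-style prompt template and the generation settings below are assumptions.

```python
# Minimal inference sketch. Assumes the databricks/dolly-v1-6b checkpoint on the
# Hugging Face Hub and an Alpaca-style prompt template; follow the official model
# card's recommended pipeline if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dolly-v1-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style instruction prompt (assumed format, based on the fine-tuning dataset).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what instruction fine-tuning is in one sentence.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```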
## Implementation Details
The model consists of 28 transformer layers with 16 attention heads each and uses Rotary Position Embedding (RoPE). It was trained on 8x A100 40GB GPUs on the Databricks Machine Learning Platform, completing fine-tuning in just 30 minutes; these architectural values can be confirmed from the published model configuration, as sketched after the list below.
- Architecture: 28 transformer layers with 16 attention heads
- Training Infrastructure: NDasrA100_v4 machine with 8x A100 40GB GPUs
- Fine-tuning Dataset: Stanford Alpaca (52K instruction-following records)
- Base Model: GPT-J 6B trained on The Pile (400B tokens)
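If the checkpoint is available locally or on the Hugging Face Hub, the architecture values above can be read directly from the model configuration. The sketch below assumes the `databricks/dolly-v1-6b` model ID and the standard `transformers` GPT-J config attribute names.

```python
# Sketch: inspect the architecture hyperparameters described above.
# Only the configuration file (a few KB) is downloaded, not the 6B-parameter weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("databricks/dolly-v1-6b")
print(config.n_layer)     # transformer layers (expected: 28)
print(config.n_head)      # attention heads per layer (expected: 16)
print(config.rotary_dim)  # head dimensions covered by Rotary Position Embedding (RoPE)
```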
## Core Capabilities
- Instruction following and task completion (an example prompt format is sketched after this list)
- Brainstorming and creative writing
- Classification and extraction tasks
- Summarization and rephrasing
- Question answering (both closed and open-ended)
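All of these capability categories are driven through natural-language instructions. Below is a hypothetical helper for building Alpaca-style prompts with an optional context block; the exact template is an assumption based on the fine-tuning dataset, not a documented interface.

```python
# Illustrative prompt construction for the capability categories above.
# The instruction/input template follows the Stanford Alpaca format, which is an
# assumption here based on the fine-tuning dataset rather than an official spec.
def build_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt, with an optional input/context block."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

# Example: a summarization task (one of the capability categories above).
print(build_prompt(
    "Summarize the passage in one sentence.",
    "Dolly-v1-6b is a 6B-parameter model fine-tuned on 52K instruction-following records.",
))
```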
## Frequently Asked Questions
Q: What makes this model unique?
Dolly-v1-6b demonstrates that high-quality instruction-following behavior can be achieved with minimal fine-tuning of older open-source models, making AI technology more accessible to the broader community.
Q: What are the recommended use cases?
The model is intended exclusively for research purposes and should not be used in high-risk applications. It's particularly suited for academic research and engineering experimentation in areas like text generation, instruction following, and language understanding.