Phind-CodeLlama-34B-v1
| Property | Value |
|---|---|
| License | Llama 2 |
| Training Infrastructure | 32x A100-80GB GPUs |
| Training Time | ~96 GPU-hours (3 wall-clock hours on 32 GPUs) |
| HumanEval Score | 67.6% pass@1 |
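The pass@1 figure is the fraction of HumanEval problems whose first sampled completion passes the hidden unit tests. When several samples are drawn per problem, the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021) is typically used; below is a minimal Python sketch of that estimator, not Phind's evaluation code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total completions sampled for a problem
    c: completions that pass the unit tests
    k: sample budget being scored
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one pass
    # 1 - C(n-c, k) / C(n, k), computed as a stable running product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With k=1 the estimator reduces to c/n, the plain pass rate:
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9
```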
What is Phind-CodeLlama-34B-v1?
Phind-CodeLlama-34B-v1 is a code generation model fine-tuned by Phind from CodeLlama-34B. It scores 67.6% pass@1 on the HumanEval benchmark, roughly matching GPT-4's reported result. The model was trained on a proprietary dataset of approximately 80,000 high-quality programming problems and solutions, structured as instruction-answer pairs rather than traditional code-completion examples.
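Getting completions out of the model follows the usual Hugging Face transformers pattern. The sketch below assumes the public checkpoint `Phind/Phind-CodeLlama-34B-v1` and enough GPU memory for a 34B model in float16 (roughly 70 GB across one or more GPUs); it is illustrative, not an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phind/Phind-CodeLlama-34B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across available GPUs
)

# The model is instruction-tuned: describe the task, then end with "\n: "
prompt = "Write a Python function that checks whether a number is prime.\n: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```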
Implementation Details
Training used DeepSpeed ZeRO 3 and Flash Attention 2, which let the run complete on 32 A100-80GB GPUs in about three hours. It was a full native fine-tune at a sequence length of 4096 tokens; no LoRA adapters were used. Key details (a training-setup sketch follows the list):
- Full native fine-tune, no LoRA adapters
- Trained for 2 epochs (~160k examples total)
- DeepSpeed ZeRO 3 and Flash Attention 2
- 4096-token sequence length
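As a rough illustration of that setup, here is a hypothetical sketch using the Hugging Face Trainer: a full-parameter fine-tune with a minimal ZeRO stage-3 config and Flash Attention 2 enabled at load time. The dataset, hyperparameters, and output path are placeholders, not Phind's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Base model used by Phind; the flash-attn package must be installed.
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Minimal DeepSpeed ZeRO stage-3 config, passed straight to the Trainer.
ds_config = {
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="phind-style-finetune",  # placeholder path
    num_train_epochs=2,                 # matches the 2 epochs noted above
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=ds_config,
)

# train_dataset: your own instruction-answer pairs, tokenized and packed
# into 4096-token sequences. Launch with `deepspeed train.py`.
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```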
Core Capabilities
- Code generation quality comparable to GPT-4's reported HumanEval result
- Instruction-tuned for programming tasks
- Efficient handling of programming problems and solutions
- Supports multiple programming languages, with a focus on Python
Frequently Asked Questions
Q: What makes this model unique?
Its distinctive feature is matching GPT-4's reported HumanEval performance in a 34B open-weights model tuned specifically for code generation. It was trained on a carefully curated dataset of instruction-answer programming pairs and uses efficient large-scale training techniques (DeepSpeed ZeRO 3, Flash Attention 2).
Q: What are the recommended use cases?
The model excels at code generation and is well suited to programming assistance, code completion, and solving programming problems. Per the model card, prompt it with a plain task description ending in "\n: " rather than chat markup; a small helper is sketched below.
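A hypothetical convenience function for building prompts in that format (the `make_prompt` name is ours, not part of the model's API):

```python
def make_prompt(task: str) -> str:
    """Format a task for Phind-CodeLlama-34B-v1.

    Per the model card, append "\n: " to a plain task description
    instead of wrapping it in chat markup.
    """
    return task.rstrip() + "\n: "

print(repr(make_prompt("Implement binary search in Python.")))
# 'Implement binary search in Python.\n: '
```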