Phind-CodeLlama-34B-v2-GPTQ
| Property | Value |
|---|---|
| Base Model | CodeLlama 34B v2 |
| Parameter Count | 34 Billion |
| Training Data | 1.5B tokens of programming data |
| HumanEval Score | 73.8% pass@1 |
| Training Infrastructure | 32 A100-80GB GPUs |
| Training Duration | 15 hours (480 GPU-hours) |
What is Phind-CodeLlama-34B-v2-GPTQ?
Phind-CodeLlama-34B-v2-GPTQ is a quantized version of the state-of-the-art code generation model that builds upon CodeLlama 34B. This GPTQ-quantized variant maintains the exceptional performance of the original model while reducing its size and memory requirements, making it more accessible for practical use. The model excels at multi-language programming, including Python, C/C++, TypeScript, and Java.
Implementation Details
The model has been fine-tuned on 1.5B tokens of high-quality programming problems and solutions, using DeepSpeed ZeRO 3 and Flash Attention 2. The quantization is available in multiple configurations, from 3-bit to 4-bit with various group sizes, allowing users to trade off output quality against memory requirements.
- Multiple quantization options (3-bit to 4-bit)
- Configurable group sizes (32g, 64g, 128g)
- Supports sequence length of 8192 tokens
- Compatible with AutoGPTQ and Transformers pipeline
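To make the quality-versus-memory trade-off concrete, the sketch below estimates the weight-storage footprint of a 34B-parameter model at the listed bit widths and group sizes. The per-group overhead term (one fp16 scale plus a quantized zero-point per group) is a simplification of the real GPTQ layout, so treat the numbers as ballpark figures; actual file sizes also depend on packing details and which layers stay unquantized.

```python
def gptq_weight_bytes(n_params: float, bits: int, group_size: int) -> float:
    """Approximate bytes needed to store GPTQ-quantized weights.

    Each weight takes `bits` bits; each group of `group_size` weights
    adds a 16-bit scale and a `bits`-wide zero-point. This is a
    simplified model of the on-disk GPTQ format, for illustration only.
    """
    weight_bits = n_params * bits
    overhead_bits = (n_params / group_size) * (16 + bits)
    return (weight_bits + overhead_bits) / 8

N = 34e9  # 34 billion parameters

for bits in (3, 4):
    for group in (32, 64, 128):
        gb = gptq_weight_bytes(N, bits, group) / 1e9
        print(f"{bits}-bit, group size {group:>3}: ~{gb:5.1f} GB")
```

Smaller group sizes quantize more finely (better quality) at the cost of more scale/zero-point overhead, which is why the 32g variants are larger than the 128g ones at the same bit width.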
Core Capabilities
- State-of-the-art code generation with 73.8% pass@1 on HumanEval
- Multi-language programming support
- Instruction-tuned using Alpaca/Vicuna format
- Efficient memory usage through quantization
- Comprehensive programming problem-solving abilities
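Because the model is instruction-tuned, prompts should follow its expected chat layout. The helper below assembles a prompt in the "### System Prompt / ### User Message / ### Assistant" style associated with Phind-CodeLlama-34B-v2; the exact header names and default system prompt are taken from memory of the original model card, so verify them against the card for your checkpoint before relying on this sketch.

```python
def build_prompt(
    user_message: str,
    system_prompt: str = "You are an intelligent programming assistant.",
) -> str:
    """Assemble a prompt in the Alpaca/Vicuna-style format the model was tuned on."""
    return (
        f"### System Prompt\n{system_prompt}\n\n"
        f"### User Message\n{user_message}\n\n"
        "### Assistant\n"
    )

print(build_prompt("Implement a queue in TypeScript."))
```

The trailing `### Assistant` header cues the model to begin its answer; omitting it tends to degrade instruction following on instruction-tuned checkpoints.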
Frequently Asked Questions
Q: What makes this model unique?
This model represents the current state-of-the-art in open-source code generation, achieving an impressive 73.8% pass@1 on HumanEval. Its quantized nature makes it more accessible while maintaining high performance, and it supports multiple programming languages effectively.
Q: What are the recommended use cases?
The model excels at code generation, debugging, and programming assistance across multiple languages. It's particularly well-suited for developers needing AI assistance in Python, C/C++, TypeScript, and Java programming tasks, while being resource-efficient through quantization.
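A minimal loading sketch via the Transformers GPTQ integration might look like the following. The repository id `TheBloke/Phind-CodeLlama-34B-v2-GPTQ` and the idea of selecting a quantization variant by hub branch are assumptions to check against the actual model listing; the heavy download and generation are kept behind a `__main__` guard, and a small helper caps inputs at the 8192-token context noted above.

```python
MODEL_ID = "TheBloke/Phind-CodeLlama-34B-v2-GPTQ"  # assumed hub id; verify before use
MAX_CONTEXT = 8192  # sequence length supported by the model


def truncate_to_context(token_ids, max_len: int = MAX_CONTEXT):
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-max_len:]


def main() -> None:
    # Third-party imports live inside main() so the helper above stays
    # importable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" lets accelerate place the quantized weights across
    # available GPUs; a `revision=` argument can select a specific
    # quantization branch if the repository publishes several.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = "### User Message\nWrite a binary search in Python.\n\n### Assistant\n"
    ids = truncate_to_context(tokenizer.encode(prompt))
    input_ids = torch.tensor([ids]).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Even at 4-bit, the 34B weights need a GPU (or GPUs) with roughly 20+ GB of free VRAM once activations and KV cache are accounted for, so `device_map="auto"` or explicit sharding is the practical default.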