Phind-CodeLlama-34B-v2-GPTQ
| Property | Value |
|---|---|
| Base Model | CodeLlama 34B v2 |
| Parameter Count | 34 Billion |
| Training Data | 1.5B tokens of programming data |
| HumanEval Score | 73.8% pass@1 |
| Training Infrastructure | 32 A100-80GB GPUs |
| Training Duration | 15 hours (480 GPU-hours) |
What is Phind-CodeLlama-34B-v2-GPTQ?
Phind-CodeLlama-34B-v2-GPTQ is a quantized version of the state-of-the-art code generation model that builds upon CodeLlama 34B. This GPTQ-quantized variant maintains the exceptional performance of the original model while reducing its size and memory requirements, making it more accessible for practical use. The model excels at multi-language programming, including Python, C/C++, TypeScript, and Java.
Implementation Details
The model has been fine-tuned on 1.5B tokens of high-quality programming problems and solutions, using DeepSpeed ZeRO 3 and Flash Attention 2. The quantization is available in multiple configurations, from 3-bit to 4-bit with various group sizes, allowing users to trade off output quality against memory requirements.
- Multiple quantization options (3-bit to 4-bit)
- Configurable group sizes (32g, 64g, 128g)
- Supports sequence length of 8192 tokens
- Compatible with AutoGPTQ and Transformers pipeline
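To make the quality-versus-memory trade-off concrete, the sketch below estimates the weight-storage footprint of a 34B-parameter model at the listed bit widths and group sizes. The per-group overhead term (one fp16 scale plus a quantized zero-point per group) is a simplification of the real GPTQ layout, so treat the numbers as ballpark figures; actual file sizes also depend on packing details and which layers stay unquantized.

```python
def gptq_weight_bytes(n_params: float, bits: int, group_size: int) -> float:
    """Approximate bytes needed to store GPTQ-quantized weights.

    Each weight takes `bits` bits; each group of `group_size` weights
    adds a 16-bit scale and a `bits`-wide zero-point. This is a
    simplified model of the on-disk GPTQ format, for illustration only.
    """
    weight_bits = n_params * bits
    overhead_bits = (n_params / group_size) * (16 + bits)
    return (weight_bits + overhead_bits) / 8

N = 34e9  # 34 billion parameters

for bits in (3, 4):
    for group in (32, 64, 128):
        gb = gptq_weight_bytes(N, bits, group) / 1e9
        print(f"{bits}-bit, group size {group:>3}: ~{gb:5.1f} GB")
```

Smaller group sizes quantize more finely (better quality) at the cost of more scale/zero-point overhead, which is why the 32g variants are larger than the 128g ones at the same bit width.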
Core Capabilities
- State-of-the-art code generation with 73.8% pass@1 on HumanEval
- Multi-language programming support
- Instruction-tuned using Alpaca/Vicuna format
- Efficient memory usage through quantization
- Comprehensive programming problem-solving abilities
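Because the model is instruction-tuned, prompts should follow its expected chat layout. The helper below assembles a prompt in the "### System Prompt / ### User Message / ### Assistant" style associated with Phind-CodeLlama-34B-v2; the exact header names and default system prompt are taken from memory of the original model card, so verify them against the card for your checkpoint before relying on this sketch.

```python
def build_prompt(
    user_message: str,
    system_prompt: str = "You are an intelligent programming assistant.",
) -> str:
    """Assemble a prompt in the Alpaca/Vicuna-style format the model was tuned on."""
    return (
        f"### System Prompt\n{system_prompt}\n\n"
        f"### User Message\n{user_message}\n\n"
        "### Assistant\n"
    )

print(build_prompt("Implement a queue in TypeScript."))
```

The trailing `### Assistant` header cues the model to begin its answer; omitting it tends to degrade instruction following on instruction-tuned checkpoints.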
Frequently Asked Questions
Q: What makes this model unique?
This model represents the current state-of-the-art in open-source code generation, achieving an impressive 73.8% pass@1 on HumanEval. Its quantized nature makes it more accessible while maintaining high performance, and it supports multiple programming languages effectively.
Q: What are the recommended use cases?
The model excels at code generation, debugging, and programming assistance across multiple languages. It's particularly well-suited for developers needing AI assistance in Python, C/C++, TypeScript, and Java programming tasks, while being resource-efficient through quantization.
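A minimal loading sketch via the Transformers GPTQ integration might look like the following. The repository id `TheBloke/Phind-CodeLlama-34B-v2-GPTQ` and the idea of selecting a quantization variant by hub branch are assumptions to check against the actual model listing; the heavy download and generation are kept behind a `__main__` guard, and a small helper caps inputs at the 8192-token context noted above.

```python
MODEL_ID = "TheBloke/Phind-CodeLlama-34B-v2-GPTQ"  # assumed hub id; verify before use
MAX_CONTEXT = 8192  # sequence length supported by the model


def truncate_to_context(token_ids, max_len: int = MAX_CONTEXT):
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-max_len:]


def main() -> None:
    # Third-party imports live inside main() so the helper above stays
    # importable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" lets accelerate place the quantized weights across
    # available GPUs; a `revision=` argument can select a specific
    # quantization branch if the repository publishes several.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = "### User Message\nWrite a binary search in Python.\n\n### Assistant\n"
    ids = truncate_to_context(tokenizer.encode(prompt))
    input_ids = torch.tensor([ids]).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Even at 4-bit, the 34B weights need a GPU (or GPUs) with roughly 20+ GB of free VRAM once activations and KV cache are accounted for, so `device_map="auto"` or explicit sharding is the practical default.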