# CodeFuse-CodeLlama-34B
| Property | Value |
|---|---|
| Base Model | CodeLlama-34b-Python |
| License | Other |
| Paper | arXiv:2311.02303 |
| HumanEval Score | 74.4% (pass@1) |
## What is CodeFuse-CodeLlama-34B?
CodeFuse-CodeLlama-34B is a state-of-the-art code generation model, fine-tuned with QLoRA on CodeLlama-34b-Python using 600,000 code instruction-and-answer examples. It achieves 74.4% pass@1 on the HumanEval benchmark, surpassing both GPT-4 and other open-source alternatives.
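For reference, pass@1 is the probability that a single sampled completion passes a task's unit tests. The standard unbiased estimator, from the Codex paper that introduced HumanEval, computes pass@k from n samples per task of which c are correct; a minimal sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n-c, k) / C(n, k), computed in a numerically stable form."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 20 samples for a task, 15 of which pass -> pass@1 estimate of 0.75
print(pass_at_k(n=20, c=15, k=1))
```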
## Implementation Details
The model was fine-tuned with a 4K context length, which can be extended to 16K if needed. It is implemented in PyTorch and runs best with CUDA 11.4. Both full-precision and 4-bit quantized versions are available, making it adaptable to a range of computational budgets.
- Supports multiple programming languages with a focus on Python
- Uses a specialized role-based prompting format for interactions (see the loading sketch after this list)
- Implements efficient tokenization with left-side padding
- Provides flexible deployment options through FasterTransformer
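A minimal loading sketch with Hugging Face transformers, assuming the published repo id `codefuse-ai/CodeFuse-CodeLlama-34B`. The `<|role_start|>...<|role_end|>` tokens follow the role-based pattern the card describes, but treat them as illustrative and verify against the official prompt template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codefuse-ai/CodeFuse-CodeLlama-34B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # the card calls for left-side padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Role-based prompt format (illustrative token names)
prompt = (
    "<|role_start|>human<|role_end|>"
    "Write a Python function that checks whether a number is prime."
    "<|role_start|>bot<|role_end|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```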
## Core Capabilities
- Superior code generation, with 74.4% pass@1 on HumanEval
- Multi-turn conversation support through structured prompting (a multi-turn sketch appears at the end of the FAQ below)
- Efficient memory usage through the optional 4-bit quantized version (see the loading sketch after this list)
- Context understanding up to 4K tokens, extendable to 16K
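A hedged sketch of the 4-bit option using on-the-fly bitsandbytes quantization; the repo id and the rough memory figures are assumptions, and a pre-quantized checkpoint, if one is published, would be the simpler route:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codefuse-ai/CodeFuse-CodeLlama-34B"  # assumed repo id

# NF4 weights with bf16 compute: roughly 20 GB of GPU memory for a 34B
# model, versus roughly 70 GB at full bf16 precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```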
## Frequently Asked Questions
### Q: What makes this model unique?
The model's distinctive feature is its performance on code generation tasks, achieved through careful QLoRA fine-tuning of the CodeLlama-34b-Python base model. Its 74.4% pass@1 on HumanEval was reported as state-of-the-art among open-source code LLMs at the time of release.
### Q: What are the recommended use cases?
The model excels in code generation, completion, and understanding tasks. It's particularly well-suited for Python development, technical documentation, and code-related conversational tasks. The model can be deployed in both research and production environments, with options for optimization through quantization.
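For the conversational use cases, prior turns are carried by concatenating them in the same role-tagged format. A sketch under the same token-name assumption as the loading example above:

```python
def build_prompt(history: list[tuple[str, str]], query: str) -> str:
    """Concatenate prior (human, bot) turns plus the new query into one
    role-tagged prompt string. Token names are illustrative."""
    parts = []
    for human, bot in history:
        parts.append(f"<|role_start|>human<|role_end|>{human}")
        parts.append(f"<|role_start|>bot<|role_end|>{bot}")
    parts.append(f"<|role_start|>human<|role_end|>{query}")
    parts.append("<|role_start|>bot<|role_end|>")  # model continues from here
    return "".join(parts)

history = [("Write a function to reverse a string.",
            "def reverse(s):\n    return s[::-1]")]
prompt = build_prompt(history, "Now add type hints and a docstring.")
```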