# CodeT5+ 16B
| Property | Value |
|---|---|
| Author | Salesforce |
| License | BSD-3-Clause |
| Paper | View Paper |
| Supported Languages | Python, Java, JavaScript, Go, C++, C#, PHP, Ruby, C |
## What is CodeT5+ 16B?
CodeT5+ 16B is a large language model designed for code understanding and generation. Its encoder-decoder architecture can operate flexibly in encoder-only, decoder-only, or encoder-decoder mode, so a single model can serve a wide range of programming tasks. It is the largest member of the CodeT5+ family, scaling up from the original CodeT5 models (220M and 770M parameters) to 16B parameters.
## Implementation Details
The model uses a compute-efficient pretraining recipe: rather than training from scratch, its components are initialized from frozen off-the-shelf LLMs (CodeGen-350M-mono for the encoder and CodeGen-16B-mono for the decoder), yielding a "shallow encoder and deep decoder" architecture. It is trained on a carefully curated, permissively licensed subset of the github-code dataset. Key aspects of the training setup:
- Diverse pretraining tasks including span denoising, causal language modeling, contrastive learning, and text-code matching
- Supports both unimodal code data and bimodal code-text data
- Implements instruction tuning following Code Alpaca methodology
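One of these objectives, T5-style span denoising, can be illustrated with a small self-contained sketch (a simplified stand-in, not the model's actual tokenizer or masking code): contiguous spans of input tokens are replaced by sentinel tokens, and the model learns to reconstruct the masked-out content.

```python
# Simplified sketch of T5-style span denoising (illustrative only; the
# real CodeT5+ pipeline operates on subword tokens with its own masking
# rates and sentinel vocabulary).

def span_denoise(tokens, spans):
    """Replace each (start, end) token span with a sentinel token.

    Returns (corrupted_input, target), where the target interleaves
    sentinels with the original masked spans, as in T5 pretraining.
    """
    corrupted, target = [], []
    prev_end = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted.extend(tokens[prev_end:start])  # keep unmasked tokens
        corrupted.append(sentinel)                # mask the span
        target.append(sentinel)                   # target recovers the span
        target.extend(tokens[start:end])
        prev_end = end
    corrupted.extend(tokens[prev_end:])
    return corrupted, target

tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
corrupted, target = span_denoise(tokens, [(1, 2), (8, 11)])
# corrupted: ['def', '<extra_id_0>', '(', 'a', ',', 'b', ')', ':', '<extra_id_1>', 'b']
# target:    ['<extra_id_0>', 'add', '<extra_id_1>', 'return', 'a', '+']
```

During pretraining the encoder reads the corrupted sequence and the decoder is trained to emit the target, which forces the model to reason about code structure from bidirectional context.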
## Core Capabilities
- Advanced code completion with state-of-the-art performance
- Text-to-code retrieval with +3.2 average MRR improvement
- Code generation with 35.0% pass@1 and 54.5% pass@10 on HumanEval
- Exceptional performance in math programming tasks
- Multi-language support across 9 programming languages
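The retrieval and generation numbers above use standard metrics: mean reciprocal rank (MRR) for text-to-code retrieval, and pass@k for HumanEval, conventionally computed with the unbiased estimator from the HumanEval benchmark (draw n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k drawn samples passes). A minimal sketch of both:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n total (of which c are correct) passes."""
    if n - c < k:
        return 1.0  # fewer failures than draws: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_reciprocal_rank(ranks):
    """MRR: mean of 1/rank of the first correct result per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical example: 200 samples per problem, 70 pass the tests.
print(round(pass_at_k(200, 70, 1), 3))   # 0.35 (equals c/n for k=1)
print(round(mean_reciprocal_rank([1, 2, 4]), 3))
```

For k=1 the estimator reduces to the simple pass rate c/n, which is why pass@1 can also be read as per-sample accuracy.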
## Frequently Asked Questions
### Q: What makes this model unique?
CodeT5+ 16B stands out for its flexible architecture, which can operate in encoder-only, decoder-only, or encoder-decoder mode, and for its 16B-parameter scale. It achieves state-of-the-art results on code generation benchmarks, surpassing some closed-source models such as OpenAI's code-cushman-001.
### Q: What are the recommended use cases?
The model excels in code generation, code completion, and code understanding tasks. It's particularly effective for Python code generation, text-to-code retrieval, and solving complex programming problems, including mathematical programming tasks.