codet5p-110m-embedding

Maintained By
Salesforce

CodeT5+ 110M Embedding Model

Property        Value
Developer       Salesforce
License         BSD-3-Clause
Paper           CodeT5+: Open Code Large Language Models
Embedding Size  256 dimensions

What is codet5p-110m-embedding?

CodeT5+ 110M Embedding is a specialized code embedding model developed by Salesforce that maps source code to fixed-length, 256-dimensional vectors. It is built on CodeT5+, an encoder-decoder model family pretrained with multiple objectives, including span denoising, causal language modeling, and contrastive learning.

Implementation Details

The model consists of the encoder of the CodeT5+ 220M model plus a projection layer. The encoder was pretrained in two stages on unimodal (code-only) and bimodal (text-code) data. It supports 9 programming languages: Python, Java, JavaScript, Go, Ruby, PHP, C, C++, and C#.

  • Utilizes the transformers library with AutoModel functionality
  • Compatible with the CodeT5 tokenizer
  • Trained on permissively licensed code from GitHub
  • Produces normalized 256-dimensional embeddings
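
The points above can be sketched in a few lines. This is a minimal example, not an official recipe: it assumes the checkpoint is published on the Hugging Face Hub as `Salesforce/codet5p-110m-embedding` and that its custom model code must be loaded with `trust_remote_code=True`. The helper names `load_model` and `embed_code` are illustrative and not part of any library.

```python
CHECKPOINT = "Salesforce/codet5p-110m-embedding"  # assumed Hub id
EMBEDDING_DIM = 256  # embedding size stated in the model card


def load_model(device="cpu"):
    """Load the tokenizer and embedding model (downloads weights on first use)."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, trust_remote_code=True)
    model = AutoModel.from_pretrained(CHECKPOINT, trust_remote_code=True).to(device)
    model.eval()
    return tokenizer, model


def embed_code(code, tokenizer, model, device="cpu"):
    """Return the embedding of one code snippet as a 1-D tensor."""
    import torch

    input_ids = tokenizer.encode(code, return_tensors="pt").to(device)
    with torch.no_grad():
        # The model returns one embedding per input row; take the first (only) row.
        return model(input_ids)[0]
```

After `tokenizer, model = load_model()`, a call such as `embed_code("def add(a, b): return a + b", tokenizer, model)` should return a vector of length 256 with an L2 norm close to 1. Both properties are worth checking once after loading.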

Core Capabilities

  • Zero-shot code retrieval with 74.23% overall accuracy across languages
  • Strong performance on language-specific retrieval (90.69% for Go, 74.51% for Ruby)
  • Efficient code representation for similarity matching and search
  • Cross-language code understanding
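
Because the embeddings are normalized, the dot product of two vectors equals their cosine similarity, so retrieval reduces to ranking candidates by dot product with the query. The sketch below illustrates that idea with toy 3-dimensional vectors standing in for real 256-dimensional embeddings; the file names, vectors, and the `rank_snippets` helper are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (the model's outputs are already normalized)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return (name, score) pairs sorted by descending similarity to the query."""
    scored = [(name, dot(query_vec, vec)) for name, vec in snippet_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embeddings of snippets in different languages.
snippets = {
    "sort_list.py": l2_normalize([0.9, 0.1, 0.0]),
    "http_get.go": l2_normalize([0.0, 0.2, 0.9]),
    "quicksort.java": l2_normalize([0.8, 0.3, 0.1]),
}
query = l2_normalize([1.0, 0.2, 0.0])  # stand-in for the query "sort an array"

results = rank_snippets(query, snippets, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the two sorting snippets rank above the unrelated HTTP snippet. For a large corpus, a vector index would normally replace the linear scan.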

Frequently Asked Questions

Q: What makes this model unique?

The CodeT5+ family combines multiple pretraining objectives, and its larger models are scaled up with a compute-efficient method that initializes components from frozen off-the-shelf LLMs. This 110M variant is designed specifically for code embedding and performs well across languages.

Q: What are the recommended use cases?

The model is ideal for code retrieval tasks, code similarity detection, and code search applications. It's particularly effective when you need to convert code snippets into meaningful vector representations for downstream tasks.
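
As one illustration of similarity detection, the sketch below flags pairs of snippets whose embeddings exceed a cosine-similarity threshold. The vectors are toy stand-ins for real embeddings, and the threshold of 0.9 and the `find_similar_pairs` helper are assumptions chosen for the example, not values recommended by the model's authors.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.9):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy 3-dimensional stand-ins for 256-dimensional embeddings.
embeddings = {
    "add_v1": [1.0, 0.0, 0.1],
    "add_v2": [0.9, 0.1, 0.1],
    "parse_json": [0.0, 1.0, 0.2],
}

similar = find_similar_pairs(embeddings, threshold=0.9)
for name_a, name_b, score in similar:
    print(f"{name_a} ~ {name_b}: {score:.3f}")
```

The pairwise loop is quadratic in the number of snippets, which is fine for small sets. A suitable threshold depends on the data and is best tuned on labeled examples.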
