Maintained by: bigcode

SantaCoder

Parameters: 1.1B
Training Data: The Stack v1.1 (Python, Java, JavaScript)
License: BigCode OpenRAIL-M
Paper: SantaCoder: Don't reach for the stars!
Training Infrastructure: 96 Tesla V100 GPUs

What is SantaCoder?

SantaCoder is a specialized code generation model trained on a carefully curated dataset of Python, Java, and JavaScript code. It implements Multi-Query Attention and the innovative Fill-in-the-Middle objective, allowing it to not only generate code sequentially but also fill in missing code segments within existing structures.
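For infilling, the model expects the prefix and suffix to be wrapped in sentinel tokens before the missing middle is generated. A minimal sketch of the prompt assembly, assuming the sentinel token names from the bigcode/santacoder model card (verify them against the tokenizer's special tokens before relying on them):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a Fill-in-the-Middle prompt in prefix-suffix-middle order.

    Sentinel token names are taken from the bigcode/santacoder model card;
    the model generates the missing middle after the final sentinel.
    """
    return f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"


# Ask the model to fill in the body between a signature and a return statement:
prompt = build_fim_prompt(
    prefix="def fib(n):\n    ",
    suffix="\n    return a\n",
)
```

The resulting string is tokenized and passed to the model like any other prompt; the completion that follows `<fim-middle>` is the infilled code.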

Implementation Details

The model was trained for 600K steps on 236 billion tokens using a modified GPT-2 architecture. It employs a 2048-token context window and uses float16 precision for efficient computation. The training process took 6.2 days on 96 Tesla V100 GPUs, consuming approximately 2.1 × 10^21 FLOPs.
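The reported compute is consistent with the common 6·N·D rule of thumb (roughly 6 FLOPs per parameter per training token for dense transformers); a quick sanity check using the figures above:

```python
params = 1.1e9   # 1.1B parameters
tokens = 236e9   # 236B training tokens

# Rough 6·N·D estimate of total training compute for a dense transformer
flops_estimate = 6 * params * tokens

print(f"{flops_estimate:.2e}")  # prints 1.56e+21
```

This lands on the same order of magnitude as the reported 2.1 × 10^21 FLOPs; the gap is expected, since 6·N·D ignores attention costs and hardware utilization.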

  • Utilizes Multi-Query Attention for improved efficiency
  • Implements Fill-in-the-Middle objective for versatile code completion
  • Trained with near-deduplication and comment-to-code ratio filtering
  • Supports three major programming languages: Python, Java, and JavaScript

Core Capabilities

  • Code generation and completion in Python, Java, and JavaScript
  • Fill-in-the-Middle functionality for code infilling tasks
  • Achieves 18% pass@1 on the Python HumanEval benchmark
  • Strong performance on code-to-text tasks with an 18.13 BLEU score
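Here pass@1 is the fraction of HumanEval problems for which a single generated sample passes the unit tests. When multiple samples are drawn per problem, the standard unbiased estimator from the HumanEval evaluation methodology can be used; the sample counts below are illustrative, not from the paper:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: of n samples drawn, c passed the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers: 200 samples per problem, 36 of them correct
score = pass_at_k(200, 36, 1)  # ≈ 0.18, i.e. 18% pass@1
```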

Frequently Asked Questions

Q: What makes this model unique?

SantaCoder's combination of Multi-Query Attention and Fill-in-the-Middle objective, along with its focused training on three major programming languages, makes it particularly effective for code generation tasks. The model was trained with careful consideration of code quality through filtering criteria like near-deduplication and comment-to-code ratio.

Q: What are the recommended use cases?

The model excels at code completion and generation when provided with appropriate context like comments or function signatures. It's important to note that it's not an instruction-following model, so inputs should be formatted as they would appear in source code rather than as natural language commands.
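In practice this means a task description should be phrased as a comment above a function signature rather than as an instruction. A small sketch of that reformatting (the helper name is ours, for illustration):

```python
def as_code_context(description: str, signature: str) -> str:
    """Turn a natural-language task into a code-shaped prompt:
    the description becomes a comment above the function signature,
    which is the input style a non-instruction-tuned code model expects."""
    comment = "\n".join(f"# {line}" for line in description.splitlines())
    return f"{comment}\n{signature}\n"


prompt = as_code_context(
    "Return the n-th Fibonacci number.",
    "def fib(n: int) -> int:",
)
```

The resulting prompt can then be fed to the model with the standard transformers generation API (note the bigcode/santacoder checkpoint requires `trust_remote_code=True` when loading, per its model card).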
