TinyLlama-1.1B-intermediate-step-1431k-3T

A compact 1.1B-parameter LLaMA-based model pretrained on 3 trillion tokens, reaching 60.31% normalized accuracy (10-shot) on the HellaSwag benchmark.

Property          Value
Parameter Count   1.1B
License           Apache 2.0
Training Tokens   3 trillion
Architecture      LLaMA-based Transformer

What is TinyLlama-1.1B-intermediate-step-1431k-3T?

TinyLlama-1.1B is an ambitious project aimed at creating a compact yet capable language model by pretraining a 1.1B-parameter model on 3 trillion tokens. This checkpoint (step 1431k) is the final one of the 3-trillion-token run, delivering strong benchmark scores while keeping a small computational footprint.

Implementation Details

The model adopts the same architecture and tokenizer as Llama 2, making it highly compatible with existing Llama-based projects. It was trained using 16 A100-40G GPUs over a 90-day period, demonstrating efficient resource utilization for large-scale training.

  • Identical architecture to Llama 2 for seamless integration
  • Trained on SlimPajama-627B and StarCoder datasets
  • Optimized for both performance and memory efficiency
  • Weights published in F32 tensor format
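As a back-of-the-envelope check on the 1.1B figure, the sketch below tallies parameters from the commonly published TinyLlama config (the specific values — hidden size 2048, 22 layers, 32 query heads with 4 KV heads via grouped-query attention, MLP intermediate size 5632, vocabulary 32000, untied embeddings — are assumptions drawn from the project's config, not stated on this card):

```python
# Back-of-the-envelope parameter count for TinyLlama-1.1B.
# Assumed config: hidden=2048, 22 layers, 32 query heads / 4 KV heads (GQA),
# MLP intermediate size 5632, vocab 32000, untied input/output embeddings.
hidden, layers, vocab = 2048, 22, 32000
heads, kv_heads = 32, 4
head_dim = hidden // heads          # 64
inter = 5632

kv_dim = kv_heads * head_dim        # 256 -- GQA shrinks the K/V projections
attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # Q/O plus K/V projections
mlp = 3 * hidden * inter                           # gate, up, down (SwiGLU)
norms = 2 * hidden                                 # two RMSNorms per layer

embed = 2 * vocab * hidden                         # input embeddings + LM head
total = layers * (attn + mlp + norms) + embed + hidden  # + final RMSNorm

print(f"{total:,} parameters (~{total / 1e9:.2f}B)")   # ~1.10B
```

The count lands at roughly 1.10 billion, consistent with the card's 1.1B figure; most of the budget sits in the MLP blocks, and grouped-query attention keeps the K/V projections small.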

Core Capabilities

  • HellaSwag (10-shot): 60.31% normalized accuracy
  • Winogrande (5-shot): 59.51% accuracy
  • TruthfulQA (0-shot): 37.32% accuracy
  • MMLU (5-shot): 26.04% accuracy
  • Efficient performance in resource-constrained environments

Frequently Asked Questions

Q: What makes this model unique?

TinyLlama stands out for achieving impressive performance metrics with only 1.1B parameters, making it significantly more efficient than larger models while maintaining strong capabilities. Its compatibility with the Llama ecosystem makes it particularly valuable for resource-constrained applications.

Q: What are the recommended use cases?

The model is ideal for applications requiring a balance between performance and computational efficiency, such as edge devices, rapid prototyping, and scenarios where larger models would be impractical. It's particularly well-suited for text generation tasks where resource constraints are a primary concern.
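Because the model shares Llama 2's architecture and tokenizer, it loads through the standard Hugging Face `transformers` text-generation pipeline. A minimal sketch, assuming `transformers` and `torch` are installed (the model id matches this checkpoint's Hugging Face repo; the sampling parameters are illustrative, not recommendations from the card):

```python
# Minimal text-generation sketch for this checkpoint.
# Assumes the `transformers` and `torch` packages are installed.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports kept local so this module loads even without torch installed.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.float32,  # card lists F32 tensors; fp16 also works on GPU
        device_map="auto",
    )
    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=True, top_k=50)
    return out[0]["generated_text"]

if __name__ == "__main__":
    print(generate("The TinyLlama project aims to"))
```

Note this is a base (pretrained) checkpoint, so it continues text rather than following instructions; for chat-style use, the separately released chat fine-tunes are the better fit.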
