YandexGPT-5-Lite-8B-pretrain

YandexGPT-5-Lite-8B is a powerful pretrained language model with 8B parameters, 32k context length, and optimized Russian language processing capabilities.

Property         Value
Parameter Count  8 Billion
Context Length   32,000 tokens
Architecture     LLaMA-like
Model URL        https://huggingface.co/yandex/YandexGPT-5-Lite-8B-pretrain

What is YandexGPT-5-Lite-8B-pretrain?

YandexGPT-5-Lite-8B-pretrain is a pretrained language model developed by Yandex, featuring 8 billion parameters and a 32k-token context length. The model underwent a two-phase training process: it was initially trained on 15T tokens of predominantly Russian and English text, followed by a specialized "Powerup" phase on 320B tokens of high-quality data.

Implementation Details

The model uses a LLaMA-like architecture and was trained in two distinct phases. The first phase focused on general pretraining with a diverse dataset comprising 60% web pages, 15% code, 10% mathematics, and other specialized data. The second "Powerup" phase used a carefully curated dataset including 25% web pages, 19% mathematics, 18% code, and 18% educational content.

  • Optimized tokenizer for Russian language processing
  • 32k token context length (equivalent to approximately 48k tokens in Qwen-2.5)
  • Compatible with major fine-tuning frameworks
  • Supports both HF Transformers and vLLM implementations
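Since the model card states compatibility with HF Transformers, loading it can be sketched as follows. This is a minimal, hedged example using the standard `transformers` auto-classes and the model id from the URL above; it assumes `transformers` and `torch` are installed and that enough GPU or CPU memory is available for an 8B model.

```python
# Minimal sketch: loading YandexGPT-5-Lite-8B-pretrain with HF Transformers.
# Assumes the `transformers` and `torch` packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yandex/YandexGPT-5-Lite-8B-pretrain"  # from the model URL above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Note: this is a base (pretrained) model, so prompts are plain text
# continuations rather than chat turns.
inputs = tokenizer("Москва — столица", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a pretrain checkpoint rather than an instruct model, sampling plain continuations like this is the expected usage pattern before fine-tuning.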

Core Capabilities

  • Advanced Russian and English language processing
  • Extended context understanding (32k tokens)
  • Code generation and analysis
  • Mathematical computation and reasoning
  • Educational content processing

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive features include its optimized Russian language processing capabilities, extensive context length, and specialized two-phase training approach. The tokenizer efficiency for Russian language makes it particularly valuable for Russian-language applications.
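The claimed tokenizer efficiency for Russian can be checked empirically by counting tokens per character on a Russian sample. The snippet below is a hypothetical sketch of such a measurement; the comparison baseline and sample text are illustrative choices, not part of the model card.

```python
# Hypothetical sketch: measuring tokenizer efficiency on Russian text.
# Fewer tokens per character generally means more effective context usage.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("yandex/YandexGPT-5-Lite-8B-pretrain")

text = "Москва — столица России и крупнейший город страны."
n_tokens = len(tok.encode(text, add_special_tokens=False))
print(f"{n_tokens} tokens for {len(text)} characters "
      f"({len(text) / n_tokens:.2f} chars/token)")
```

A higher chars/token ratio on Russian text relative to other tokenizers is what the "approximately 48k Qwen-2.5-equivalent tokens" comparison above refers to.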

Q: What are the recommended use cases?

The model is well-suited for various applications including code generation, mathematical analysis, educational content processing, and general language understanding tasks. It's particularly effective for applications requiring extended context understanding and Russian language processing.
