Biggie-SmoLlm-0.15B-Base

Maintained by: nisten

  • Parameter Count: 181M parameters
  • Model Type: Base Language Model
  • Author: Nisten Tahiraj (rakun.ai)
  • Inference Speed: 160 tokens/sec on 1 CPU core
  • Model Size: 164MB (GGUF format)

What is Biggie-SmoLlm-0.15B-Base?

Biggie-SmoLlm-0.15B-Base is a small language model built as a "Frankenstein" merge of SmolLM-135M, expanded to 0.18B parameters. The model is designed for efficient CPU inference and serves as a solid base for further training. Despite its compact size, it reaches roughly 160 tokens per second on a single CPU core.

Implementation Details

The model combines several experimental techniques, including evolutionary merging, BitNet integration, and the GrokAdamW optimizer. It is optimized for quantization and runs efficiently in int8 q8_0 format, with the implementation focused on maintaining coherence while maximizing throughput on minimal hardware.

  • Uses custom GrokAdamW optimizer for training
  • Implements evolutionary model merging techniques
  • Supports efficient CPU inference without GPU requirements
  • Includes specialized prompt formatting and temperature controls
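
The int8 q8_0 format mentioned above is a block quantization scheme: weights are grouped into blocks of 32, each stored as one scale plus 32 int8 values. The sketch below is an illustrative pure-Python reimplementation of that idea, not llama.cpp's actual code; the block size and rounding follow the q8_0 layout, but details such as scale storage precision are simplified.

```python
BLOCK = 32  # q8_0 quantizes weights in blocks of 32

def quantize_q8_0(block):
    """Quantize one block of floats to (scale, list of int8 values)."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0 if amax > 0 else 0.0          # per-block scale
    qs = [max(-127, min(127, round(x / d))) if d else 0 for x in block]
    return d, qs

def dequantize_q8_0(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [q * d for q in qs]

# Round-trip a block of synthetic weights and measure the error.
weights = [(-1) ** i * i / 20.0 for i in range(BLOCK)]
d, qs = quantize_q8_0(weights)
restored = dequantize_q8_0(d, qs)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-block scale is why q8_0 preserves coherence well at 8 bits: the worst-case rounding error is bounded by half the scale of that block, not of the whole tensor.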

Core Capabilities

  • High-speed inference on CPU (160 tokens/sec)
  • Efficient memory usage (164MB model size)
  • Coherent text generation with appropriate temperature settings
  • Suitable for further fine-tuning and customization
  • Supports context window of 1024 tokens
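
The "appropriate temperature settings" above refer to standard temperature-controlled sampling. The minimal sketch below shows the mechanism with illustrative logits and temperature values; the model itself is not involved, and the function name is ours.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, rng=random):
    """Sample a token index from logits softened by `temperature`."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):            # inverse-CDF sampling
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
# Very low temperature sharpens the distribution toward the argmax token.
greedy_ish = sample_with_temperature(logits, temperature=0.01,
                                     rng=random.Random(0))
```

Lower temperatures make small models like this one more deterministic and coherent; higher values trade coherence for diversity.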

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its exceptional CPU performance and efficiency, achieving 160 tokens per second on a single CPU core. It's particularly notable for combining evolutionary merging techniques with the GrokAdamW optimizer, resulting in a highly efficient and coherent language model despite its small size.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring efficient CPU-based inference, development environments with limited resources, and as a base model for further fine-tuning. It's particularly suitable for applications where quick response times are needed without access to GPU resources.
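
The response-time claim can be sanity-checked with back-of-envelope arithmetic from the numbers on this card: at the quoted 160 tokens/sec on one CPU core, generating the full 1024-token context takes 1024 / 160 = 6.4 seconds, and a short reply is well under a second.

```python
def generation_seconds(n_tokens, tokens_per_sec=160.0):
    """Estimated wall-clock time to generate n_tokens at a fixed rate."""
    return n_tokens / tokens_per_sec

full_context = generation_seconds(1024)   # entire 1024-token window
short_reply = generation_seconds(64)      # a brief 64-token answer
```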
