Biggie-SmoLlm-0.15B-Base

Maintained by: nisten

  • Parameter Count: 181M parameters
  • Model Type: Base Language Model
  • Author: Nisten Tahiraj (rakun.ai)
  • Inference Speed: 160 tokens/sec on 1 CPU core
  • Model Size: 164MB (GGUF format)

What is Biggie-SmoLlm-0.15B-Base?

Biggie-SmoLlm-0.15B-Base is a small language model built as a "Frankenstein" merge of SmolLM-135M, expanded to 0.18B parameters. The model is designed for efficient CPU inference and serves as a solid base for further training. Despite its compact size, it reaches roughly 160 tokens per second on a single CPU core.

Implementation Details

The model combines several experimental techniques, including evolutionary merging, BitNet integration, and the GrokAdamW optimizer. It is optimized for quantization and runs efficiently in int8 q8_0 format, with the implementation focused on maintaining coherence while maximizing throughput on minimal hardware.

  • Uses custom GrokAdamW optimizer for training
  • Implements evolutionary model merging techniques
  • Supports efficient CPU inference without GPU requirements
  • Includes specialized prompt formatting and temperature controls
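
The int8 q8_0 format mentioned above is a block quantization scheme: weights are grouped into blocks of 32, each stored as one scale plus 32 int8 values. The sketch below is an illustrative pure-Python reimplementation of that idea, not llama.cpp's actual code; the block size and rounding follow the q8_0 layout, but details such as scale storage precision are simplified.

```python
BLOCK = 32  # q8_0 quantizes weights in blocks of 32

def quantize_q8_0(block):
    """Quantize one block of floats to (scale, list of int8 values)."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0 if amax > 0 else 0.0          # per-block scale
    qs = [max(-127, min(127, round(x / d))) if d else 0 for x in block]
    return d, qs

def dequantize_q8_0(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [q * d for q in qs]

# Round-trip a block of synthetic weights and measure the error.
weights = [(-1) ** i * i / 20.0 for i in range(BLOCK)]
d, qs = quantize_q8_0(weights)
restored = dequantize_q8_0(d, qs)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-block scale is why q8_0 preserves coherence well at 8 bits: the worst-case rounding error is bounded by half the scale of that block, not of the whole tensor.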

Core Capabilities

  • High-speed inference on CPU (160 tokens/sec)
  • Efficient memory usage (164MB model size)
  • Coherent text generation with appropriate temperature settings
  • Suitable for further fine-tuning and customization
  • Supports context window of 1024 tokens
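
The "appropriate temperature settings" above refer to standard temperature-controlled sampling. The minimal sketch below shows the mechanism with illustrative logits and temperature values; the model itself is not involved, and the function name is ours.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, rng=random):
    """Sample a token index from logits softened by `temperature`."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):            # inverse-CDF sampling
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
# Very low temperature sharpens the distribution toward the argmax token.
greedy_ish = sample_with_temperature(logits, temperature=0.01,
                                     rng=random.Random(0))
```

Lower temperatures make small models like this one more deterministic and coherent; higher values trade coherence for diversity.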

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its exceptional CPU performance and efficiency, achieving 160 tokens per second on a single CPU core. It's particularly notable for combining evolutionary merging techniques with the GrokAdamW optimizer, resulting in a highly efficient and coherent language model despite its small size.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring efficient CPU-based inference, development environments with limited resources, and as a base model for further fine-tuning. It's particularly suitable for applications where quick response times are needed without access to GPU resources.
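
The response-time claim can be sanity-checked with back-of-envelope arithmetic from the numbers on this card: at the quoted 160 tokens/sec on one CPU core, generating the full 1024-token context takes 1024 / 160 = 6.4 seconds, and a short reply is well under a second.

```python
def generation_seconds(n_tokens, tokens_per_sec=160.0):
    """Estimated wall-clock time to generate n_tokens at a fixed rate."""
    return n_tokens / tokens_per_sec

full_context = generation_seconds(1024)   # entire 1024-token window
short_reply = generation_seconds(64)      # a brief 64-token answer
```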
