Minueza-2-96M
| Property | Value |
|---|---|
| Parameter Count | 96 Million |
| Architecture | Llama-based Transformer |
| Context Length | 4096 tokens |
| Training Tokens | 185 billion |
| License | Apache License 2.0 |
| Model URL | huggingface.co/Felladrin/Minueza-2-96M |
What is Minueza-2-96M?
Minueza-2-96M is a lightweight language model designed for efficiency and accessibility. Built on the Llama architecture, this compact 96M-parameter model was trained from scratch on both English and Portuguese datasets, making it well suited to bilingual applications. It supports a 4096-token context window and was trained on 185 billion tokens, striking a balance between capability and resource efficiency.
Implementation Details
The model uses a compact Llama-style configuration: 8 hidden layers, 12 attention heads, and 4 key-value heads (grouped-query attention), with a hidden size of 672 and an intermediate size of 2688. It applies attention dropout of 0.1 and rotary position embeddings (RoPE) with theta = 500,000.
- Trained with the Adam optimizer (betas = 0.9, 0.95)
- Linear learning-rate schedule with a base rate of 0.0003
- 2000 warmup steps and 0.1 weight decay
- Batch size of 512 sequences (~2M tokens per batch)
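For reference, the hyperparameters above map roughly onto a Hugging Face LlamaConfig as sketched below. This is an illustration rather than the model's published configuration; values not listed in this card (notably the vocabulary size) are placeholders, and the exact settings live in the config.json on the model page.

```python
from transformers import LlamaConfig

# Sketch of the architecture described above as a LlamaConfig.
# vocab_size is a placeholder; check the model's config.json for exact values.
config = LlamaConfig(
    vocab_size=32000,              # placeholder: vocabulary size not stated in this card
    hidden_size=672,               # model (embedding) dimension
    intermediate_size=2688,        # feed-forward dimension
    num_hidden_layers=8,           # transformer blocks
    num_attention_heads=12,        # query heads
    num_key_value_heads=4,         # grouped-query attention: 3 query heads per KV head
    max_position_embeddings=4096,  # context length
    attention_dropout=0.1,
    rope_theta=500000.0,           # RoPE base frequency
)
print(config)
```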
Core Capabilities
- Runs in mobile web browsers via Wllama and Transformers.js
- Efficient CPU-based inference (see the sketch below)
- Base model for ChatML-format fine-tuning
- Bilingual support (English and Portuguese)
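The CPU inference mentioned above can be tried with the standard transformers text-generation pipeline. The snippet below is a minimal sketch, assuming the base (non-chat) model is prompted with plain text; the prompt and sampling settings are arbitrary choices for illustration.

```python
from transformers import pipeline

# Minimal CPU inference sketch. The base model is a plain text-completion
# model, so it is prompted with raw text rather than a chat template.
generator = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M",
    device=-1,  # -1 = run on CPU
)

output = generator(
    "The city of Lisbon is known for",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```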
Frequently Asked Questions
Q: What makes this model unique?
Its compact size (96M parameters) and optimization for mobile/CPU deployment make it stand out, especially for resource-constrained environments. The bilingual training and 4096 token context window provide versatility despite its small size.
Q: What are the recommended use cases?
The model is best suited to mobile applications, CPU-based deployments, and use as a base for task-specific fine-tuning. However, users should expect limitations in complex reasoning and factual knowledge compared with larger models.
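Since the card positions the model as a base for ChatML-format fine-tuning, the sketch below shows what ChatML-formatted training text looks like. The to_chatml helper and the example conversation are hypothetical; an actual fine-tune would feed strings like these into a standard causal-LM training loop.

```python
# Hypothetical helper showing the ChatML format this base model is intended
# to be fine-tuned on. Each turn is wrapped in <|im_start|>/<|im_end|> tags.
def to_chatml(messages: list[dict]) -> str:
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

example = to_chatml([
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "Como se diz 'good morning' em português?"},
])
print(example)
# A fine-tuning run would tokenize strings like this (with the assistant's
# reply appended before a closing <|im_end|>) and train the base model on them.
```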