Minueza-2-96M
| Property | Value |
|---|---|
| Parameter Count | 96 Million |
| Architecture | Llama-based Transformer |
| Context Length | 4096 tokens |
| Training Tokens | 185 billion |
| License | Apache License 2.0 |
| Model URL | huggingface.co/Felladrin/Minueza-2-96M |
What is Minueza-2-96M?
Minueza-2-96M is a lightweight language model designed for efficiency and accessibility. Built on the Llama architecture, this compact 96M-parameter model was trained from scratch on both English and Portuguese datasets, making it well suited to bilingual applications. It supports a 4096-token context window and was trained on 185 billion tokens, striking a balance between capability and resource efficiency.
Implementation Details
The model uses a compact Llama-style configuration: 8 hidden layers, 12 attention heads, and 4 key-value heads (grouped-query attention), with a hidden size of 672 and an intermediate size of 2688. It applies attention dropout of 0.1 and rotary position embeddings (RoPE) with theta = 500,000.
- Trained with the Adam optimizer (betas = 0.9, 0.95)
- Linear learning-rate schedule with a base rate of 0.0003
- 2000 warmup steps and 0.1 weight decay
- Batch size of 512 sequences (~2M tokens per batch)
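For reference, the hyperparameters above map roughly onto a Hugging Face LlamaConfig as sketched below. This is an illustration rather than the model's published configuration; values not listed in this card (notably the vocabulary size) are placeholders, and the exact settings live in the config.json on the model page.

```python
from transformers import LlamaConfig

# Sketch of the architecture described above as a LlamaConfig.
# vocab_size is a placeholder; check the model's config.json for exact values.
config = LlamaConfig(
    vocab_size=32000,              # placeholder: vocabulary size not stated in this card
    hidden_size=672,               # model (embedding) dimension
    intermediate_size=2688,        # feed-forward dimension
    num_hidden_layers=8,           # transformer blocks
    num_attention_heads=12,        # query heads
    num_key_value_heads=4,         # grouped-query attention: 3 query heads per KV head
    max_position_embeddings=4096,  # context length
    attention_dropout=0.1,
    rope_theta=500000.0,           # RoPE base frequency
)
print(config)
```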
Core Capabilities
- Runs in mobile web browsers via Wllama and Transformers.js
- Efficient CPU-based inference (see the sketch below)
- Base model for ChatML-format fine-tuning
- Bilingual support (English and Portuguese)
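The CPU inference mentioned above can be tried with the standard transformers text-generation pipeline. The snippet below is a minimal sketch, assuming the base (non-chat) model is prompted with plain text; the prompt and sampling settings are arbitrary choices for illustration.

```python
from transformers import pipeline

# Minimal CPU inference sketch. The base model is a plain text-completion
# model, so it is prompted with raw text rather than a chat template.
generator = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M",
    device=-1,  # -1 = run on CPU
)

output = generator(
    "The city of Lisbon is known for",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```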
Frequently Asked Questions
Q: What makes this model unique?
Its compact size (96M parameters) and optimization for mobile/CPU deployment make it stand out, especially for resource-constrained environments. The bilingual training and 4096 token context window provide versatility despite its small size.
Q: What are the recommended use cases?
The model is best suited to mobile applications, CPU-based deployments, and use as a base for task-specific fine-tuning. However, users should expect limitations in complex reasoning and factual knowledge compared with larger models.
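Since the card positions the model as a base for ChatML-format fine-tuning, the sketch below shows what ChatML-formatted training text looks like. The to_chatml helper and the example conversation are hypothetical; an actual fine-tune would feed strings like these into a standard causal-LM training loop.

```python
# Hypothetical helper showing the ChatML format this base model is intended
# to be fine-tuned on. Each turn is wrapped in <|im_start|>/<|im_end|> tags.
def to_chatml(messages: list[dict]) -> str:
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

example = to_chatml([
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "Como se diz 'good morning' em português?"},
])
print(example)
# A fine-tuning run would tokenize strings like this (with the assistant's
# reply appended before a closing <|im_end|>) and train the base model on them.
```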