Ichigo-llama3.1-s-instruct-v0.4-GGUF
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Apache 2.0 |
| Architecture | Llama-3 |
| Paper | AudioBench Paper |
| Language | English |
What is Ichigo-llama3.1-s-instruct-v0.4-GGUF?
This is a GGUF-quantized version of the Ichigo-llama3.1 model, designed to understand both audio and text inputs. It was trained on over 1 billion tokens from the Instruction Speech WhisperVQ v4 dataset, and shows improved robustness to environmental noise along with stronger multi-turn conversation handling.
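Because GGUF files target llama.cpp-compatible runtimes, a quantization of this model can typically be loaded with llama-cpp-python. The sketch below is a minimal, text-only example; the GGUF file name is an assumption, so substitute whichever quantization you actually downloaded from the repository.

```python
# Minimal sketch: loading a GGUF quantization of this model with llama-cpp-python.
# The file name "ichigo-llama3.1-s-instruct-v0.4-Q4_K_M.gguf" is an assumption.
from llama_cpp import Llama

llm = Llama(
    model_path="ichigo-llama3.1-s-instruct-v0.4-Q4_K_M.gguf",
    n_ctx=4096,        # matches the model's maximum sequence length
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

output = llm(
    "Summarize what WhisperVQ audio tokens are in one sentence.",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```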
Implementation Details
The model builds on the Llama-3 architecture and adds audio processing through WhisperVQ integration. It achieved an MMLU score of 64.66, indicating that general language understanding is largely retained alongside the added audio capabilities. A rough PyTorch sketch of the optimizer and learning-rate schedule appears after the list below.
- Trained using FSDP2 implementation on 8x NVIDIA H100-SXM-80GB GPUs
- Implements cosine learning rate scheduling with warmup
- Uses the Adam optimizer with PyTorch's fused implementation
- Maximum sequence length of 4096 tokens
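As an illustration of those settings, here is a minimal PyTorch sketch of fused AdamW with linear warmup followed by cosine decay. The learning rate, warmup length, and step counts are placeholders, not the values used in training.

```python
import math
import torch

# Stand-in module; the actual run fine-tunes the Llama-3 backbone under FSDP2
# on 8x NVIDIA H100 GPUs with a 4096-token maximum sequence length.
model = torch.nn.Linear(4096, 4096)

# "Adam optimizer with torch fusion": PyTorch's fused AdamW kernel (CUDA only).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,                          # placeholder learning rate
    fused=torch.cuda.is_available(),  # fall back to the default kernel on CPU
)

warmup_steps, total_steps = 100, 10_000  # placeholder schedule lengths

def lr_lambda(step: int) -> float:
    """Linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```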
Core Capabilities
- Dual modality processing (audio and text)
- Noise-resistant audio understanding
- Multi-turn conversation handling (see the chat sketch after this list)
- High performance on AudioBench evaluations (3.5/5 on OpenHermes)
- Competitive MMLU scores against base models
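As a sketch of the multi-turn capability listed above, the following continues the llama-cpp-python example with a two-turn text chat. Feeding real audio would additionally require encoding speech into WhisperVQ sound tokens upstream, which is not covered here; the GGUF file name remains an assumption.

```python
from llama_cpp import Llama

# Same assumed GGUF file name as in the loading example above.
llm = Llama(model_path="ichigo-llama3.1-s-instruct-v0.4-Q4_K_M.gguf", n_ctx=4096)

messages = [{"role": "user", "content": "Give me one sentence about what you can do."}]
first = llm.create_chat_completion(messages=messages, max_tokens=128)
messages.append(first["choices"][0]["message"])

# The second turn reuses the accumulated history, exercising multi-turn handling.
messages.append({"role": "user", "content": "Now phrase that same answer more formally."})
second = llm.create_chat_completion(messages=messages, max_tokens=128)
print(second["choices"][0]["message"]["content"])
```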
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its ability to process both audio and text inputs with high accuracy, while maintaining robustness against environmental noise. It's particularly notable for achieving near-parity with specialized audio models while retaining strong general language understanding capabilities.
Q: What are the recommended use cases?
The model is primarily intended for research applications, particularly in scenarios requiring both audio and text understanding. It's well-suited for multi-turn conversations involving audio inputs, speech understanding tasks, and general language processing applications.