# Obsidian-3B-V0.5
| Property | Value |
|---|---|
| License | CC-BY-SA-4.0 |
| Language | English |
| Framework | PyTorch |
| Training Datasets | 4 (Capybara, LessWrong-Amplify-Instruct, Pure-Dove, Verified-Camel) |
## What is Obsidian-3B-V0.5?
Obsidian-3B-V0.5 is a notable milestone in multimodal AI: the first multimodal model at the 3B-parameter scale, able to process both text and vision inputs. Built on Capybara-3B-V1.9, which in turn uses StableLM-3B-4e1t as its base model, it delivers performance that rivals some 7B models despite its smaller size.
## Implementation Details
The model follows the LLaVA 1.5 training procedure and uses the ChatML format with '###' as separators. It is designed to process visual and textual inputs efficiently, making it versatile for multimodal applications (a minimal prompt-construction sketch follows the list below).
- Built on Capybara-3B-V1.9 architecture
- Implements ChatML format for interaction
- Supports multimodal inputs (text and vision)
- Developed by Nous Research in collaboration with Virtual Interactive
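To make the prompt format concrete, here is a minimal sketch of assembling a ChatML-style prompt with '###' separators and an `<image>` placeholder for a single user turn. The placeholder token, the system message, and the exact position of the separator are illustrative assumptions; the authoritative template is the one published with the model.

```python
# Minimal sketch of ChatML-style prompt assembly with '###' separators.
# The <image> placeholder, the system message, and the exact placement of
# '###' are assumptions for illustration; check the official model card
# for the authoritative template.

def build_vision_prompt(question: str,
                        system: str = "You are a helpful vision-language assistant.") -> str:
    """Return a ChatML-formatted prompt for one user turn that includes an image."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>###"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>###"
        f"<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_vision_prompt("What objects are visible in this picture?"))
```

In LLaVA-style models, the image placeholder is replaced by projected vision-encoder features before the prompt reaches the language model, so the text side of the prompt stays as simple as shown here.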
## Core Capabilities
- Vision-language processing
- Text generation and understanding
- Multimodal task handling
- Efficient parameter utilization
- State-of-the-art performance in its size category
## Frequently Asked Questions
Q: What makes this model unique?
It's the first multimodal model at the 3B parameter scale, offering vision capabilities typically found in much larger models, while maintaining competitive performance.
Q: What are the recommended use cases?
The model is ideal for applications requiring both vision and language understanding, such as image description, visual question answering, and multimodal content analysis.
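For orientation, a hypothetical visual question answering call is sketched below. It assumes the checkpoint is published on the Hugging Face Hub as NousResearch/Obsidian-3B-V0.5 and that it loads through the transformers image-to-text pipeline with remote code enabled; in practice the LLaVA-style codebase referenced above may be required instead, so treat this as a sketch rather than a verified recipe.

```python
# Hypothetical VQA sketch. Assumes the checkpoint is reachable as
# "NousResearch/Obsidian-3B-V0.5" on the Hugging Face Hub and is compatible
# with the transformers image-to-text pipeline; the upstream LLaVA-style
# code may be required in practice.
from transformers import pipeline
from PIL import Image

pipe = pipeline(
    "image-to-text",
    model="NousResearch/Obsidian-3B-V0.5",  # assumed Hub id
    trust_remote_code=True,
)

image = Image.open("street_sign.jpg")       # any local image file
prompt = "What does the sign in this photo say?"

# The pipeline forwards `prompt` to the model for conditional generation.
result = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 64})
print(result[0]["generated_text"])
```

The same pattern covers the other recommended use cases: swapping the question for "Describe this image in detail." turns the call into image description rather than question answering.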