# Obsidian-3B-V0.5
| Property | Value |
|---|---|
| License | CC-BY-SA-4.0 |
| Language | English |
| Framework | PyTorch |
| Training Datasets | 4 (Capybara, LessWrong-Amplify-Instruct, Pure-Dove, Verified-Camel) |
## What is Obsidian-3B-V0.5?
Obsidian-3B-V0.5 is a notable milestone in multimodal AI: the first multimodal model at the 3B-parameter scale, able to process both text and vision inputs. Built on Capybara-3B-V1.9, which in turn uses StableLM-3B-4e1t as its base model, it delivers performance that rivals some 7B models despite its smaller size.
## Implementation Details
The model follows the LLaVA 1.5 training procedure and uses the ChatML format with '###' as separators. It is designed to process visual and textual inputs efficiently, making it versatile for multimodal applications (a minimal prompt-construction sketch follows the list below).
- Built on Capybara-3B-V1.9 architecture
- Implements ChatML format for interaction
- Supports multimodal inputs (text and vision)
- Developed by Nous Research in collaboration with Virtual Interactive
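To make the prompt format concrete, here is a minimal sketch of assembling a ChatML-style prompt with '###' separators and an `<image>` placeholder for a single user turn. The placeholder token, the system message, and the exact position of the separator are illustrative assumptions; the authoritative template is the one published with the model.

```python
# Minimal sketch of ChatML-style prompt assembly with '###' separators.
# The <image> placeholder, the system message, and the exact placement of
# '###' are assumptions for illustration; check the official model card
# for the authoritative template.

def build_vision_prompt(question: str,
                        system: str = "You are a helpful vision-language assistant.") -> str:
    """Return a ChatML-formatted prompt for one user turn that includes an image."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>###"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>###"
        f"<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_vision_prompt("What objects are visible in this picture?"))
```

In LLaVA-style models, the image placeholder is replaced by projected vision-encoder features before the prompt reaches the language model, so the text side of the prompt stays as simple as shown here.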
## Core Capabilities
- Vision-language processing
- Text generation and understanding
- Multimodal task handling
- Efficient parameter utilization
- State-of-the-art performance in its size category
## Frequently Asked Questions
Q: What makes this model unique?
It's the first multimodal model at the 3B parameter scale, offering vision capabilities typically found in much larger models, while maintaining competitive performance.
Q: What are the recommended use cases?
The model is ideal for applications requiring both vision and language understanding, such as image description, visual question answering, and multimodal content analysis.
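For orientation, a hypothetical visual question answering call is sketched below. It assumes the checkpoint is published on the Hugging Face Hub as NousResearch/Obsidian-3B-V0.5 and that it loads through the transformers image-to-text pipeline with remote code enabled; in practice the LLaVA-style codebase referenced above may be required instead, so treat this as a sketch rather than a verified recipe.

```python
# Hypothetical VQA sketch. Assumes the checkpoint is reachable as
# "NousResearch/Obsidian-3B-V0.5" on the Hugging Face Hub and is compatible
# with the transformers image-to-text pipeline; the upstream LLaVA-style
# code may be required in practice.
from transformers import pipeline
from PIL import Image

pipe = pipeline(
    "image-to-text",
    model="NousResearch/Obsidian-3B-V0.5",  # assumed Hub id
    trust_remote_code=True,
)

image = Image.open("street_sign.jpg")       # any local image file
prompt = "What does the sign in this photo say?"

# The pipeline forwards `prompt` to the model for conditional generation.
result = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 64})
print(result[0]["generated_text"])
```

The same pattern covers the other recommended use cases: swapping the question for "Describe this image in detail." turns the call into image description rather than question answering.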