Obsidian-3B-V0.5

NousResearch

World's first 3B-parameter multimodal LLM built on StableLM, capable of vision tasks with ChatML format support

License: CC-BY-SA-4.0
Language: English
Framework: PyTorch
Training Datasets: 4 (Capybara, LessWrong-Amplify-Instruct, Pure-Dove, Verified-Camel)

What is Obsidian-3B-V0.5?

Obsidian-3B-V0.5 represents a groundbreaking achievement in multimodal AI: it is the first model under 7B parameters capable of processing both text and vision inputs. Built on the foundation of Capybara-3B-V1.9 and StableLM-3B-4E1T, the model achieves performance that rivals some 7B models despite its smaller size.

Implementation Details

The model follows the LLaVA 1.5 training procedure and implements the ChatML format with '###' as separators. It is designed for efficient processing of both visual and textual inputs, making it particularly versatile for multimodal applications (see the prompt sketch after the list below).

  • Built on Capybara-3B-V1.9 architecture
  • Implements ChatML format for interaction
  • Supports multimodal inputs (text and vision)
  • Developed by Nous Research in collaboration with Virtual Interactive
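To make the interaction format concrete, here is a minimal sketch of a ChatML-style prompt for a vision query. The `<image>` placeholder follows the LLaVA convention mentioned above, and the exact special tokens, along with the role of '###' as a separator, are assumptions drawn from the model card rather than values verified against the released tokenizer.

```python
# Minimal sketch of a ChatML-style multimodal prompt for Obsidian-3B-V0.5.
# Assumptions: '<image>' (LLaVA convention) marks where image features are
# spliced in, and '###' serves as the separator/stop sequence described in
# the model card.

def build_prompt(question: str) -> str:
    """Format a single-turn vision question in ChatML."""
    return (
        "<|im_start|>user\n"
        f"{question}\n<image><|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("What objects are visible in this photo?")
print(prompt)
# Generation would typically be cut off at the '###' separator.
```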

Core Capabilities

  • Vision-language processing
  • Text generation and understanding
  • Multimodal task handling
  • Efficient parameter utilization
  • State-of-the-art performance in its size category

Frequently Asked Questions

Q: What makes this model unique?

It's the first multimodal model at the 3B parameter scale, offering vision capabilities typically found in much larger models, while maintaining competitive performance.

Q: What are the recommended use cases?

The model is ideal for applications requiring both vision and language understanding, such as image description, visual question answering, and multimodal content analysis.
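For illustration, a hypothetical visual question answering flow might look like the sketch below. The file name `photo.jpg`, the 336x336 resize, and the schematic `model.generate` call are all placeholders: the actual preprocessing and generation APIs depend on the release repository (a LLaVA-style codebase), so only the prompt assembly and the '###' stop handling are grounded in the model card.

```python
from PIL import Image

# Hypothetical VQA input preparation for Obsidian-3B-V0.5. "photo.jpg" is a
# placeholder path, and the resize stands in for whatever image preprocessor
# the release repository actually ships.
image = Image.open("photo.jpg").convert("RGB").resize((336, 336))

question = "What is the sign in this image about?"
prompt = (
    "<|im_start|>user\n"
    f"{question}\n<image><|im_end|>\n"
    "<|im_start|>assistant\n"
)

def trim_at_separator(raw_output: str, sep: str = "###") -> str:
    """Cut the generation at the '###' separator described in the model card."""
    return raw_output.split(sep, 1)[0].strip()

# Schematically, with the LLaVA-style codebase:
#   raw = model.generate(prompt, image)
raw = "The sign warns drivers about a school crossing ahead. ###"
print(trim_at_separator(raw))
```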
