Obsidian-3B-V0.5

Maintained By: NousResearch

Property            Value
License             CC-BY-SA-4.0
Language            English
Framework           PyTorch
Training Datasets   4 (Capybara, LessWrong-Amplify-Instruct, Pure-Dove, Verified-Camel)

What is Obsidian-3B-V0.5?

Obsidian-3B-V0.5 is the first multimodal model at the sub-7B parameter scale capable of processing both text and vision inputs. Built on Capybara-3B-V1.9 and StableLM-3B-4E1T, it achieves performance that rivals some 7B models despite its smaller size.

Implementation Details

The model follows the LLaVA 1.5 training procedure and uses the ChatML prompt format with '###' as separators. It is designed to process both visual and textual inputs efficiently, making it particularly versatile for multimodal applications; a minimal prompt-construction sketch follows the list below.

  • Built on Capybara-3B-V1.9 architecture
  • Implements ChatML format for interaction
  • Supports multimodal inputs (text and vision)
  • Developed by Nous Research in collaboration with Virtual Interactive

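The page mentions ChatML formatting with '###' separators but does not spell out the full template. As an illustrative sketch only, the snippet below shows how such a prompt might be assembled in Python; the `<|im_start|>` role markers, the `<image>` placeholder, the separator placement, and the `build_prompt` helper are assumptions for demonstration, not the documented template, which should be taken from the official model card.

```python
# Illustrative only: assembles a ChatML-style prompt with '###' turn separators.
# The role markers, '<image>' placeholder, and separator placement are assumptions;
# consult the official Obsidian-3B-V0.5 model card for the exact prompt template.

def build_prompt(user_message: str, include_image: bool = True) -> str:
    """Return a single-turn prompt string ready for the assistant's reply."""
    image_tag = "<image>\n" if include_image else ""
    return (
        f"<|im_start|>user\n{image_tag}{user_message}\n###\n"
        f"<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    print(build_prompt("What is shown in this picture?"))
```
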
Core Capabilities

  • Vision-language processing
  • Text generation and understanding
  • Multimodal task handling
  • Efficient parameter utilization
  • State-of-the-art performance in its size category

Frequently Asked Questions

Q: What makes this model unique?

It is the first multimodal model at the 3B parameter scale, offering vision capabilities typically found in much larger models while maintaining competitive performance.

Q: What are the recommended use cases?

The model is ideal for applications requiring both vision and language understanding, such as image description, visual question answering, and multimodal content analysis.
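
For orientation only, the sketch below shows text-only generation with Hugging Face transformers. It assumes the weights are published as `NousResearch/Obsidian-3B-V0.5` and load through `AutoModelForCausalLM` with `trust_remote_code=True`; neither assumption is confirmed by this page, and vision inputs require the LLaVA-style image preprocessing from the upstream training code, which is not shown here.

```python
# Hedged sketch: text-only generation with transformers.
# Assumptions (not confirmed by this page): the weights live at
# "NousResearch/Obsidian-3B-V0.5" and load via AutoModelForCausalLM with
# trust_remote_code=True. Image inputs need the LLaVA-style preprocessing
# from the upstream repository and are omitted here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Obsidian-3B-V0.5"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# ChatML-style prompt with '###' separators, as described above (illustrative).
prompt = (
    "<|im_start|>user\n"
    "Describe what a vision-language model does.\n###\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```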
