Brief-details: LLaVE-2B is a 2B-parameter multimodal embedding model based on Aquila-VL-2B, specializing in text, image, and video embeddings with a 4K-token context window.
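The typical use of an embedding model like this is cross-modal retrieval: encode the image and the candidate texts, then rank by cosine similarity. Below is a minimal sketch of that ranking step only; the embeddings are random placeholders standing in for the model's real outputs, and the 1536-dim size is an assumption, not LLaVE's documented hidden size.

```python
# Hedged sketch of cross-modal retrieval with an embedding model.
# The vectors here are placeholders for the model's actual image/text
# embeddings; the cosine-similarity ranking is the part being illustrated.
import torch
import torch.nn.functional as F

def rank_captions(image_emb: torch.Tensor, caption_embs: torch.Tensor, captions: list[str]):
    """Return captions sorted by cosine similarity to the image embedding."""
    sims = F.cosine_similarity(image_emb.unsqueeze(0), caption_embs, dim=-1)
    order = torch.argsort(sims, descending=True)
    return [(captions[i], sims[i].item()) for i in order.tolist()]

# Placeholder embeddings; in practice these come from the model's image and
# text encoders (1536-dim is an assumed size for illustration only).
image_emb = torch.randn(1536)
captions = ["a dog running on a beach", "a city skyline at night", "a bowl of ramen"]
caption_embs = torch.randn(len(captions), 1536)

for caption, score in rank_captions(image_emb, caption_embs, captions):
    print(f"{score:+.3f}  {caption}")
```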
Brief-details: MoshiVis is a perceptually augmented multimodal model combining vision, speech, and text capabilities, built on the Moshi backbone with a PaliGemma2 vision encoder; ~7.6B total parameters.
Brief-details: Illustrious-XL-v1.1 is an enhanced SDXL-based text-to-image model with improved character understanding and color balance, achieving an ELO rating of 1617.
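Assuming the release follows the usual SDXL single-file checkpoint format, a minimal diffusers sketch for generating an image could look like this; the `.safetensors` path and the sampling settings are placeholders, not values from the model card.

```python
# Hedged sketch: text-to-image generation with an SDXL-format checkpoint via
# diffusers. The .safetensors path is a placeholder for the downloaded file.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "Illustrious-XL-v1.1.safetensors",  # placeholder path to the local checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="1girl, watercolor style, soft lighting, detailed background",
    negative_prompt="lowres, bad anatomy, blurry",
    num_inference_steps=28,   # illustrative settings, not official recommendations
    guidance_scale=6.0,
).images[0]
image.save("sample.png")
```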
Brief-details: Roblox's Cube 3D (v0.1) is a pioneering text-to-shape generative AI model combining shape tokenization with 3D asset generation capabilities, designed for creators and researchers.
Brief-details: A powerful 32B-parameter vision-language model featuring advanced visual understanding, comprehension of videos over an hour long, and enhanced mathematical abilities through reinforcement learning.
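As a rough illustration of how a vision-language checkpoint of this kind is typically queried, here is a hedged sketch against the generic transformers processor/Vision2Seq interface; the repo id is a placeholder, and the chat-template and processor details vary by model.

```python
# Hedged sketch: single-image question answering with a generic
# vision-language checkpoint. MODEL_ID is a placeholder; the exact
# processor and chat-template behavior depend on the model actually used.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "your-org/vlm-32b-instruct"  # placeholder repo id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("chart.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```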
Brief-details: Advanced 3D asset generation model from Tencent featuring a 0.6B parameter shape generator, optimized for creating high-resolution textured 3D assets from images with improved efficiency.
Brief-details: A 0.5B-parameter 3D language model that processes point clouds to understand spatial layouts and objects, suited to architectural analysis and scene understanding.
Brief-details: Mistral's 24B-parameter instruction-tuned model with various GGUF quantizations for efficient deployment, offering quality-size tradeoffs from 6.5 GB to 47 GB.
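A minimal local-inference sketch with llama-cpp-python, assuming one of the GGUF files has already been downloaded; the `model_path` below is a placeholder, and smaller quantizations trade quality for memory.

```python
# Runs a quantized GGUF build of the model locally with llama-cpp-python.
# The model_path is a placeholder; point it at whichever quantization you
# downloaded (smaller files trade quality for memory).
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Small-24B-Instruct-Q4_K_M.gguf",  # placeholder file name
    n_ctx=8192,       # context window to allocate
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the GGUF format in two sentences."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```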
Brief-details: EXAONE-Deep-7.8B is a powerful 7.8B-parameter LLM optimized for reasoning tasks, featuring a 32K context length and strong performance in math and coding.
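A minimal transformers sketch for chat-style generation with this model, assuming the `LGAI-EXAONE/EXAONE-Deep-7.8B` repo id and that the checkpoint ships custom modeling code (hence `trust_remote_code=True`); recommended decoding settings may differ from the model card.

```python
# Hedged sketch: chat-style generation with the 7.8B reasoning model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LGAI-EXAONE/EXAONE-Deep-7.8B"  # assumed repo id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is the sum of the first 50 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```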
Brief-details: EXAONE-Deep-2.4B is a compact language model (2.14B parameters excluding embeddings) that excels at reasoning tasks, featuring a 32K context length and strong math/coding capabilities.
Brief-details: An 8B-parameter LLM based on Llama 3.1, optimized for reasoning and RAG tasks. Features a 128K context window, runs on a single GPU, and supports multiple languages.
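A single-GPU generation sketch via the transformers text-generation pipeline; the repo id is a placeholder for whichever Llama-3.1-based 8B checkpoint is meant here.

```python
# Quick single-GPU generation via the high-level pipeline API.
# MODEL_ID is a placeholder; substitute the actual 8B checkpoint name.
import torch
from transformers import pipeline

MODEL_ID = "your-org/llama-3.1-8b-reasoning"  # placeholder repo id

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "List three common failure modes of retrieval-augmented generation."}]
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```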
Brief-details: Orpheus 3B is a Llama-based speech LLM for high-quality TTS with zero-shot voice cloning capabilities, developed by Canopy Labs.
Brief-details: Skywork-R1V-38B is a powerful multimodal model combining an InternViT vision encoder with DeepSeek-derived language capabilities, excelling at visual reasoning and mathematics.
Brief-details: StarVector-1B is a foundation model that converts images/text to SVG code, using a vision-language architecture for high-quality vectorization and icon generation.
Brief-details: A 7B-parameter financial reasoning LLM fine-tuned on high-quality financial data. Achieves SOTA performance on financial QA and reasoning tasks. Built by SUFE-AIFLM-Lab.
Brief-details: NVIDIA's Canary-1B-Flash is a multilingual speech model with 883M parameters supporting ASR and translation across English, German, French, and Spanish, reaching 1000+ RTFx inference speed.
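A minimal transcription sketch with NVIDIA NeMo, assuming the checkpoint is published as `nvidia/canary-1b-flash` and loads through NeMo's multitask ASR class; the class name and defaults are assumptions carried over from earlier Canary releases.

```python
# Hedged sketch: English ASR with the Canary multitask model via NeMo.
# Requires the nemo_toolkit[asr] package; audio should be 16 kHz mono WAV.
from nemo.collections.asr.models import EncDecMultiTaskModel

model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b-flash")  # assumed repo id

# Transcribe a local file; pass a list of paths, get a list of results back.
transcripts = model.transcribe(["sample_16k.wav"], batch_size=1)
print(transcripts[0])
```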
Brief-details: Stability AI's Stable Virtual Camera, a multi-view diffusion model for novel view synthesis and 3D-consistent video generation from input images; requires acceptance of a license agreement.
Brief-details: NVIDIA's 49B parameter LLM based on Llama 3.3, optimized through Neural Architecture Search for efficiency and reasoning capabilities with 128K context length.
Brief-details: NVIDIA's GR00T-N1-2B is a pioneering 2B-parameter foundation model designed specifically for humanoid robot reasoning and skills, representing a breakthrough in robotic AI.
Brief-details: An advanced 32B-parameter LLM specialized for reasoning tasks, featuring 64 layers, grouped-query attention (GQA), strong performance on math/coding benchmarks, and a 32K context window.
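Since the entry highlights GQA, a quick back-of-the-envelope sketch shows why grouped-query attention matters at this scale; the 64-layer count comes from the entry, while the head counts, head dimension, and fp16 cache size are illustrative assumptions rather than the model's published spec.

```python
# Rough KV-cache size estimate for a 64-layer decoder, comparing full
# multi-head attention with grouped-query attention (GQA). All numbers
# except the layer count are illustrative assumptions.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for the separate key and value tensors; fp16/bf16 = 2 bytes/element.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

layers, head_dim, seq_len = 64, 128, 32_768
mha = kv_cache_bytes(layers, kv_heads=40, head_dim=head_dim, seq_len=seq_len)  # one KV head per query head
gqa = kv_cache_bytes(layers, kv_heads=8, head_dim=head_dim, seq_len=seq_len)   # 8 shared KV heads

print(f"MHA KV cache: {mha / 1e9:.1f} GB")
print(f"GQA KV cache: {gqa / 1e9:.1f} GB ({mha / gqa:.0f}x smaller)")
```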
Brief-details: StarVector is an 8B-parameter model for converting images/text to SVG code, achieving SOTA performance in vector graphics generation using a vision-language architecture.