Brief-details: LLaVA-Mini is an efficient multimodal model using only 1 vision token for image/video understanding, reducing FLOPs by 77% while maintaining LLaVA-v1.5-level performance.
Brief-details: Llasa-1B is a text-to-speech model extending LLaMA with 65,536 XCodec2 speech tokens, trained on 250K hours of Chinese-English data.
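A minimal sketch of the Llasa-style workflow, which treats TTS as next-token prediction over speech tokens. The repo id is assumed, and the XCodec2 waveform decode is shown only as a hypothetical placeholder:

```python
# Hedged sketch: generate speech tokens with the LLaMA-based LM.
# Repo id is an assumption; the codec decode step is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "HKUSTAudio/Llasa-1B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

text = "Hello, world."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
speech_token_ids = model.generate(**inputs, max_new_tokens=512)

# Turning speech tokens into audio requires the XCodec2 decoder;
# decode_with_xcodec2 is a hypothetical placeholder for that step.
# waveform = decode_with_xcodec2(speech_token_ids)
```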
Brief-details: NVIDIA's Cosmos-1.0-Guardrail is a safety-focused model that works alongside other Cosmos models to ensure responsible AI generation and content filtering.
Brief-details: A 1.1B parameter instruction-tuned LLM based on Llama 3.2, optimized for efficiency and general tasks with support for 4-bit and 8-bit quantization.
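A minimal 4-bit loading sketch via the standard transformers + bitsandbytes path; the model id is a stand-in, since the listing doesn't name the exact checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # stand-in for the actual checkpoint
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",  # NormalFloat4, the usual default
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```

For the 8-bit variant, pass `BitsAndBytesConfig(load_in_8bit=True)` instead.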
Brief-details: Dolphin3.0-Llama3.2-3B-GGUF is a quantized version of Dolphin 3.0, offering multiple compression options from 1.23GB to 12.86GB with varying quality-size tradeoffs.
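A sketch of running one of the GGUF quants locally with llama-cpp-python; the repo id and filename pattern are assumptions about the repo's naming:

```python
from llama_cpp import Llama

# Pulls a single quant file from the Hub; the glob pattern selects
# a mid-size quality/size tradeoff.
llm = Llama.from_pretrained(
    repo_id="bartowski/Dolphin3.0-Llama3.2-3B-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)
out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```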
Brief-details: Turkish-optimized ColPali model for efficient document retrieval combining visual and textual features. Built on PaliGemma-3B for Turkish textbooks and scientific content.
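A hedged retrieval sketch using the colpali-engine API; the Turkish variant's repo id below is a placeholder:

```python
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_id = "user/colpali-turkish"  # placeholder for the Turkish variant
model = ColPali.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = ColPaliProcessor.from_pretrained(model_id)

images = [Image.open("textbook_page.png")]
queries = ["fotosentez nedir?"]  # "what is photosynthesis?"

batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)
with torch.no_grad():
    image_embeds = model(**batch_images)
    query_embeds = model(**batch_queries)

# Late-interaction (multi-vector) scoring between queries and pages.
scores = processor.score_multi_vector(query_embeds, image_embeds)
print(scores)
```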
Brief-details: A 3B parameter instruct-tuned model built on Qwen2.5, designed for general-purpose tasks including coding, math, and function calling. Part of Dolphin 3.0 collection.
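A chat sketch using the tokenizer's chat template; the repo id is assumed:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "cognitivecomputations/Dolphin3.0-Qwen2.5-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```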
Brief-details: SDXL-based model focusing on artistic illustration generation, created by John6666 and available on HuggingFace; draws on photorealistic generation techniques.
Brief-details: BiRefNet-matting is a high-performance image matting model achieving 0.979 Smeasure on TE-P3M-500-NP, trained on multiple datasets for robust matting capabilities.
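A matting sketch following the usual BiRefNet inference pattern, with the repo id assumed from the family's Hub naming:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

# Repo id assumed; BiRefNet ships its own modeling code on the Hub.
model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet-matting", trust_remote_code=True
).eval()

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("portrait.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)
with torch.no_grad():
    preds = model(batch)[-1].sigmoid().cpu()  # last output is the alpha matte
matte = transforms.ToPILImage()(preds[0].squeeze())
matte.resize(image.size).save("alpha_matte.png")
```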
Brief-details: Experimental text-generation model by Tann-dev hosted on HuggingFace. Limited documentation available. Related to language modeling and conversational AI.
Brief-details: Google's Gemma-2-27b is a powerful 27B parameter language model requiring license acceptance on Hugging Face, with smaller variants available at 2B and 9B parameters.
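Loading the gated checkpoint requires accepting the license on the Hub and authenticating with an access token; a minimal sketch:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2-27b"
# "hf_..." stands in for your personal Hub access token.
tokenizer = AutoTokenizer.from_pretrained(model_id, token="hf_...")
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", token="hf_..."
)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```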
Brief-details: A specialized diffusion model focused on anime/manga-style image generation, built by Delcos and available on HuggingFace, with regular updates and improvements.
Brief-details: SigLIP 2 vision-language model trained on WebLI dataset, optimized for multilingual image-text understanding with improved semantic comprehension and localization capabilities.
Brief-details: Quantized version of Qwen2.5 14B model optimized for roleplay, offering various compression levels from 5GB to 29GB with different quality-performance tradeoffs.
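To fetch a single quant file rather than the whole repo, huggingface_hub works; the repo id and filename below are hypothetical:

```python
from huggingface_hub import hf_hub_download

# Downloads one GGUF file and returns its local cache path.
path = hf_hub_download(
    repo_id="mradermacher/Qwen2.5-14B-roleplay-GGUF",  # hypothetical repo id
    filename="Qwen2.5-14B-roleplay.Q4_K_M.gguf",       # hypothetical quant file
)
print(path)
```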
Brief-details: A 32B parameter fusion model combining DeepSeek-R1-Distill-Qwen (90%) and Qwen2.5-Coder (10%) for enhanced programming capabilities and code generation.
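The listing doesn't specify the fusion method; a sketch assuming simple linear interpolation of matching weights (a common merge baseline) between two same-architecture models:

```python
# Assumption: both models share an identical architecture, so their
# state dicts have matching keys and shapes.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("model-a")   # placeholder ids
donor = AutoModelForCausalLM.from_pretrained("model-b")

donor_state = donor.state_dict()
merged_state = {}
for name, w_base in base.state_dict().items():
    merged_state[name] = 0.9 * w_base + 0.1 * donor_state[name]  # 90/10 blend
base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```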
Brief-details: A specialized LoRA model for generating deep blue and white line illustrations, inspired by TOK style. Perfect for creating whimsical animal character art.
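A generation sketch loading the LoRA on an SDXL base with diffusers; the LoRA repo id is a placeholder, and "TOK" is the trigger token mentioned above:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("user/blue-line-illustration-lora")  # placeholder repo id

prompt = "a whimsical fox character, TOK style, deep blue and white line art"
image = pipe(prompt).images[0]
image.save("fox.png")
```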
Brief-details: A fine-tuned Whisper-Base model optimized for speech-to-text conversion, achieving 8.2% WER and 4.5% CER on Mozilla Common Voice dataset. FP16 quantized for efficiency.
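A transcription sketch with the transformers ASR pipeline (the checkpoint id is a placeholder); jiwer can reproduce a WER check against a reference transcript:

```python
from transformers import pipeline
from jiwer import wer

# Placeholder id for the fine-tuned Whisper-Base checkpoint.
asr = pipeline("automatic-speech-recognition", model="user/whisper-base-finetuned")
result = asr("sample.wav")
print(result["text"])

# Word error rate against a known reference transcript.
print(wer("the reference transcript here", result["text"]))
```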
Brief-details: SigLIP 2 is Google's advanced vision-language encoder featuring improved semantic understanding and localization, trained on WebLI dataset using TPU-v5e chips.
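A zero-shot classification sketch; the checkpoint name is assumed from the SigLIP 2 naming scheme:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"  # assumed checkpoint name
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("cat.jpg")
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# SigLIP trains with a sigmoid loss, so scores are independent per label
# rather than a softmax distribution.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```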
Brief-details: A GGUF-quantized version of OpenThinker-7B-Uncensored-DeLMAT with multiple compression options ranging from 3.1GB to 15.3GB, optimized for efficiency.
Brief-details: Baichuan-Audio-Base is an open-source end-to-end speech interaction model featuring audio tokenization, LLM integration, and a flow-matching decoder for high-quality speech processing.
Brief-details: A comprehensive GGUF quantization collection of the L3.3-Cu-Mai-R1-70b model, offering various compression levels from 16GB to 75GB with different quality-size tradeoffs.