Brief-details: RoboBrain is a unified brain model for robotic manipulation that combines planning, affordance perception, and trajectory prediction capabilities, published in CVPR 2025.
Brief-details: Advanced zero-shot TTS system with GPT-style architecture, featuring Chinese pronunciation correction and precise pause control. Built on XTTS/Tortoise with enhanced speaker features and BigVGAN2.
Brief-details: Advanced 8B parameter LLM fine-tuned for tool/function calling, achieving SOTA performance on the Berkeley Function-Calling Leaderboard and rivaling GPT-4.
Brief-details: Persian-focused 7B parameter LLM optimized for content generation, translation, and Q&A. Features multilingual support with emphasis on Persian language and cultural context.
Brief-details: EraX-WoW-Turbo-V1.1 is a high-speed multilingual speech recognition model, optimized for Vietnamese and 10 other languages, featuring real-time transcription capabilities and ~12% WER.
Brief-details: GGUF conversion of the 14B-parameter Wan2.1-Fun-14B-InP model, optimized for ComfyUI integration and specializing in video generation tasks.
Brief-details: Tessa-T1-14B is a specialized React-focused LLM based on Qwen2.5-Coder, optimized for generating semantic React components with advanced reasoning capabilities.
Brief-details: Vietnamese TTS model fine-tuned on 150 hours of speech data. Supports high-quality voice synthesis with a research-only license. Built on the F5-TTS base architecture.
Brief-details: A 1.3B parameter text-to-video generation model supporting multi-resolution training and start/end-frame prediction, part of Alibaba's Wan2.1 video generation ecosystem.
Brief-details: Distil-Large-v3.5 is a knowledge-distilled version of Whisper-Large-v3, offering 1.5x faster inference while maintaining high accuracy for speech recognition tasks, trained on 98k hours of data.
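Distilled Whisper checkpoints like this one are typically run through the standard transformers ASR pipeline. A minimal sketch, assuming the repo id `distil-whisper/distil-large-v3.5` and a local `audio.wav` (imports are deferred so the sketch can be read without torch/transformers installed):

```python
def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the distilled Whisper checkpoint."""
    # Deferred import: running this requires transformers (and torch) installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-large-v3.5",  # repo id assumed from the entry
        chunk_length_s=30,  # chunk long-form audio into 30 s windows
    )
    return asr(audio_path)["text"]

if __name__ == "__main__":
    print(transcribe("audio.wav"))  # hypothetical local file
```

The pipeline handles resampling and chunking, so the same call works for both short clips and long-form recordings.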
Brief-details: Ling-Coder-lite is a 16.8B parameter MoE LLM optimized for coding, featuring 2.75B activated parameters and a 16K context length.
Brief-details: LoRA model trained on FLUX.1-dev for image generation via Replicate. Uses the "TOK" trigger word and integrates with the 🧨 diffusers library.
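The usual diffusers flow for a LoRA entry like this is: load the base pipeline, attach the adapter, and include the trigger word in the prompt. A minimal sketch, assuming the `FluxPipeline` API and a placeholder LoRA repo id ("TOK" is the trigger word named in the entry; imports are deferred since running it needs a CUDA GPU and the FLUX.1-dev weights):

```python
TRIGGER = "TOK"  # trigger word baked in during LoRA training

def generate(lora_repo: str, subject: str = "figurine on a wooden desk"):
    """Render one image from FLUX.1-dev with the LoRA adapter attached."""
    # Deferred imports: torch + diffusers are only needed at generation time.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    pipe.load_lora_weights(lora_repo)  # e.g. the Replicate-trained adapter
    prompt = f"a photo of {TRIGGER} {subject}"  # trigger word activates the LoRA style
    return pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]

if __name__ == "__main__":
    generate("your-username/your-flux-lora").save("out.png")  # placeholder repo id
```

Omitting the trigger word usually yields base-model output, so it belongs in every prompt that should use the adapter.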
Brief-details: DeepSeek V3 0324 AWQ is an AWQ-quantized version of DeepSeek V3 (0324 release), optimized for efficient inference on high-end GPUs with strong benchmark performance across a range of serving configurations.
Brief-details: Gemma-3-Glitter-12B is a creative writing model merging roleplay (RP) instruction and storytelling capabilities, built on Gemma 3 12B IT with vision support.
Brief-details: Speech-command recognition model based on wav2vec2-base, fine-tuned on a speech-commands dataset and achieving 97.59% accuracy on command classification.
Brief-details: A 1.5B parameter RL-enhanced language model optimized for mathematics, achieving 49.90% average accuracy across benchmarks using the GRPO algorithm.
Brief-details: An 8B parameter LLaMA-based language model optimized for chat and NLP tasks. Supports multiple quantization formats and is released under the MIT license.
Brief-details: Fine-tuned Whisper-small model specialized in language identification, achieving 88.6% accuracy with linear learning-rate scheduling and mixed-precision training.
Brief-details: A Studio Ghibli-style LoRA adapter for FLUX.1-dev that transforms popular characters into Ghibli's distinctive artistic style, optimized for creative character illustrations.
Brief-details: German Text-to-Speech model featuring 4 distinct voices (Lena, Anna, Max, Tom), based on orpheus-3b-0.1-ft with ~4.5 hours of training data per voice.
Brief-details: Zero-shot identity preservation model for ComfyUI, featuring dual-stage architecture (sim_stage1 & aes_stage2) with face preservation and aesthetics optimization capabilities.