Brief-details: RoboBrain is a unified brain model for robotic manipulation that combines planning, affordance perception, and trajectory prediction capabilities, published in CVPR 2025.
Brief-details: Advanced zero-shot TTS system with GPT-style architecture, featuring Chinese pronunciation correction and precise pause control. Built on XTTS/Tortoise with enhanced speaker features and BigVGAN2.
Brief-details: Advanced 8B parameter LLM fine-tuned for tool/function calling, achieving SOTA performance on the Berkeley Function-Calling Leaderboard and rivaling GPT-4.
Brief-details: Persian-focused 7B parameter LLM optimized for content generation, translation, and Q&A. Features multilingual support with emphasis on Persian language and cultural context.
Brief-details: EraX-WoW-Turbo-V1.1 is a high-speed multilingual speech recognition model, optimized for Vietnamese and 10 other languages, featuring real-time transcription capabilities and ~12% WER.
Brief-details: GGUF conversion of the 14B-parameter Wan2.1-Fun-14B-InP model, optimized for ComfyUI integration and specializing in video generation tasks.
Brief-details: Tessa-T1-14B is a specialized React-focused LLM based on Qwen2.5-Coder, optimized for generating semantic React components with advanced reasoning capabilities.
Brief-details: Vietnamese TTS model fine-tuned on 150 hours of speech data. Supports high-quality voice synthesis with a research-only license. Built on the F5-TTS base architecture.
Brief-details: A 1.3B parameter text-to-video generation model supporting multi-resolution training and start/end-frame prediction, part of Alibaba's Wan2.1 video generation ecosystem.
Brief-details: Distil-Large-v3.5 is a knowledge-distilled version of Whisper-Large-v3, offering 1.5x faster inference while maintaining high accuracy for speech recognition tasks, trained on 98k hours of data.
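Distilled Whisper checkpoints like this one are typically run through the standard transformers ASR pipeline. A minimal sketch, assuming the repo id `distil-whisper/distil-large-v3.5` and a local `audio.wav` (imports are deferred so the sketch can be read without torch/transformers installed):

```python
def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the distilled Whisper checkpoint."""
    # Deferred import: running this requires transformers (and torch) installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-large-v3.5",  # repo id assumed from the entry
        chunk_length_s=30,  # chunk long-form audio into 30 s windows
    )
    return asr(audio_path)["text"]

if __name__ == "__main__":
    print(transcribe("audio.wav"))  # hypothetical local file
```

The pipeline handles resampling and chunking, so the same call works for both short clips and long-form recordings.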
Brief-details: Ling-Coder-lite is a 16.8B parameter MoE LLM optimized for coding, featuring 2.75B activated parameters and a 16K context length.
Brief-details: LoRA model trained on FLUX.1-dev for image generation via Replicate. Uses the "TOK" trigger word and integrates with the 🧨 diffusers library.
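The usual diffusers flow for a LoRA entry like this is: load the base pipeline, attach the adapter, and include the trigger word in the prompt. A minimal sketch, assuming the `FluxPipeline` API and a placeholder LoRA repo id ("TOK" is the trigger word named in the entry; imports are deferred since running it needs a CUDA GPU and the FLUX.1-dev weights):

```python
TRIGGER = "TOK"  # trigger word baked in during LoRA training

def generate(lora_repo: str, subject: str = "figurine on a wooden desk"):
    """Render one image from FLUX.1-dev with the LoRA adapter attached."""
    # Deferred imports: torch + diffusers are only needed at generation time.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    pipe.load_lora_weights(lora_repo)  # e.g. the Replicate-trained adapter
    prompt = f"a photo of {TRIGGER} {subject}"  # trigger word activates the LoRA style
    return pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]

if __name__ == "__main__":
    generate("your-username/your-flux-lora").save("out.png")  # placeholder repo id
```

Omitting the trigger word usually yields base-model output, so it belongs in every prompt that should use the adapter.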
Brief-details: DeepSeek V3 0324 AWQ is an AWQ-quantized version of DeepSeek V3 (0324 release), optimized for efficient inference on high-end GPUs with strong benchmark performance across a range of serving configurations.
Brief-details: Gemma-3-Glitter-12B is a creative writing model merging roleplay (RP) instruction and storytelling capabilities, built on Gemma 3 12B IT with vision support.
Brief-details: Speech-command recognition model based on wav2vec2-base, fine-tuned on a speech-commands dataset and achieving 97.59% accuracy on command classification.
Brief-details: A 1.5B parameter RL-enhanced language model optimized for mathematics, achieving 49.90% average accuracy across benchmarks using the GRPO algorithm.
Brief-details: An 8B parameter LLaMA-based language model optimized for chat and NLP tasks. Supports multiple quantization formats and is released under the MIT license.
Brief-details: Fine-tuned Whisper-small model specialized in language identification, achieving 88.6% accuracy with linear learning-rate scheduling and mixed-precision training.
Brief-details: A Studio Ghibli-style LoRA adapter for FLUX.1-dev that transforms popular characters into Ghibli's distinctive artistic style, optimized for creative character illustrations.
Brief-details: German Text-to-Speech model featuring 4 distinct voices (Lena, Anna, Max, Tom), based on orpheus-3b-0.1-ft with ~4.5 hours of training data per voice.
Brief-details: Zero-shot identity preservation model for ComfyUI, featuring dual-stage architecture (sim_stage1 & aes_stage2) with face preservation and aesthetics optimization capabilities.