Brief-details: A 14B-parameter merged LLM built with the Model Stock method from multiple SFT checkpoints, scoring a 42.90 average on the Open LLM Leaderboard with particularly strong IFEval results.
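Model Stock merges fine-tuned checkpoints by interpolating their average back toward the pretrained anchor, with the ratio set by the angle between the checkpoints' weight deltas. Below is a toy numpy sketch of that rule for two checkpoints; the formula follows the Model Stock paper, and all tensors are synthetic stand-ins rather than this model's weights (real merges apply the rule per layer).

```python
import numpy as np

# Toy Model Stock merge for two SFT checkpoints w1, w2 sharing a pretrained
# anchor w0. Interpolation ratio t = N*cos(theta) / (1 + (N-1)*cos(theta))
# per the Model Stock paper (N = 2 here); the data is synthetic.
rng = np.random.default_rng(0)
w0 = rng.normal(size=1024)                   # pretrained weights (flattened)
w1 = w0 + rng.normal(scale=0.1, size=1024)   # SFT checkpoint 1
w2 = w0 + rng.normal(scale=0.1, size=1024)   # SFT checkpoint 2

d1, d2 = w1 - w0, w2 - w0
cos_theta = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))

n = 2
t = n * cos_theta / (1 + (n - 1) * cos_theta)
w_merged = t * (w1 + w2) / 2 + (1 - t) * w0  # pull the average toward the anchor
print(f"cos(theta) = {cos_theta:.3f}, t = {t:.3f}")
```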
Brief-details: SigLIP 2 Base is Google's latest vision-language encoder, offering improved semantic understanding and localization; it supports zero-shot classification and image-text retrieval.
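A quick zero-shot classification sketch via the transformers pipeline; the exact checkpoint id is an assumption, so verify it against the google/siglip2 collection on the Hub (SigLIP 2 also needs a recent transformers release).

```python
from transformers import pipeline

# Zero-shot classification sketch; the checkpoint id is an assumption and
# should be verified against the google/siglip2 collection on the Hub.
classifier = pipeline(
    "zero-shot-image-classification",
    model="google/siglip2-base-patch16-224",
)
result = classifier(
    "photo.jpg",  # local path, URL, or PIL.Image
    candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"],
)
print(result)  # [{"label": ..., "score": ...}, ...] sorted by score
```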
Brief-details: Step-Audio-TTS-3B is a 3B-parameter text-to-speech model trained on synthetic data, supporting multiple languages, emotional styles, and rap/humming generation, with state-of-the-art performance reported by its authors.
Brief-details: YuE-s1-7B-anneal-en-cot is a 7B-parameter music-generation model that transforms lyrics into complete songs with separate vocal and accompaniment tracks.
Brief-details: A 7B-parameter GUI-interaction model reporting state-of-the-art results in perception (79.7% on VisualWebBench) and grounding (91.6% on ScreenSpot v2), with automation across platforms.
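Grounding models of this kind typically emit a target location as normalized screen coordinates that an automation layer converts to pixels. The small sketch below illustrates that conversion; the "click(x=..., y=...)" output format is an assumed schema, not this model's documented one.

```python
import re

# Hypothetical post-processing sketch: the "click(x=..., y=...)" action
# format below is an assumed output schema, not this model's documented one.
def parse_click(action: str, screen_w: int, screen_h: int) -> tuple[int, int]:
    m = re.search(r"click\(x=([\d.]+),\s*y=([\d.]+)\)", action)
    if m is None:
        raise ValueError(f"no click action found in {action!r}")
    x, y = float(m.group(1)), float(m.group(2))
    return round(x * screen_w), round(y * screen_h)

print(parse_click("click(x=0.42, y=0.17)", 1920, 1080))  # (806, 184)
```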
Brief-details: A 32B parameter merged LLM combining DeepSeek-R1, QwQ, and SkyT1 models, achieving 74% accuracy on AIME24 and strong performance in math/science reasoning.
Brief-details: Experimental SD1.5 model paired with the SDXL VAE, trained on the LAION-2B dataset in fp32 precision. Currently in alpha after 10 epochs of training.
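Swapping a VAE into an SD1.5 pipeline is straightforward in diffusers; a short sketch follows, where the SD1.5 checkpoint id is a hypothetical placeholder (the entry does not name the repo) and stabilityai/sdxl-vae is the standard SDXL autoencoder.

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Pairing an SD1.5 pipeline with the SDXL VAE in diffusers.
# "some-org/sd15-sdxl-vae-alpha" is a hypothetical repo id.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "some-org/sd15-sdxl-vae-alpha",  # placeholder checkpoint id
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a watercolor landscape at dusk").images[0]
image.save("out.png")
```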
Brief-details: Animagine XL 4.0 is a state-of-the-art anime-focused SDXL model trained on 8.4M images, featuring enhanced stability, anatomy accuracy, and color fidelity.
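A short diffusers sketch, assuming the checkpoint ships in SDXL diffusers format; the repo id follows Cagliostro Lab's earlier Animagine naming and should be confirmed on the Hub. Animagine models respond best to danbooru-style tag prompts.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Repo id assumed from Cagliostro Lab's Animagine naming; verify on the Hub.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0", torch_dtype=torch.float16
).to("cuda")
image = pipe(
    "1girl, cherry blossoms, scenery, masterpiece, high score",  # tag-style prompt
    negative_prompt="lowres, bad anatomy, worst quality",
    num_inference_steps=28,
    guidance_scale=5.0,
).images[0]
image.save("anime.png")
```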
Brief-details: Phi-4 is Microsoft's 14B parameter LLM optimized for reasoning and efficiency, featuring 16K context, MIT license, and strong performance on math/science benchmarks
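A plain generation sketch with the public microsoft/phi-4 checkpoint via transformers; sampling settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Plain generation with the public microsoft/phi-4 checkpoint.
tok = AutoTokenizer.from_pretrained("microsoft/phi-4")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4", torch_dtype=torch.bfloat16, device_map="auto"
)
messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```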
Brief-details: Google's TimesFM 2.0 - a 500M-parameter decoder-only foundation model for time-series forecasting, handling sequences up to 2048 points with flexible horizon lengths.
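A forecasting sketch based on the timesfm package's documented interface; the hparam names, values, and checkpoint id are assumptions to check against the google/timesfm README for the 2.0 release.

```python
import numpy as np
import timesfm

# Hparam names/values and the checkpoint id are assumptions; check the
# google/timesfm README for the 2.0 release before relying on them.
tfm = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(
        backend="gpu",
        context_len=2048,               # TimesFM 2.0 accepts up to 2048 points
        horizon_len=128,
        num_layers=50,                  # 2.0-500m configuration
        use_positional_embedding=False,
    ),
    checkpoint=timesfm.TimesFmCheckpoint(
        huggingface_repo_id="google/timesfm-2.0-500m-pytorch"
    ),
)
history = np.sin(np.linspace(0, 20, 400))                  # toy series
point_fc, quantile_fc = tfm.forecast([history], freq=[0])  # freq 0 = high frequency
print(point_fc.shape)                                      # (1, 128)
```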
Brief-details: CosyVoice2-0.5B is a scalable zero-shot text-to-speech model built on supervised semantic tokens, with streaming inference and support for multiple languages and voices.
Brief-details: Quantized GGUF conversion of Tencent's HunyuanVideo for ComfyUI, enabling efficient video generation with native nodes and optimized performance.
Brief-details: Meta's 11B-parameter vision-language model from the Llama 3.2 series, capable of understanding and analyzing images alongside text generation.
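An image-understanding sketch with the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint (access requires accepting Meta's license on the Hub); the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Gated checkpoint: requires accepting Meta's license on the Hub.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
image = Image.open("chart.png")  # placeholder image path
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this chart in two sentences."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```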
Brief-details: Stability AI's video generation model that animates still images into short video clips, extending the Stable Diffusion family to image-to-video.
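An image-to-video sketch using the diffusers StableVideoDiffusionPipeline with the publicly released img2vid-xt checkpoint; resolution and fps follow the model's usual defaults.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
image = load_image("still.png").resize((1024, 576))  # SVD's native resolution
frames = pipe(image, decode_chunk_size=8).frames[0]  # ~25 frames from one image
export_to_video(frames, "clip.mp4", fps=7)
```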
Brief-details: LogiLlama is a 1B-parameter LLM optimized for logical reasoning, featuring enhanced problem-solving capabilities while maintaining efficiency for on-device deployment.
Brief-details: A llama.cpp-compatible 8B-parameter YandexGPT variant requiring about 9 GB of RAM, offered in multiple quantization options for efficient deployment.
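A llama-cpp-python loading sketch; the GGUF file name is a hypothetical placeholder for whichever quantization level fits the ~9 GB budget.

```python
from llama_cpp import Llama

# The GGUF file name is a placeholder for whichever quantization level
# fits the ~9 GB RAM budget mentioned above.
llm = Llama(
    model_path="yandexgpt-8b-q4_k_m.gguf",  # hypothetical file name
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers when a GPU is available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize quantization trade-offs in one paragraph."}]
)
print(out["choices"][0]["message"]["content"])
```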
Brief-details: Multimodal retrieval model achieving state-of-the-art composed image retrieval, trained on the MegaPairs dataset of 26M+ triplets; it also excels at zero-shot retrieval tasks.
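Composed image retrieval fuses a reference image and a text modification into one query embedding, then ranks a gallery by similarity. The numpy sketch below shows only that ranking step, with random vectors standing in for embeddings the retrieval model would produce.

```python
import numpy as np

# Ranking step of composed image retrieval only: random vectors stand in for
# the query embedding f(reference_image, text_edit) and gallery embeddings.
rng = np.random.default_rng(1)
query = rng.normal(size=512)               # stand-in for f(image, "make it red")
gallery = rng.normal(size=(10_000, 512))   # stand-in candidate embeddings

query /= np.linalg.norm(query)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
scores = gallery @ query                   # cosine similarity after normalization
top5 = np.argsort(-scores)[:5]
print(top5, scores[top5])
```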
Brief-details: 24B-parameter Mistral-based model fine-tuned for multi-turn instruction following and reasoning, with Claude-style conversational behavior and support for explicit reasoning blocks.
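One practical concern with reasoning blocks is separating them from the user-facing answer; a small sketch follows, where the <think>...</think> delimiter is an assumed convention rather than this model's documented format.

```python
import re

# The <think>...</think> delimiter is an assumed convention, not necessarily
# this model's documented reasoning-block format.
def split_reasoning(text: str) -> tuple[str, str]:
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

raw = "<think>The user wants a summary, so keep it short.</think>Here is the answer."
reasoning, answer = split_reasoning(raw)
print(answer)  # "Here is the answer."
```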
Brief-details: A 70B-parameter LLaMA-based creative language model with a Japanese metalworking-inspired name, built via the SCE merge methodology for enhanced reasoning and creative expression.
Brief-details: A powerful 83B-parameter multilingual LLM covering 25 languages spoken by roughly 90% of the world's population, with strong performance on reasoning and knowledge tasks.
Brief-details: Advanced 14B-parameter UI/web development model based on the Qwen 2.5 architecture, specializing in HTML/CSS/Tailwind, with a 128K context window and 8K-token output capability.
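A generation sketch via transformers, assuming a standard Qwen 2.5 chat template; the repo id is a hypothetical placeholder since the entry does not name the exact repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "org/uigen-14b" is a hypothetical repo id; the entry does not name the
# exact repository. Assumes a standard Qwen 2.5 chat template.
model_id = "org/uigen-14b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
messages = [{"role": "user",
             "content": "Build a responsive pricing card in HTML with Tailwind CSS."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=2048)  # UI code can be long
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```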