Brief-details: Modified SDXL VAE with fp16 fix, offering improved contrast and brightness control. Features multiple versions with different contrast/brightness multipliers for enhanced image generation.
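A minimal sketch of swapping the fixed VAE into an SDXL pipeline with diffusers; the repository ids (madebyollin/sdxl-vae-fp16-fix and the SDXL base checkpoint) are assumptions, since the entry does not name them.

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Load the fp16-fixed VAE (repository id assumed, not named in the entry above)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Swap it into a standard SDXL pipeline so latent decoding runs safely in fp16
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a lighthouse at sunset, highly detailed").images[0]
image.save("lighthouse.png")
```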
Brief Details: Kandinsky 2.2 Decoder - Advanced text-to-image diffusion model combining CLIP and latent diffusion, supporting high-res image generation up to 1024x1024 with flexible aspect ratios.
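A minimal sketch of running the decoder through diffusers' combined text-to-image pipeline, which chains the CLIP image prior and the latent-diffusion decoder in one call; the repository id kandinsky-community/kandinsky-2-2-decoder is an assumption.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Repository id assumed; AutoPipeline resolves the matching prior + decoder combination
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Up to 1024x1024 with flexible aspect ratios, per the entry above
image = pipe("a red panda reading a book, watercolor", height=1024, width=1024).images[0]
image.save("red_panda.png")
```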
Brief Details: A distilled version of BART-MNLI optimized for zero-shot classification, achieving 87.08% accuracy while maintaining efficiency through layer reduction.
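A minimal zero-shot classification sketch with the transformers pipeline; the checkpoint id (valhalla/distilbart-mnli-12-3) is an assumption standing in for the distilled BART-MNLI model described above.

```python
from transformers import pipeline

# Checkpoint id assumed; any distilled BART-MNLI checkpoint works the same way
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")

result = classifier(
    "The new GPU delivers twice the throughput at the same power budget.",
    candidate_labels=["technology", "politics", "sports"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```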
Brief Details: HuBERT-based model for Russian speech emotion recognition, fine-tuned on DUSHA dataset. 316M params, 86% accuracy, handles 5 emotions.
Brief-details: MaxViT base model with 120M params, trained on ImageNet-21k and fine-tuned on ImageNet-1k. Achieves 88.2% top-1 accuracy at 512px resolution.
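A minimal inference sketch with timm; the model name maxvit_base_tf_512.in21k_ft_in1k is an assumption matching the description (ImageNet-21k pretraining, 512px ImageNet-1k fine-tune), and dog.jpg is a placeholder input.

```python
import timm
import torch
from PIL import Image

# Model name assumed from the description above
model = timm.create_model("maxvit_base_tf_512.in21k_ft_in1k", pretrained=True).eval()

# Build the matching 512px preprocessing from the model's pretrained config
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

img = Image.open("dog.jpg").convert("RGB")
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # (1, 1000) ImageNet-1k logits
print(logits.softmax(-1).topk(5))
```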
Brief-details: SDXL-ControlNet model for versatile line art conditioning, supporting multiple line art inputs with high accuracy and stability at 1024px+ resolution.
Brief Details: Multilingual 8B-parameter AWQ-compressed LLM optimized for German-English, fine-tuned with Spectrum Fine-Tuning on 25% of its layers, supporting 6 languages.
Brief-details: A fine-tuned SDXL LoRA model optimized for text-to-image generation, built on stabilityai/stable-diffusion-xl-base-1.0 with 27k+ downloads and OpenRail++ license.
Brief Details: A distilled BART model trained on the CNN/DailyMail dataset for text summarization, offering a 2.09x speedup over the baseline with 230M parameters and strong ROUGE scores.
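A minimal summarization sketch with the transformers pipeline; the checkpoint id sshleifer/distilbart-cnn-12-6 is an assumption (one of the distilbart-cnn variants), since the entry does not name a repository.

```python
from transformers import pipeline

# Checkpoint id assumed; the entry describes a CNN/DailyMail-distilled BART
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "The James Webb Space Telescope has captured new images of a distant galaxy cluster, "
    "revealing structures that formed less than a billion years after the Big Bang. "
    "Astronomers say the data will help refine models of early galaxy formation."
)
print(summarizer(article, max_length=60, min_length=15, do_sample=False)[0]["summary_text"])
```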
Brief Details: Advanced speech recognition model with 593M parameters, achieving 1.96% WER on LibriSpeech test-clean. Built by Facebook using a Conformer architecture with rotary embeddings.
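A minimal transcription sketch via the automatic-speech-recognition pipeline; the checkpoint id facebook/wav2vec2-conformer-rope-large-960h-ft is an assumption matching the description, and sample.flac is a placeholder 16 kHz mono recording.

```python
from transformers import pipeline

# Checkpoint id assumed (Conformer + rotary embeddings, LibriSpeech fine-tune)
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-conformer-rope-large-960h-ft",
)

# Accepts a path to an audio file or a numpy array of 16 kHz samples
print(asr("sample.flac")["text"])
```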
Brief Details: A powerful 335M parameter T5-based encoder model for semantic search, mapping sentences to 768-dimensional vectors with state-of-the-art retrieval capabilities.
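A minimal semantic-search sketch with sentence-transformers; the checkpoint id sentence-transformers/gtr-t5-large is an assumption standing in for the 768-dimensional T5 encoder described above.

```python
from sentence_transformers import SentenceTransformer, util

# Checkpoint id assumed; maps text to 768-dimensional vectors
model = SentenceTransformer("sentence-transformers/gtr-t5-large")

query = "how do solar panels convert sunlight into electricity"
docs = [
    "Photovoltaic cells turn sunlight into electric current via the photovoltaic effect.",
    "The stock market closed higher today after strong earnings reports.",
]

query_emb = model.encode(query, convert_to_tensor=True)  # shape (768,)
doc_embs = model.encode(docs, convert_to_tensor=True)    # shape (2, 768)
print(util.cos_sim(query_emb, doc_embs))                 # the first document should score higher
```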
Brief-details: GATE-AraBert-v1 is a 135M parameter Arabic language model specialized in sentence similarity and semantic text embedding, achieving 82.78% accuracy on STS benchmarks.
Brief-details: A 4-bit quantized version of Meta's Llama-3.2-3B-Instruct model, optimized for the MLX framework with multi-language support; the listed 502M parameter count reflects 4-bit weight packing rather than the underlying 3B model size.
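A minimal generation sketch with the mlx-lm package (Apple silicon only); the repository id mlx-community/Llama-3.2-3B-Instruct-4bit is an assumption, since the entry does not name one.

```python
# Requires macOS on Apple silicon and `pip install mlx-lm`; repository id assumed
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me one sentence about the moon."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=64))
```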
Brief Details: Distilled Vision Transformer model with 87M parameters, achieving 83.4% top-1 accuracy on ImageNet. Optimized for efficient image classification using teacher-student learning.
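A minimal image-classification sketch with the transformers pipeline; the checkpoint id facebook/deit-base-distilled-patch16-224 is an assumption matching the distilled DeiT-base description, and cat.jpg is a placeholder image path.

```python
from transformers import pipeline

# Checkpoint id assumed; the entry matches the distilled DeiT-base recipe
classifier = pipeline("image-classification", model="facebook/deit-base-distilled-patch16-224")

# Accepts a local path or URL to an image
for pred in classifier("cat.jpg", top_k=3):
    print(f'{pred["label"]}: {pred["score"]:.3f}')
```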
Brief Details: RAG-Token model for knowledge-intensive NLP tasks. Built by Facebook, combines retrieval and generation for QA tasks. Based on DPR and BART architectures.
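A minimal retrieval-augmented generation sketch; the checkpoint id facebook/rag-token-nq is an assumption, and the dummy retrieval index (which still requires the datasets and faiss packages) keeps the example lightweight instead of downloading the full Wikipedia index.

```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Checkpoint id assumed; use_dummy_dataset avoids pulling the full DPR Wikipedia index
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

inputs = tokenizer("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```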
Brief Details: A fine-tuned MiniLM model for content safety detection, supporting 14 classification categories with 33.4M parameters. Optimized for AI safety applications.
Brief Details: Llama3.1-8B-Chinese-Chat: An 8.03B parameter bilingual LLM fine-tuned from Meta-Llama-3.1-8B-Instruct, optimized for Chinese/English tasks using the ORPO algorithm.
Brief Details: HiFiGAN vocoder for text-to-speech synthesis, trained on LJSpeech dataset. Converts spectrograms to high-quality 22.05kHz audio waveforms.
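A minimal vocoding sketch with SpeechBrain; the source id speechbrain/tts-hifigan-ljspeech is an assumption, the import path is the one used by SpeechBrain 1.x (older releases expose the same class under speechbrain.pretrained), and the random mel spectrogram stands in for the output of an acoustic model such as Tacotron2.

```python
import torch
from speechbrain.inference.vocoders import HIFIGAN

# Source id assumed; downloads and caches the pretrained vocoder
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmp_hifigan")

# Dummy 80-band mel spectrogram (batch, n_mels, frames) standing in for an acoustic model's output
mel_spec = torch.rand(1, 80, 200)
waveform = hifi_gan.decode_batch(mel_spec)  # -> waveform samples at 22.05 kHz
print(waveform.shape)
```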
Brief-details: CogVLM is a 17.6B parameter visual language model achieving SOTA on 10 benchmarks, featuring 10B vision and 7B language parameters with advanced visual-text capabilities.
Brief Details: A small pre-trained encoder-decoder model by Salesforce for code understanding and generation, with identifier-aware capabilities and multi-task support.
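A minimal masked-span prediction sketch showing the identifier-aware denoising objective; the checkpoint id Salesforce/codet5-small is an assumption matching the description.

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Checkpoint id assumed; CodeT5 uses a RoBERTa-style tokenizer with a T5 encoder-decoder
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-small")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-small")

# Ask the model to fill in the masked identifier span
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=8)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```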
Brief Details: Qwen1.5-7B-Chat is a powerful 7.72B parameter chat model from the Qwen1.5 series, featuring 32K context length and improved multilingual capabilities.
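A minimal chat-completion sketch with transformers; the checkpoint id Qwen/Qwen1.5-7B-Chat is an assumption based on the entry, and the bfloat16/device_map settings presume a CUDA GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id assumed from the entry above
model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a 32K context window allows."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```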