Brief-details: A video-language understanding model with 197M parameters, achieving 80.4% top-1 accuracy on Kinetics-400. Built on CLIP architecture for video classification.
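A minimal classification sketch, assuming the microsoft/xclip-base-patch32 checkpoint (which reports these numbers; the entry does not name the repo):

```python
import numpy as np
from transformers import XCLIPProcessor, XCLIPModel

# Assumed checkpoint; swap in the actual repo id if it differs.
model_id = "microsoft/xclip-base-patch32"
processor = XCLIPProcessor.from_pretrained(model_id)
model = XCLIPModel.from_pretrained(model_id)

# 8 RGB frames of 224x224 stand in for a real sampled video clip.
video = list(np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8))

inputs = processor(
    text=["playing guitar", "riding a bike"],
    videos=video,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)

# Video-to-text similarity scores, softmaxed into label probabilities.
probs = outputs.logits_per_video.softmax(dim=1)
print(probs)
```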
Brief-details: BLIP-2 vision-language model with 3.74B parameters using OPT-2.7b LLM. Excels at image captioning and visual QA tasks. MIT licensed.
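A minimal captioning sketch, assuming the Salesforce/blip2-opt-2.7b checkpoint implied by the entry:

```python
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

model_id = "Salesforce/blip2-opt-2.7b"  # assumed from "BLIP-2 ... OPT-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")  # any local RGB image

# Unconditional captioning: no text prompt, the model describes the image.
inputs = processor(images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(generated[0], skip_special_tokens=True))
```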
Brief-details: High-performance text embedding model with 109M parameters, optimized for retrieval tasks. Achieves SOTA retrieval performance, scoring 54.90 NDCG@10 on MTEB retrieval benchmarks.
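A minimal retrieval sketch; the entry does not name the checkpoint, so BAAI/bge-base-en-v1.5 stands in for any sentence-transformers-compatible embedder of this size:

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in checkpoint; substitute the actual repo id from the entry's source.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

query_emb = model.encode("how do transformers handle long sequences?")
doc_embs = model.encode([
    "Sparse attention reduces the quadratic cost of self-attention.",
    "Gradient checkpointing trades compute for memory.",
])

# Cosine similarity ranks candidate documents against the query.
scores = util.cos_sim(query_emb, doc_embs)
print(scores)
```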
Brief-details: A compact 1.1B parameter chat model based on Llama architecture, trained on 3T tokens. Optimized for efficiency while maintaining quality conversational abilities.
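A minimal chat sketch, assuming the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint that matches this description:

```python
from transformers import pipeline

# Assumed checkpoint: a 1.1B Llama-architecture chat model trained on ~3T tokens.
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain model distillation in one sentence."},
]
# Recent transformers versions apply the model's chat template automatically
# when a list of messages is passed to the pipeline.
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```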
Brief-details: SAM ViT-Huge is Facebook's Segment Anything Model with a ViT-Huge image encoder, featuring 641M parameters and zero-shot, promptable mask generation for image segmentation.
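A minimal point-prompted sketch, assuming the facebook/sam-vit-huge repo:

```python
from PIL import Image
from transformers import SamModel, SamProcessor

model_id = "facebook/sam-vit-huge"  # assumed repo for SAM ViT-Huge
processor = SamProcessor.from_pretrained(model_id)
model = SamModel.from_pretrained(model_id)

image = Image.open("scene.jpg").convert("RGB")
# Prompt the model with a single foreground point (x, y in pixels).
inputs = processor(image, input_points=[[[450, 600]]], return_tensors="pt")
outputs = model(**inputs)

# Upscale predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
)
print(masks[0].shape)  # (num_points, num_masks_per_point, H, W)
```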
Brief-details: Portuguese sentiment analysis model (109M params) trained on Sebrae RS dataset. High accuracy (96.5%) for 3-class classification. Low carbon footprint.
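A minimal sketch with a hypothetical repo id, since the entry does not name the checkpoint:

```python
from transformers import pipeline

# Hypothetical repo id: substitute the actual Hub path of the
# Sebrae RS sentiment model described in the entry.
classifier = pipeline("text-classification", model="org/sentiment-pt-sebrae-rs")

print(classifier("O atendimento foi excelente!"))
# e.g. [{'label': 'positive', 'score': 0.98}]  (3 classes: pos/neg/neutral)
```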
Brief-details: Microsoft's DeBERTa large model fine-tuned on MNLI, featuring disentangled attention mechanism and enhanced mask decoder for superior NLU performance.
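A minimal zero-shot sketch using the microsoft/deberta-large-mnli repo named by the entry:

```python
from transformers import pipeline

# MNLI fine-tunes double as zero-shot classifiers via the NLI trick:
# each candidate label is scored as an entailment hypothesis.
classifier = pipeline(
    "zero-shot-classification", model="microsoft/deberta-large-mnli"
)

result = classifier(
    "The new GPU doubles training throughput.",
    candidate_labels=["hardware", "cooking", "politics"],
)
print(result["labels"][0])  # highest-scoring label
```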
Brief-details: ViViT transformer model trained on Kinetics-400 for video classification, extending the Vision Transformer architecture to process video data efficiently.
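A minimal sketch, assuming the google/vivit-b-16x2-kinetics400 checkpoint:

```python
import numpy as np
import torch
from transformers import VivitImageProcessor, VivitForVideoClassification

# Assumed checkpoint; the entry names only the architecture and dataset.
model_id = "google/vivit-b-16x2-kinetics400"
processor = VivitImageProcessor.from_pretrained(model_id)
model = VivitForVideoClassification.from_pretrained(model_id)

# This checkpoint expects 32 sampled frames per clip.
video = list(np.random.randint(0, 255, (32, 224, 224, 3), dtype=np.uint8))
inputs = processor(video, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```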
Brief-details: A powerful text-to-image diffusion model built on SDXL, specialized in photorealistic generation across diverse domains including cinematics, landscapes, and architecture.
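A minimal generation sketch with a hypothetical repo id, since the entry does not name the model:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Hypothetical repo id: point this at the actual SDXL-based
# photorealistic checkpoint on the Hub.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "org/photoreal-sdxl", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "cinematic photo of a coastal village at dusk, 35mm, natural light",
    num_inference_steps=30,
).images[0]
image.save("village.png")
```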
Brief-details: A specialized BERT model pretrained on PubMed abstracts for biomedical NLP tasks, achieving SOTA performance in domain-specific applications.
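A minimal fill-mask sketch, assuming the microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract checkpoint (since renamed to BiomedBERT on the Hub):

```python
from transformers import pipeline

# Assumed checkpoint; swap in the exact repo id if it differs.
fill = pipeline(
    "fill-mask",
    model="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
)

for pred in fill("The patient was treated with [MASK] for hypertension."):
    print(pred["token_str"], round(pred["score"], 3))
```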
Brief-details: A 7B parameter chat model fine-tuned from Mistral-7B, achieving strong performance on MT-Bench (7.34) and AlpacaEval (90.60% win rate), trained using Direct Preference Optimization.
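A minimal chat sketch; the cited scores match HuggingFaceH4/zephyr-7b-beta, which is assumed here:

```python
import torch
from transformers import pipeline

# Assumed checkpoint, inferred from the MT-Bench/AlpacaEval numbers.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```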
Brief-details: A compact sentence embedding model with 17.4M parameters, mapping text to 384-dimensional vectors for semantic search and similarity tasks.
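A minimal similarity sketch, assuming sentence-transformers/paraphrase-MiniLM-L3-v2, which matches the 17.4M-parameter / 384-dimension profile:

```python
from sentence_transformers import SentenceTransformer, util

# Assumed checkpoint; the entry does not name the repo.
model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L3-v2")

embs = model.encode(["A cat sits on the mat.", "A feline rests on a rug."])
print(embs.shape)                      # (2, 384)
print(util.cos_sim(embs[0], embs[1]))  # high similarity for paraphrases
```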
Brief-details: OneFormer ADE20K Swin-Tiny is a universal image segmentation model that performs semantic, instance, and panoptic segmentation using a single transformer-based architecture.
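A minimal semantic-segmentation sketch, assuming the shi-labs/oneformer_ade20k_swin_tiny repo:

```python
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

model_id = "shi-labs/oneformer_ade20k_swin_tiny"  # assumed repo id
processor = OneFormerProcessor.from_pretrained(model_id)
model = OneFormerForUniversalSegmentation.from_pretrained(model_id)

image = Image.open("room.jpg").convert("RGB")

# The task token switches the head: "semantic", "instance", or "panoptic".
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
outputs = model(**inputs)

seg_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(seg_map.shape)  # (H, W) tensor of ADE20K class ids
```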
Brief-details: AlbedoBase is a powerful text-to-image diffusion model using the StableDiffusionXL pipeline, with over 423K downloads, specialized for high-quality image generation.
Brief-details: ResNet34 A1 model with 21.8M params, trained on ImageNet-1k using LAMB optimizer and cosine LR schedule. Achieves 77.92% top-1 accuracy.
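A minimal inference sketch, assuming the timm model name resnet34.a1_in1k (the A1 ImageNet-1k recipe):

```python
import timm
import torch

# Assumed timm name; the entry describes the A1 recipe on ImageNet-1k.
model = timm.create_model("resnet34.a1_in1k", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # (1, 1000) ImageNet-1k class logits
```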
Brief-details: Popular English-to-French translation model from Helsinki-NLP with strong BLEU scores (33-50 on various test sets). Built on Marian architecture with 430K+ downloads.
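A minimal translation sketch using the Helsinki-NLP/opus-mt-en-fr repo this entry describes:

```python
from transformers import pipeline

# Marian-based English-to-French translator from Helsinki-NLP.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(translator("The weather is beautiful today.")[0]["translation_text"])
```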
Brief-details: HuBERT Large model fine-tuned on LibriSpeech 960h, achieving 1.9 WER on the test-clean set. Specialized for 16kHz speech recognition.
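A minimal transcription sketch, assuming the facebook/hubert-large-ls960-ft checkpoint:

```python
from transformers import pipeline

# Assumed checkpoint. The model expects 16kHz mono audio; the pipeline
# resamples files it loads itself.
asr = pipeline(
    "automatic-speech-recognition", model="facebook/hubert-large-ls960-ft"
)
print(asr("speech.wav")["text"])
```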
Brief-details: A quantized version of Mistral-7B-OpenOrca optimized for GPU inference, featuring 4-bit and 8-bit variants with GPTQ compression and ChatML format support.
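A minimal loading sketch, assuming TheBloke/Mistral-7B-OpenOrca-GPTQ and the optimum/auto-gptq integration in transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id. Loading GPTQ weights through transformers requires
# the optimum and auto-gptq packages to be installed.
model_id = "TheBloke/Mistral-7B-OpenOrca-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# ChatML prompt format, as the entry notes.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nName three uses of quantization.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```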
Brief-details: A specialized BERT model pretrained on biomedical texts (PubMed abstracts + full articles), optimized for biomedical NLP tasks with SOTA performance.
Brief-details: A custom-trained version of the Stable Diffusion 2.1 base model, optimized using the Custom Diffusion technique. With 438K+ downloads, it supports text-to-image generation with specialized adaptations.
Brief-details: A powerful anime-focused SDXL model fine-tuned on stable-diffusion-xl-base-1.0, optimized for artistic anime-style image generation with enhanced detail and aesthetics.