Brief-details: DETR-ResNet-50 is a 41.6M-parameter transformer-based object detection model achieving 42.0 AP on COCO, combining a ResNet-50 CNN backbone with an end-to-end set-prediction detection head.
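A minimal usage sketch, assuming this entry refers to the facebook/detr-resnet-50 checkpoint on the Hugging Face Hub (the repo name and image path are assumptions):

```python
# Object detection with the transformers pipeline; model repo is assumed.
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("street_scene.jpg")  # hypothetical local image
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```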
Brief-details: Lightweight zero-shot object detection model (172M params) based on the DINO architecture. Enables open-set detection through text prompts, reaching 52.5 AP on COCO.
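The description (text-prompted open-set detection, 172M params) matches Grounding DINO; a sketch assuming the IDEA-Research/grounding-dino-tiny checkpoint, following the post-processing pattern from its model card:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # assumed repo
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("street_scene.jpg")  # hypothetical image
text = "a person. a bicycle."  # lowercase phrases, each ending with "."
inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.4, text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)
print(results[0]["boxes"], results[0]["labels"])
```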
Brief-details: Multilingual sentence embedding model supporting 50+ languages, using a DistilBERT architecture with 135M parameters for semantic similarity tasks.
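A minimal sketch with the sentence-transformers library; the checkpoint name (a 135M multilingual DistilBERT that fits this description) is an assumption:

```python
from sentence_transformers import SentenceTransformer, util

# Repo name is an assumption matching the description above.
model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")
sentences = ["The cat sits on the mat.", "Le chat est assis sur le tapis."]
embeddings = model.encode(sentences, convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cross-lingual similarity
```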
Brief-details: DINOv2-large: a 304M-parameter Vision Transformer for self-supervised image feature extraction, developed by Facebook for robust visual understanding.
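A minimal feature-extraction sketch, assuming the facebook/dinov2-large checkpoint:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-large")  # assumed repo
model = AutoModel.from_pretrained("facebook/dinov2-large")

image = Image.open("photo.jpg")  # hypothetical image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.pooler_output.shape)  # one 1024-d feature vector per image
```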
Brief-details: WavLM speech recognition model fine-tuned on the LibriSpeech ASR dataset, achieving 6.83% WER. Trained with multi-GPU, a linear learning-rate schedule, and native AMP.
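The entry does not name the exact fine-tuned repo; a sketch assuming a CTC-style WavLM checkpoint such as patrickvonplaten/wavlm-libri-clean-100h-base-plus (an assumption):

```python
from transformers import pipeline

# Repo name is an assumption; any WavLM-CTC checkpoint works the same way.
asr = pipeline("automatic-speech-recognition",
               model="patrickvonplaten/wavlm-libri-clean-100h-base-plus")
print(asr("sample.flac")["text"])  # expects 16 kHz audio
```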
Brief-details: Claire-7B-0.1-GPTQ is a French-focused 7B-parameter LLM optimized for conversational tasks, featuring 4-bit and 8-bit quantization options.
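A minimal loading sketch, assuming the TheBloke/Claire-7B-0.1-GPTQ repo and a GPU environment with GPTQ kernels installed (e.g. optimum + auto-gptq); the speaker-tag prompt format follows the Claire model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Claire-7B-0.1-GPTQ"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Claire is trained on dialogue transcripts with speaker tags (assumed format).
prompt = "[Intervenant 1:] Bonjour, comment allez-vous ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```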
Brief-details: Dragon multi-turn query encoder for conversational QA, optimized for encoding dialogue history, with state-of-the-art retrieval performance (usage sketch after the next entry).
Brief-details: A specialized retriever model for conversational QA, built on the Dragon architecture. Handles multi-turn dialogue with state-of-the-art context encoding.
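The two Dragon entries above form a dual-encoder pair: the query encoder embeds the flattened dialogue history, the context encoder embeds passages, and relevance is a dot product. A sketch assuming the nvidia/dragon-multiturn-query-encoder and nvidia/dragon-multiturn-ctx-encoder repos:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Repo names are assumptions based on the entries above.
tokenizer = AutoTokenizer.from_pretrained("nvidia/dragon-multiturn-query-encoder")
query_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-query-encoder")
ctx_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-ctx-encoder")

# Multi-turn history is flattened into one query string.
query = "User: What is DETR? Agent: An end-to-end detector. User: Who made it?"
contexts = ["DETR was introduced by Facebook AI Research in 2020.",
            "SPLADE produces sparse lexical embeddings."]

with torch.no_grad():
    q = query_encoder(**tokenizer(query, return_tensors="pt")).last_hidden_state[:, 0]
    c = ctx_encoder(**tokenizer(contexts, padding=True, truncation=True,
                                return_tensors="pt")).last_hidden_state[:, 0]
print(q @ c.T)  # dot-product relevance scores; highest = best context
```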
Brief-details: SPLADE passage retrieval model achieving 37.6 MRR@10 on MS MARCO, optimized through self-distillation and the CoCondenser architecture for efficient information retrieval.
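A sketch of SPLADE's sparse term weighting (max over the sequence of log(1 + ReLU(logits)) from an MLM head); the checkpoint name naver/splade-cocondenser-selfdistil is an assumption matching the 37.6 MRR@10 figure:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "naver/splade-cocondenser-selfdistil"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("what is information retrieval", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)
# SPLADE weight per vocabulary term: max over tokens of log(1 + ReLU(logit)).
weights = torch.max(torch.log1p(torch.relu(logits)) *
                    inputs["attention_mask"].unsqueeze(-1), dim=1).values
top = weights[0].topk(10)  # strongest expansion terms
print({tokenizer.decode([int(i)]): round(float(w), 2)
       for w, i in zip(top.values, top.indices)})
```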
Brief-details: OWL-ViT is a zero-shot, text-conditioned object detection model with 153M parameters, using a CLIP backbone for multi-modal detection.
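A minimal sketch, assuming the google/owlvit-base-patch32 checkpoint (~153M params):

```python
from transformers import pipeline

detector = pipeline("zero-shot-object-detection",
                    model="google/owlvit-base-patch32")  # assumed repo
results = detector("photo.jpg", candidate_labels=["a cat", "a remote control"])
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```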
Brief-details: Vision foundation model from Microsoft (0.77B params) supporting multi-task vision processing, including captioning, detection, and OCR.
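The size and task list suggest Florence-2; a sketch assuming the microsoft/Florence-2-large repo, which is loaded with trust_remote_code per its model card:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "microsoft/Florence-2-large"  # assumed repo
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg")  # hypothetical image
task = "<CAPTION>"  # task prompts also include <OD> (detection) and <OCR>
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(input_ids=inputs["input_ids"],
                     pixel_values=inputs["pixel_values"], max_new_tokens=64)
print(processor.batch_decode(ids, skip_special_tokens=False)[0])
```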
Brief-details: BanglaT5 model fine-tuned for Bengali paraphrase generation, achieving a 32.8 BLEU score. Specialized for text-to-text tasks with span-corruption pretraining.
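A generic seq2seq generation sketch; the repo name csebuetnlp/banglat5_banglaparaphrase and the use_fast=False tokenizer setting (recommended on the BanglaT5 cards) are assumptions:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "csebuetnlp/banglat5_banglaparaphrase"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "আমি বই পড়তে ভালোবাসি।"  # "I love reading books."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```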
Brief-details: DeBERTa-v3 model (184M parameters) fine-tuned on NLI datasets (MNLI, FEVER, ANLI), achieving strong zero-shot classification performance.
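A minimal sketch; the repo name MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli is an assumption matching this description:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")  # assumed
out = classifier("The new GPU doubles training throughput.",
                 candidate_labels=["technology", "sports", "politics"])
print(out["labels"][0], round(out["scores"][0], 3))
```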
Brief-details: BridgeTower vision-language model with state-of-the-art performance on VQAv2. Features bridge layers that connect unimodal encoders for cross-modal alignment. MIT licensed.
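The VQA fine-tune is not exposed as a standard pipeline; a sketch of the related image-text matching head, assuming the BridgeTower/bridgetower-base-itm-mlm checkpoint:

```python
from PIL import Image
from transformers import BridgeTowerProcessor, BridgeTowerForImageAndTextRetrieval

model_id = "BridgeTower/bridgetower-base-itm-mlm"  # assumed repo
processor = BridgeTowerProcessor.from_pretrained(model_id)
model = BridgeTowerForImageAndTextRetrieval.from_pretrained(model_id)

image = Image.open("photo.jpg")  # hypothetical image
for text in ["two cats on a couch", "a mountain landscape"]:
    encoding = processor(image, text, return_tensors="pt")
    score = model(**encoding).logits[0, 1].item()  # image-text match score
    print(text, round(score, 3))
```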
Brief-details: BiRefNet is a 221M-parameter image segmentation model specializing in high-resolution dichotomous image segmentation, with strong performance in background removal and mask generation.
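A background-removal sketch, assuming the ZhengPeng7/BiRefNet repo (loaded with trust_remote_code) and the preprocessing/output pattern from its model card:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True)  # assumed repo
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("portrait.jpg").convert("RGB")  # hypothetical image
with torch.no_grad():
    preds = model(preprocess(image).unsqueeze(0))[-1].sigmoid()  # per model card
mask = transforms.ToPILImage()(preds[0].squeeze(0))
mask.resize(image.size).save("mask.png")  # foreground mask for compositing
```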
Brief-details: Whisper-base is a 74M-parameter speech recognition model trained on 680k hours of audio, supporting 99 languages with strong transcription capabilities.
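A minimal sketch, assuming the openai/whisper-base checkpoint:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
# return_timestamps=True enables chunked decoding for audio longer than 30 s.
out = asr("interview.mp3", return_timestamps=True)  # hypothetical audio file
print(out["text"])
```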
Brief-details: ESM-2 lightweight protein language model (8M params) for protein sequence analysis: an efficient 6-layer transformer trained with masked language modeling.
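A masked-residue prediction sketch, assuming the facebook/esm2_t6_8M_UR50D checkpoint (6 layers, ~8M params):

```python
from transformers import pipeline

# Repo name is an assumption; ESM-2 uses "<mask>" for masked residues.
unmasker = pipeline("fill-mask", model="facebook/esm2_t6_8M_UR50D")
preds = unmasker("MKTAYIAKQR<mask>ISFVKSHFSRQLEERLGLIEVQ")
print(preds[0]["token_str"], round(preds[0]["score"], 3))  # top amino acid
```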
Brief-details: Microsoft's DETR-based table structure recognition model with 28.8M params, trained on PubTables-1M for extracting table structure from documents.
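A structure-recognition sketch, assuming the microsoft/table-transformer-structure-recognition checkpoint; the input should be an image cropped to a single table:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

model_id = "microsoft/table-transformer-structure-recognition"  # assumed repo
processor = AutoImageProcessor.from_pretrained(model_id)
model = TableTransformerForObjectDetection.from_pretrained(model_id)

image = Image.open("table_crop.png").convert("RGB")  # hypothetical table crop
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=[image.size[::-1]])[0]
for label, box in zip(results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], box.tolist())  # rows, columns, cells
```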
Brief-details: RoBERTa-based QA model (124M params) trained on SQuAD 2.0, achieving 80.86% exact match. Distilled from a larger model for efficiency.
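A minimal sketch; the repo name deepset/roberta-base-squad2-distilled is an assumption matching "distilled, SQuAD 2.0":

```python
from transformers import pipeline

qa = pipeline("question-answering",
              model="deepset/roberta-base-squad2-distilled")  # assumed repo
out = qa(question="Who introduced DETR?",
         context="DETR was introduced by Facebook AI Research in 2020.")
print(out["answer"], round(out["score"], 3))
```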
Brief-details: RoBERTa-based model for de-identifying medical notes, trained on the I2B2 dataset. Detects 11 PHI types with BILOU tagging. MIT licensed and widely downloaded.
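A PHI-tagging sketch; the repo name obi/deid_roberta_i2b2 is an assumption matching this description:

```python
from transformers import pipeline

deid = pipeline("token-classification", model="obi/deid_roberta_i2b2",  # assumed
                aggregation_strategy="simple")
note = "Patient John Smith was admitted to General Hospital on 3/14/2021."
for ent in deid(note):
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```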
Brief-details: GTE-base is a 109M-parameter text embedding model optimized for semantic similarity, achieving strong MTEB benchmark results while producing 768-dimensional embeddings with modest compute requirements.
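A retrieval-style sketch, assuming the thenlper/gte-base checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-base")  # assumed repo
docs = ["DETR is an end-to-end object detector.",
        "SPLADE produces sparse lexical embeddings."]
query_emb = model.encode("which model detects objects end to end")
doc_embs = model.encode(docs)
print(util.cos_sim(query_emb, doc_embs))  # 768-d embeddings, cosine ranking
```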