Brief-details: DETR-ResNet-50 is a 41.6M-parameter transformer-based object detection model achieving 42.0 AP on COCO, combining a ResNet-50 CNN backbone with an end-to-end set-prediction detection head.
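A minimal usage sketch, assuming this entry refers to the facebook/detr-resnet-50 checkpoint on the Hugging Face Hub (the repo name and image path are assumptions):

```python
# Object detection with the transformers pipeline; model repo is assumed.
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("street_scene.jpg")  # hypothetical local image
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```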
Brief-details: Lightweight zero-shot object detection model (172M params) based on the DINO architecture. Enables open-set detection through text prompts, reaching 52.5 AP on COCO.
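The description (text-prompted open-set detection, 172M params) matches Grounding DINO; a sketch assuming the IDEA-Research/grounding-dino-tiny checkpoint, following the post-processing pattern from its model card:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # assumed repo
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("street_scene.jpg")  # hypothetical image
text = "a person. a bicycle."  # lowercase phrases, each ending with "."
inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.4, text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)
print(results[0]["boxes"], results[0]["labels"])
```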
Brief-details: Multilingual sentence embedding model supporting 50+ languages, using a DistilBERT architecture with 135M parameters for semantic similarity tasks.
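A minimal sketch with the sentence-transformers library; the checkpoint name (a 135M multilingual DistilBERT that fits this description) is an assumption:

```python
from sentence_transformers import SentenceTransformer, util

# Repo name is an assumption matching the description above.
model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")
sentences = ["The cat sits on the mat.", "Le chat est assis sur le tapis."]
embeddings = model.encode(sentences, convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cross-lingual similarity
```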
Brief-details: DINOv2-large: a 304M-parameter Vision Transformer for self-supervised image feature extraction, developed by Facebook for robust visual understanding.
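A minimal feature-extraction sketch, assuming the facebook/dinov2-large checkpoint:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-large")  # assumed repo
model = AutoModel.from_pretrained("facebook/dinov2-large")

image = Image.open("photo.jpg")  # hypothetical image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.pooler_output.shape)  # one 1024-d feature vector per image
```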
Brief-details: WavLM speech recognition model fine-tuned on the LibriSpeech ASR dataset, achieving 6.83% WER. Trained with multi-GPU, a linear learning-rate schedule, and native AMP.
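The entry does not name the exact fine-tuned repo; a sketch assuming a CTC-style WavLM checkpoint such as patrickvonplaten/wavlm-libri-clean-100h-base-plus (an assumption):

```python
from transformers import pipeline

# Repo name is an assumption; any WavLM-CTC checkpoint works the same way.
asr = pipeline("automatic-speech-recognition",
               model="patrickvonplaten/wavlm-libri-clean-100h-base-plus")
print(asr("sample.flac")["text"])  # expects 16 kHz audio
```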
Brief-details: Claire-7B-0.1-GPTQ is a French-focused 7B-parameter LLM optimized for conversational tasks, featuring 4-bit and 8-bit quantization options.
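A minimal loading sketch, assuming the TheBloke/Claire-7B-0.1-GPTQ repo and a GPU environment with GPTQ kernels installed (e.g. optimum + auto-gptq); the speaker-tag prompt format follows the Claire model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Claire-7B-0.1-GPTQ"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Claire is trained on dialogue transcripts with speaker tags (assumed format).
prompt = "[Intervenant 1:] Bonjour, comment allez-vous ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```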
Brief-details: Dragon multi-turn query encoder for conversational QA, optimized for encoding dialogue history, with state-of-the-art retrieval performance (usage sketch after the next entry).
Brief-details: A specialized retriever model for conversational QA, built on the Dragon architecture. Handles multi-turn dialogue with state-of-the-art context encoding.
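The two Dragon entries above form a dual-encoder pair: the query encoder embeds the flattened dialogue history, the context encoder embeds passages, and relevance is a dot product. A sketch assuming the nvidia/dragon-multiturn-query-encoder and nvidia/dragon-multiturn-ctx-encoder repos:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Repo names are assumptions based on the entries above.
tokenizer = AutoTokenizer.from_pretrained("nvidia/dragon-multiturn-query-encoder")
query_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-query-encoder")
ctx_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-ctx-encoder")

# Multi-turn history is flattened into one query string.
query = "User: What is DETR? Agent: An end-to-end detector. User: Who made it?"
contexts = ["DETR was introduced by Facebook AI Research in 2020.",
            "SPLADE produces sparse lexical embeddings."]

with torch.no_grad():
    q = query_encoder(**tokenizer(query, return_tensors="pt")).last_hidden_state[:, 0]
    c = ctx_encoder(**tokenizer(contexts, padding=True, truncation=True,
                                return_tensors="pt")).last_hidden_state[:, 0]
print(q @ c.T)  # dot-product relevance scores; highest = best context
```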
Brief-details: SPLADE passage retrieval model achieving 37.6 MRR@10 on MS MARCO, optimized through self-distillation and the CoCondenser architecture for efficient information retrieval.
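A sketch of SPLADE's sparse term weighting (max over the sequence of log(1 + ReLU(logits)) from an MLM head); the checkpoint name naver/splade-cocondenser-selfdistil is an assumption matching the 37.6 MRR@10 figure:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "naver/splade-cocondenser-selfdistil"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("what is information retrieval", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)
# SPLADE weight per vocabulary term: max over tokens of log(1 + ReLU(logit)).
weights = torch.max(torch.log1p(torch.relu(logits)) *
                    inputs["attention_mask"].unsqueeze(-1), dim=1).values
top = weights[0].topk(10)  # strongest expansion terms
print({tokenizer.decode([int(i)]): round(float(w), 2)
       for w, i in zip(top.values, top.indices)})
```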
Brief-details: OWL-ViT is a zero-shot, text-conditioned object detection model with 153M parameters, using a CLIP backbone for multi-modal detection.
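A minimal sketch, assuming the google/owlvit-base-patch32 checkpoint (~153M params):

```python
from transformers import pipeline

detector = pipeline("zero-shot-object-detection",
                    model="google/owlvit-base-patch32")  # assumed repo
results = detector("photo.jpg", candidate_labels=["a cat", "a remote control"])
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```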
Brief-details: Vision foundation model from Microsoft (0.77B params) supporting multi-task vision processing, including captioning, detection, and OCR.
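The size and task list suggest Florence-2; a sketch assuming the microsoft/Florence-2-large repo, which is loaded with trust_remote_code per its model card:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "microsoft/Florence-2-large"  # assumed repo
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg")  # hypothetical image
task = "<CAPTION>"  # task prompts also include <OD> (detection) and <OCR>
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(input_ids=inputs["input_ids"],
                     pixel_values=inputs["pixel_values"], max_new_tokens=64)
print(processor.batch_decode(ids, skip_special_tokens=False)[0])
```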
Brief-details: BanglaT5 model fine-tuned for Bengali paraphrase generation, achieving a 32.8 BLEU score. Specialized for text-to-text tasks with span-corruption pretraining.
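A generic seq2seq generation sketch; the repo name csebuetnlp/banglat5_banglaparaphrase and the use_fast=False tokenizer setting (recommended on the BanglaT5 cards) are assumptions:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "csebuetnlp/banglat5_banglaparaphrase"  # assumed repo
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "আমি বই পড়তে ভালোবাসি।"  # "I love reading books."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```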
Brief-details: DeBERTa-v3 model (184M parameters) fine-tuned on NLI datasets (MNLI, FEVER, ANLI), achieving strong zero-shot classification performance.
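A minimal sketch; the repo name MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli is an assumption matching this description:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")  # assumed
out = classifier("The new GPU doubles training throughput.",
                 candidate_labels=["technology", "sports", "politics"])
print(out["labels"][0], round(out["scores"][0], 3))
```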
Brief-details: BridgeTower vision-language model with state-of-the-art performance on VQAv2. Features bridge layers that connect unimodal encoders for cross-modal alignment. MIT licensed.
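The VQA fine-tune is not exposed as a standard pipeline; a sketch of the related image-text matching head, assuming the BridgeTower/bridgetower-base-itm-mlm checkpoint:

```python
from PIL import Image
from transformers import BridgeTowerProcessor, BridgeTowerForImageAndTextRetrieval

model_id = "BridgeTower/bridgetower-base-itm-mlm"  # assumed repo
processor = BridgeTowerProcessor.from_pretrained(model_id)
model = BridgeTowerForImageAndTextRetrieval.from_pretrained(model_id)

image = Image.open("photo.jpg")  # hypothetical image
for text in ["two cats on a couch", "a mountain landscape"]:
    encoding = processor(image, text, return_tensors="pt")
    score = model(**encoding).logits[0, 1].item()  # image-text match score
    print(text, round(score, 3))
```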
Brief-details: BiRefNet is a 221M-parameter image segmentation model specializing in high-resolution dichotomous image segmentation, with strong performance in background removal and mask generation.
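A background-removal sketch, assuming the ZhengPeng7/BiRefNet repo (loaded with trust_remote_code) and the preprocessing/output pattern from its model card:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True)  # assumed repo
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("portrait.jpg").convert("RGB")  # hypothetical image
with torch.no_grad():
    preds = model(preprocess(image).unsqueeze(0))[-1].sigmoid()  # per model card
mask = transforms.ToPILImage()(preds[0].squeeze(0))
mask.resize(image.size).save("mask.png")  # foreground mask for compositing
```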
Brief-details: Whisper-base is a 74M-parameter speech recognition model trained on 680k hours of audio, supporting 99 languages with strong transcription capabilities.
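A minimal sketch, assuming the openai/whisper-base checkpoint:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
# return_timestamps=True enables chunked decoding for audio longer than 30 s.
out = asr("interview.mp3", return_timestamps=True)  # hypothetical audio file
print(out["text"])
```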
Brief-details: ESM-2 lightweight protein language model (8M params) for protein sequence analysis: an efficient 6-layer transformer trained with masked language modeling.
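A masked-residue prediction sketch, assuming the facebook/esm2_t6_8M_UR50D checkpoint (6 layers, ~8M params):

```python
from transformers import pipeline

# Repo name is an assumption; ESM-2 uses "<mask>" for masked residues.
unmasker = pipeline("fill-mask", model="facebook/esm2_t6_8M_UR50D")
preds = unmasker("MKTAYIAKQR<mask>ISFVKSHFSRQLEERLGLIEVQ")
print(preds[0]["token_str"], round(preds[0]["score"], 3))  # top amino acid
```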
Brief-details: Microsoft's DETR-based table structure recognition model with 28.8M params, trained on PubTables-1M for extracting table structure from documents.
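A structure-recognition sketch, assuming the microsoft/table-transformer-structure-recognition checkpoint; the input should be an image cropped to a single table:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

model_id = "microsoft/table-transformer-structure-recognition"  # assumed repo
processor = AutoImageProcessor.from_pretrained(model_id)
model = TableTransformerForObjectDetection.from_pretrained(model_id)

image = Image.open("table_crop.png").convert("RGB")  # hypothetical table crop
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=[image.size[::-1]])[0]
for label, box in zip(results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], box.tolist())  # rows, columns, cells
```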
Brief-details: RoBERTa-based QA model (124M params) trained on SQuAD 2.0, achieving 80.86% exact match. Distilled from a larger model for efficiency.
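A minimal sketch; the repo name deepset/roberta-base-squad2-distilled is an assumption matching "distilled, SQuAD 2.0":

```python
from transformers import pipeline

qa = pipeline("question-answering",
              model="deepset/roberta-base-squad2-distilled")  # assumed repo
out = qa(question="Who introduced DETR?",
         context="DETR was introduced by Facebook AI Research in 2020.")
print(out["answer"], round(out["score"], 3))
```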
Brief-details: RoBERTa-based model for de-identifying medical notes, trained on the I2B2 dataset. Detects 11 PHI types with BILOU tagging. MIT licensed and widely downloaded.
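A PHI-tagging sketch; the repo name obi/deid_roberta_i2b2 is an assumption matching this description:

```python
from transformers import pipeline

deid = pipeline("token-classification", model="obi/deid_roberta_i2b2",  # assumed
                aggregation_strategy="simple")
note = "Patient John Smith was admitted to General Hospital on 3/14/2021."
for ent in deid(note):
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```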
Brief-details: GTE-base is a 109M-parameter text embedding model optimized for semantic similarity, achieving strong MTEB benchmark results while producing 768-dimensional embeddings with modest compute requirements.
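A retrieval-style sketch, assuming the thenlper/gte-base checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-base")  # assumed repo
docs = ["DETR is an end-to-end object detector.",
        "SPLADE produces sparse lexical embeddings."]
query_emb = model.encode("which model detects objects end to end")
doc_embs = model.encode(docs)
print(util.cos_sim(query_emb, doc_embs))  # 768-d embeddings, cosine ranking
```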