Brief-details: 4.76B-parameter int4-quantized vision-language model for image understanding and conversation. Optimized for low memory use (~7GB) with multilingual support.
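As a rough sanity check on the 7GB figure, the int4 weights alone account for only part of the footprint; the rest goes to activations, KV cache, and runtime overhead. A minimal back-of-envelope sketch (the function name is illustrative, not from any library):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 4.76e9
print(f"int4 weights: {weight_memory_gb(params, 4):.2f} GB")   # ~2.38 GB
print(f"fp16 weights: {weight_memory_gb(params, 16):.2f} GB")  # ~9.52 GB
```

So int4 quantization cuts the raw weight storage roughly 4x versus fp16, which is why the whole model plus runtime state can fit in about 7GB.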
Brief-details: Large Vision Transformer (ViT) model with 304M parameters, pre-trained on ImageNet-21k for image recognition tasks. Features 16x16 patch size and 224x224 resolution.
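The 16x16 patch size and 224x224 resolution above determine the sequence length the transformer processes. A minimal sketch of that arithmetic (standard ViT setup with a prepended [CLS] token; the helper name is illustrative):

```python
def num_patch_tokens(image_size: int, patch_size: int, cls_token: bool = True) -> int:
    """Sequence length a ViT sees for a square image split into non-overlapping patches."""
    patches = (image_size // patch_size) ** 2
    return patches + (1 if cls_token else 0)

# (224 / 16)^2 = 14^2 = 196 patches, plus one [CLS] token.
print(num_patch_tokens(224, 16))  # 197
```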
Brief-details: LCM_Dreamshaper_v7 is a fast text-to-image model distilled from Dreamshaper v7, capable of high-quality image generation in 4-8 inference steps via latent consistency distillation.
Brief-details: Efficient SPLADE query model for passage retrieval, achieving 38.0 MRR@10 on MS MARCO with fast 0.7ms inference latency. Part of a dual query-document architecture.
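MRR@10, the metric cited above, averages the reciprocal rank of the first relevant passage within each query's top-10 results (scoring 0 when no relevant passage appears). A minimal sketch of the computation, using made-up rankings:

```python
def mrr_at_10(first_relevant_ranks):
    """Mean reciprocal rank cut off at depth 10.

    first_relevant_ranks: for each query, the 1-based rank of the first
    relevant passage, or None if it is not in the top 10.
    """
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None and rank <= 10:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# Three toy queries: hits at rank 1 and rank 4, one miss.
print(round(mrr_at_10([1, 4, None]), 4))  # (1 + 0.25 + 0) / 3 = 0.4167
```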
Brief-details: A 7B-parameter chat assistant based on Llama 2, fine-tuned on ShareGPT conversations with a 16K-token context window.
Brief-details: Japanese Sentence-BERT model (111M params) optimized for sentence embeddings and similarity tasks, supporting efficient encoding of Japanese text.
Brief-details: Russian BERT-based sentiment analysis model for 3-class text classification (positive/negative/neutral), trained on multiple Russian datasets.
Brief-details: Vietnamese-focused automatic speech recognition model fine-tuned on 844 hours of diverse Vietnamese accents, based on multilingual Whisper architecture
Brief-details: DFN2B-CLIP is a powerful contrastive vision-language model trained on 2B image-text pairs filtered from a 12.8B-pair pool, achieving 81.4% accuracy on ImageNet.
Brief-details: A Japanese to English translation model by Helsinki-NLP, achieving a 41.7 BLEU score on the Tatoeba dataset, built on the transformer-align architecture with SentencePiece preprocessing.
Brief-details: A 1.4B parameter language model trained on deduplicated Pile dataset, optimized for research and interpretability with 143 checkpoints available.
Brief-details: A CLIPA-v2 vision-language model trained on the DataComp-1B dataset, achieving 81.1% zero-shot ImageNet accuracy, specialized for image-text understanding and classification tasks.
Brief-details: Efficient document encoder for passage retrieval using SPLADE architecture, optimized for fast inference with competitive MRR@10 performance on MS MARCO dataset.
Brief-details: A merged text-to-image model combining realisticStockPhoto v3 and ICantBelieveItSNotPhotography for enhanced photorealistic portraits with improved facial variety and details.
Brief-details: KLUE BERT base is a 111M-parameter Korean language model trained on 62GB of diverse Korean text, optimized for tasks like NER, NLI, and text classification.
Brief-details: Czech speech recognition model fine-tuned on XLS-R 300M, achieving 7.3% WER on Common Voice test set. Optimized for 16kHz audio processing.
Brief-details: A versatile ControlNet collection for the FLUX.1-dev model offering Canny, HED, and Depth (Midas) variants trained at 1024x1024 resolution for enhanced image control.
Brief-details: Inception-v3 model adversarially trained on ImageNet-1k, featuring 23.9M parameters and 299x299 input size. Optimized for robust image classification.
Brief-details: A powerful CLIP model using the ConvNeXt-Base architecture trained on the LAION-2B dataset, achieving 70.8% ImageNet zero-shot accuracy with an efficient training approach.
Brief-details: Quantized version of Mixtral-8x7B-Instruct featuring multiple GPTQ variants (3-bit to 8-bit), optimized for efficient GPU inference with reduced VRAM usage.
Brief-details: WavLM Base Plus model specialized for speaker diarization, trained on 94k hours of speech data. Features utterance mixing and gated relative position bias.