Brief-details: 319M-parameter Speech Emotion Recognition model built on the WavLM architecture that predicts arousal, dominance, and valence from audio. Trained on the MSP-Podcast dataset.
Brief-details: Norwegian ASR model with 315M parameters, fine-tuned from VoxRex for Nynorsk speech recognition. Achieves 12.22% WER with KenLM integration.
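A minimal usage sketch with the transformers ASR pipeline; the repo ID below is an assumption, not taken from the entry, and the audio should be 16 kHz mono:

```python
# Minimal sketch: transcribing Nynorsk audio with the transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/nb-wav2vec2-300m-nynorsk",  # assumed repo ID
)

# Accepts a path to a 16 kHz mono audio file (or a NumPy array of samples).
result = asr("sample_nynorsk.wav")
print(result["text"])
```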
Brief Details: Deprecated RoBERTa-based sentence embedding model (768-dim vectors) for similarity tasks. 125M params. Not recommended for new projects.
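A minimal sketch of typical sentence-transformers usage; the repo ID is an assumption about which deprecated RoBERTa checkpoint is meant:

```python
# Minimal sketch: encoding sentences into 768-dim vectors with sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/roberta-base-nli-mean-tokens")  # assumed repo ID
embeddings = model.encode(["A sentence to embed.", "Another sentence."])
print(embeddings.shape)  # (2, 768)
```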
BRIEF-DETAILS: VILA1.5-3b-s2 is a visual language model enabling multi-image understanding and reasoning, with edge deployment capability through 4-bit quantization, built on interleaved image-text training.
Brief-details: Large vision-language model (652M params) using sigmoid loss for improved image-text learning. Excellent for zero-shot classification and retrieval tasks.
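A minimal zero-shot classification sketch, assuming the checkpoint is a SigLIP model supported by the transformers zero-shot-image-classification pipeline; the repo ID is an assumption:

```python
# Minimal sketch: zero-shot image classification with a SigLIP checkpoint.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="google/siglip-large-patch16-384",  # assumed repo ID
)
preds = classifier("cat.jpg", candidate_labels=["a cat", "a dog", "a car"])
print(preds)  # list of {"label": ..., "score": ...}
```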
Brief-details: Quantized version of Microsoft's 141B-parameter WizardLM model, converted to GGUF format with multiple precision options (2-8 bit) for efficient inference.
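A minimal sketch for running a GGUF quantization locally with llama-cpp-python; the file name below is an assumption, and the chosen quantization level should match your hardware:

```python
# Minimal sketch: loading and prompting a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="WizardLM-2-8x22B.Q4_K_M.gguf",  # assumed local file name
    n_ctx=4096,          # context window to allocate
    n_gpu_layers=-1,     # offload all layers to GPU if available
)
out = llm("Explain mixture-of-experts routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```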
Brief Details: A powerful multilingual CLIP model supporting 48 languages, optimized for text-image understanding with state-of-the-art R@10 retrieval performance.
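A minimal text-to-image retrieval sketch in the sentence-transformers style; both repo IDs are assumptions about which multilingual CLIP pairing is meant:

```python
# Minimal sketch: multilingual text-to-image retrieval with paired image/text encoders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

img_model = SentenceTransformer("clip-ViT-B-32")                  # assumed image encoder
txt_model = SentenceTransformer("clip-ViT-B-32-multilingual-v1")  # assumed multilingual text encoder

img_emb = img_model.encode([Image.open("beach.jpg"), Image.open("city.jpg")])
txt_emb = txt_model.encode(["Ein Strand bei Sonnenuntergang", "夕暮れのビーチ"])  # German / Japanese queries

print(util.cos_sim(txt_emb, img_emb))  # queries x images similarity matrix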
Brief-details: A lightweight RegNetY model with 3.18M parameters optimized for image classification, offering an efficient balance of performance and size.
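A minimal classification sketch via timm; "regnety_002" (~3.2M params) is an assumption about which RegNetY variant is meant:

```python
# Minimal sketch: image classification with a small RegNetY from timm.
import timm
import torch
from PIL import Image

model = timm.create_model("regnety_002", pretrained=True).eval()  # assumed variant name
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

img = transform(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=-1)
print(probs.topk(5))
```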
Brief Details: LLaVA 1.5 13B is a powerful multimodal vision-language model with 13.4B parameters, capable of understanding and discussing images in natural conversations.
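A minimal single-image chat sketch; the repo ID and prompt template are assumptions based on the common llava-hf packaging in transformers:

```python
# Minimal sketch: describing an image with a LLaVA-1.5-13B checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-13b-hf"  # assumed repo ID
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "USER: <image>\nDescribe this picture in one sentence. ASSISTANT:"
inputs = processor(text=prompt, images=Image.open("photo.jpg"), return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(out[0], skip_special_tokens=True))
```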
Brief Details: Intel's 7B-parameter LLM optimized for math and reasoning, achieving a 69.83 average score on the LLM leaderboard with strong performance on HellaSwag (85.26%) and Winogrande (79.64%).
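A minimal generation sketch with a standard causal-LM interface; the repo ID is an assumption and should be replaced with the actual Intel checkpoint name:

```python
# Minimal sketch: prompting a 7B causal LM for a reasoning question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-1"  # assumed repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "If a train travels 120 km in 1.5 hours, what is its average speed?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```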
Brief-details: A DistilRoBERTa-based sentence embedding model with 82.1M parameters, optimized for NLI tasks. Efficiently maps sentences to a 768-dimensional dense vector space.
Brief-details: A 1.3B parameter Korean language model trained on 863GB of Korean text, optimized for text generation with strong performance on Korean NLP tasks.
BRIEF DETAILS: State-of-the-art monocular depth estimation model offering 10x faster performance than SD-based alternatives, trained on 595K synthetic + 62M real images and released under the Apache 2.0 license.
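A minimal sketch with the transformers depth-estimation pipeline; the repo ID is an assumption about which checkpoint size is meant:

```python
# Minimal sketch: monocular depth estimation from a single RGB image.
from transformers import pipeline

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")  # assumed repo ID
result = depth("street.jpg")
result["depth"].save("street_depth.png")  # PIL image of the predicted depth map
```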
Brief Details: A hyperrealistic text-to-image model combining HyperRealism 1.2 and DreamPhotoGASM, specialized in analog-style photography and detailed portraits.
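A minimal text-to-image sketch with diffusers; the repo ID is a hypothetical placeholder for the merged checkpoint described above:

```python
# Minimal sketch: generating an analog-style portrait with a text-to-image checkpoint.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "author/hyperrealism-dreamphoto-merge",  # hypothetical repo ID
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "analog-style portrait photo of an elderly fisherman, natural light, 35mm film grain",
    num_inference_steps=30,
).images[0]
image.save("portrait.png")
```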
Brief Details: BERT-based model for business process text analysis with 108M parameters. Achieves 90.31% F1 score for extracting process elements from text.
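A minimal sketch, assuming the checkpoint is packaged with a standard token-classification head; the repo ID is hypothetical:

```python
# Minimal sketch: extracting process elements as labeled spans from a process description.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="author/bert-business-process-extraction",  # hypothetical repo ID
    aggregation_strategy="simple",                    # merge word pieces into spans
)
text = "The clerk checks the invoice and then forwards it to the accounting department."
for span in ner(text):
    print(span["entity_group"], "->", span["word"])
```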
Brief-details: A 7B-parameter code-focused LLM built on the Qwen architecture and distributed in GGUF format with multiple quantization options (2.7GB-15GB), optimized for coding tasks.
BRIEF DETAILS: State-of-the-art monocular depth estimation model trained on 595K synthetic + 62M real images, offering 10x faster performance than SD-based alternatives.
BRIEF DETAILS: A 3B-parameter language model from the Qwen2.5 series with a 32K context window, optimized for text generation and coding tasks and offering multilingual capabilities across 29+ languages.
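A minimal chat-style generation sketch using the tokenizer's chat template; the repo ID is an assumption (instruct variant shown):

```python
# Minimal sketch: chat-style generation with a Qwen2.5 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-3B-Instruct"  # assumed repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python one-liner that reverses a string."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```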
Brief-details: Mixtral-8x22B-Instruct-v0.1-GGUF is a powerful 141B-parameter mixture-of-experts (MoE) language model supporting 5 languages, available in various quantization options (2-16 bit).
Brief-details: Danish speech recognition model fine-tuned on FTSpeech dataset (1,800hrs), achieving 17.91% WER on Common Voice. Based on XLS-R-300m with 315M parameters.
BRIEF-DETAILS: A state-of-the-art multimodal embedding model (223M params) that excels in both text-to-text and text-to-image retrieval tasks, bridging CLIP and text embedding capabilities.