Brief-details: CogVLM2 is a powerful 19B parameter multimodal model supporting 8K text length and 1344x1344 image resolution, built on LLaMA-3 for English chat & vision tasks.
Brief-details: Latest 32B parameter instruction-tuned LLM from Qwen featuring 128K context length, multi-language support, and enhanced capabilities in coding, math, and long-text generation.
Brief-details: English Universal Part-of-Speech tagger using Flair embeddings and LSTM-CRF architecture. Achieves 98.6% F1-score on Ontonotes dataset.
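A minimal Flair sketch, assuming the `flair/upos-english` Hub id:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the universal POS tagger and annotate a sentence.
tagger = SequenceTagger.load("flair/upos-english")
sentence = Sentence("I love Berlin.")
tagger.predict(sentence)

# Prints each token annotated with its predicted universal POS tag.
print(sentence)
```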
Brief-details: Qwen2-0.5B-Instruct is a compact 494M parameter instruction-tuned language model that improves on its predecessor in instruction following, reasoning, and generation.
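A minimal chat sketch with transformers, assuming the `Qwen/Qwen2-0.5B-Instruct` checkpoint and a recent transformers release with chat-template support:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the transformer architecture in one sentence."},
]
# Build the prompt with the model's chat template, then generate.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```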
Brief-details: BiomedCLIP model for biomedical image-text processing, trained on 15M figure-caption pairs from PubMed Central. Combines PubMedBERT and ViT.
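A zero-shot sketch with open_clip, assuming the `microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224` Hub id; the image path is a placeholder:

```python
import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(hub_id)
tokenizer = get_tokenizer(hub_id)

image = preprocess(Image.open("scan.png")).unsqueeze(0)  # placeholder image path
texts = tokenizer(["chest X-ray", "histopathology slide", "brain MRI"], context_length=256)

with torch.no_grad():
    image_features, text_features, logit_scale = model(image, texts)
    probs = (logit_scale * image_features @ text_features.t()).softmax(dim=-1)
print(probs)
```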
Brief-details: CodeSage-Large is a 1.3B parameter code embedding model trained on The Stack, supporting 9 programming languages with 2048-dimensional embeddings.
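A minimal embedding sketch, assuming the `codesage/codesage-large` checkpoint (`trust_remote_code` pulls in the repo's custom model and tokenizer code); mean pooling here is just one simple way to collapse token states into a single vector:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "codesage/codesage-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True)

code = "def add(a, b):\n    return a + b"
inputs = tokenizer.encode(code, return_tensors="pt")
with torch.no_grad():
    hidden = model(inputs)[0]      # token-level states, hidden size 2048
embedding = hidden.mean(dim=1)     # simple mean pooling to one 2048-dim vector
print(embedding.shape)
```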
Brief-details: DPT-Large is a 342M-parameter Vision Transformer model for monocular depth estimation, trained on 1.4M images with state-of-the-art zero-shot transfer capabilities.
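A minimal sketch with the transformers depth-estimation pipeline, assuming the `Intel/dpt-large` checkpoint and a placeholder image path:

```python
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(Image.open("street.jpg"))  # placeholder image path

# "depth" is a PIL image of the predicted depth map;
# "predicted_depth" holds the raw tensor.
result["depth"].save("street_depth.png")
```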
Brief-details: Advanced text-to-image model focused on realistic image generation, featuring specialized VAE integration and an optimized negative-prompting setup for high-quality outputs.
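A hypothetical diffusers sketch of the pattern described above; the repo ids below are placeholders, not this model's actual checkpoints:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Placeholder ids: swap in the model's published checkpoint and VAE.
vae = AutoencoderKL.from_pretrained("some-org/some-vae", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "some-org/realistic-model", vae=vae, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="photo of a lighthouse at dawn, natural lighting",
    negative_prompt="lowres, blurry, bad anatomy, watermark",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("lighthouse.png")
```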
Brief-details: English Part-of-Speech tagger using Flair embeddings and LSTM-CRF architecture. Achieves 98.19% F1-score on Ontonotes dataset. Supports 36 POS tags.
Brief-details: XLNet base-cased model - A powerful pre-trained transformer model for language understanding tasks, trained on BookCorpus and Wikipedia, featuring a permutation-based language modeling objective.
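A minimal feature-extraction sketch, assuming the `xlnet-base-cased` checkpoint; the hidden states would normally feed a task-specific head during fine-tuning:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("XLNet uses a permutation language modeling objective.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, seq_len, 768) contextual token representations.
print(outputs.last_hidden_state.shape)
```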
Brief-details: A 3.09B parameter GGUF-formatted language model optimized for text generation with multiple quantization options (2- to 8-bit precision).
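A hypothetical llama-cpp-python sketch; the GGUF filename is a placeholder for whichever quantization you download:

```python
from llama_cpp import Llama

llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=4096)  # placeholder file name
out = llm("Write a haiku about autumn.", max_tokens=64)
print(out["choices"][0]["text"])
```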
Brief-details: Japanese speech recognition model with 619M parameters using Fast Conformer architecture. Supports long-form audio and achieves high accuracy with Longformer attention.
Brief-details: Efficient semantic segmentation model with 3.75M params, fine-tuned on ADE20k dataset. Uses hierarchical Transformer encoder and MLP decoder for image segmentation tasks.
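A minimal sketch, assuming the `nvidia/segformer-b0-finetuned-ade-512-512` checkpoint and a placeholder image path:

```python
import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

model_id = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(model_id)
model = SegformerForSemanticSegmentation.from_pretrained(model_id)

inputs = processor(images=Image.open("room.jpg"), return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, 150 ADE20k classes, H/4, W/4)

# Per-pixel class ids at reduced resolution; upsample to the input size
# before visualizing.
pred = logits.argmax(dim=1)[0]
print(pred.shape)
```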
Brief-details: Advanced text-to-image model optimized for photorealistic generation with 166K+ downloads. Features detailed skin rendering and high-quality photo simulation.
Brief-details: DziriBERT is a pioneering 124M-parameter BERT model for Algerian dialect, supporting both Arabic and Latin scripts, trained on ~1M tweets.
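A minimal fill-mask sketch, assuming the `alger-ia/dziribert` Hub id; the input string is a placeholder to replace with an Algerian-dialect sentence containing `[MASK]`:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="alger-ia/dziribert")
for pred in fill_mask("... [MASK] ..."):  # placeholder masked sentence
    print(pred["token_str"], round(pred["score"], 3))
```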
Brief-details: Open-source 3B parameter LLaMA reproduction trained on the RedPajama-1T dataset. Apache 2.0 licensed with strong performance across NLP tasks.
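A minimal generation sketch, assuming the `openlm-research/open_llama_3b` checkpoint; `use_fast=False` follows the repo's advice to avoid the auto-converted fast tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```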
Brief-details: Lightning-fast SDXL model by ByteDance capable of generating 1024px images in 1-8 steps, with various checkpoint options including UNet and LoRA variants.
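A 4-step sketch with diffusers using the LoRA variant; the checkpoint filename is assumed from the repo's naming scheme:

```python
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors"  # assumed 4-step LoRA filename

pipe = StableDiffusionXLPipeline.from_pretrained(
    base, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()
# Lightning checkpoints expect trailing timestep spacing and no CFG.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe("a portrait photo, studio lighting", num_inference_steps=4,
     guidance_scale=0).images[0].save("out.png")
```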
Brief-details: A Chinese text embedding model (326M params) using Matryoshka Representation Learning, offering flexible embedding dimensions (1024/1792) with strong performance on C-MTEB benchmark (69.07% average score).
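A hypothetical Matryoshka-truncation sketch with sentence-transformers; the model id is a placeholder:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("some-org/chinese-mrl-embedding")  # placeholder id
emb = model.encode(["今天天气很好", "天气不错"])

# Keep only the first 1024 of the full 1792 dimensions, then re-normalize
# before computing cosine similarity.
emb_1024 = emb[:, :1024]
emb_1024 = emb_1024 / np.linalg.norm(emb_1024, axis=1, keepdims=True)
print(emb_1024 @ emb_1024.T)
```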
Brief-details: ResNet-ViT hybrid model with 99M params, trained on ImageNet-21k & fine-tuned on ImageNet-1k. Optimized for 384x384 images, ideal for high-res classification.
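A hedged timm sketch; the exact model name/tag varies by timm version (`vit_base_r50_s16_384.orig_in21k_ft_in1k` is assumed here), and the image path is a placeholder:

```python
import timm
import torch
from PIL import Image

model = timm.create_model(
    "vit_base_r50_s16_384.orig_in21k_ft_in1k", pretrained=True
).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

x = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
with torch.no_grad():
    probs = model(x).softmax(dim=-1)
print(probs.topk(5))
```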
Brief-details: Advanced CLIP model trained on the DataComp-1B dataset, achieving 79.2% zero-shot accuracy on ImageNet-1k. Optimized for research and zero-shot image classification.
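A zero-shot classification sketch with open_clip, assuming the `laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K` Hub id and a placeholder image path:

```python
import torch
import open_clip
from PIL import Image

hub_id = "hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K"
model, _, preprocess = open_clip.create_model_and_transforms(hub_id)
tokenizer = open_clip.get_tokenizer(hub_id)

image = preprocess(Image.open("dog.jpg")).unsqueeze(0)  # placeholder path
text = tokenizer(["a photo of a dog", "a photo of a cat", "a diagram"])

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print(probs)
```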
Brief-details: CogVideoX-5b is a 5B parameter text-to-video generation model supporting high-quality video synthesis with BF16 precision and optimized VRAM usage starting from 5GB.
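A minimal sketch, assuming a diffusers release with CogVideoX support and the `THUDM/CogVideoX-5b` checkpoint; offloading plus VAE slicing/tiling is what keeps peak VRAM low:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

video = pipe(
    prompt="A panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```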