Brief-details: Facebook's Arabic text-to-speech model using the VITS architecture. 36.3M parameters, end-to-end speech synthesis. Part of the Massively Multilingual Speech (MMS) project.
Brief-details: Add-Detail-XL is an AI model by PvDeep whose architecture and specifications are undisclosed. Currently experimental, with 21 likes; its model card includes environmental-impact considerations.
Brief-details: WavLM-based emotion diarization model achieving 29.7% EDER (Emotion Diarization Error Rate), trained on 6 datasets to detect and time-stamp emotional segments in speech.
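The EDER figure above measures the fraction of audio time whose emotion label is wrong. A minimal frame-level sketch of such a metric (the actual recipe's definition may differ in details such as boundary tolerance):

```python
def eder(reference, hypothesis, step=0.01):
    """Simplified frame-level Emotion Diarization Error Rate:
    fraction of the recording whose hypothesised emotion label
    disagrees with the reference. Each input is a list of
    (start, end, label) segments covering the same duration."""
    def label_at(segments, t):
        for start, end, label in segments:
            if start <= t < end:
                return label
        return None

    total = max(end for _, end, _ in reference)
    n_frames = int(total / step)
    errors = sum(
        1 for i in range(n_frames)
        if label_at(reference, i * step) != label_at(hypothesis, i * step)
    )
    return errors / n_frames

# reference: neutral for 2 s then angry for 2 s; hypothesis flips 1 s late
ref = [(0.0, 2.0, "neutral"), (2.0, 4.0, "angry")]
hyp = [(0.0, 3.0, "neutral"), (3.0, 4.0, "angry")]
print(eder(ref, hyp))  # 1 s mislabeled out of 4 s -> 0.25
```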
Brief-details: SALMONN is an LLM extended with speech, audio, and music processing capabilities, developed by Tsinghua University and ByteDance. Available in 7B and 13B versions.
Brief-details: BLIP-2 image-to-text model using OPT-2.7b LLM, capable of image captioning and visual QA. Features frozen image encoders for efficient vision-language tasks.
Brief-details: TANGO - A state-of-the-art text-to-audio generation model using latent diffusion and Flan-T5 encoder, capable of producing realistic sounds from text prompts.
Brief-details: TechGPT-7B is a domain-specialized LLM focused on technical fields, featuring enhanced capabilities in knowledge graph construction, reading comprehension, and text analysis tasks.
Brief-details: FastText word vector model for Bulgarian language, trained on Common Crawl and Wikipedia data, offering 300-dimensional word embeddings with character n-grams.
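fastText builds each word's vector from the vectors of its character n-grams, with `<` and `>` marking word boundaries (n from 3 to 6 by default), which is what lets it embed unseen words. A small sketch of the n-gram extraction step:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subword units: wrap the word in boundary
    markers, collect every n-gram with n_min <= n <= n_max,
    plus the whole wrapped word itself."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    grams.add(wrapped)
    return grams

print(sorted(char_ngrams("cat", 3, 4)))
# ['<ca', '<cat', '<cat>', 'at>', 'cat', 'cat>']
```

The word's embedding is then the sum of the vectors for these units, so rare and out-of-vocabulary Bulgarian word forms still get usable representations.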
Brief-details: NVIDIA FastPitch, a parallel transformer-based TTS model with 45M parameters, offering prosody control and English speech synthesis at 22,050 Hz.
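"Parallel" here means non-autoregressive: a duration predictor assigns each input token a frame count, and encoder outputs are repeated accordingly before decoding. A toy sketch of that length-regulation step (names and values are illustrative, not the NVIDIA implementation):

```python
def length_regulate(encoder_out, durations):
    """Repeat each token's hidden vector durations[i] times, so the
    sequence stretches from token rate to mel-frame rate
    (FastPitch-style length regulation)."""
    frames = []
    for vec, d in zip(encoder_out, durations):
        frames.extend([vec] * d)
    return frames

# three toy "hidden vectors" with predicted durations 2, 1, 3 frames
out = length_regulate(["h1", "h2", "h3"], [2, 1, 3])
print(out)  # ['h1', 'h1', 'h2', 'h3', 'h3', 'h3']
```

Because all frames are produced at once rather than one at a time, synthesis is fast; scaling the predicted durations and pitch values is what gives the prosody control mentioned above.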
Brief-details: A Japanese DistilBERT model pre-trained on 131GB of web text, featuring 6 layers, 768 hidden dimensions, and 66M parameters. Achieves strong JGLUE benchmark performance.
Brief-details: A comprehensive LoRA model collection focusing on artistic styles, expressions, and regional characteristics, featuring both SFW and NSFW content, with Indonesian-language support.
Brief-details: LSTM-based weather forecasting model using the Jena Climate dataset. Predicts temperature 12 hours ahead from 120 hours of historical data.
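A model like this maps a 120-hour history window to the temperature 12 hours past the window's end. A sketch of building such (window, target) training pairs from an hourly series (the function name and the one-reading-per-hour assumption are illustrative):

```python
def make_windows(series, lookback=120, horizon=12):
    """Slice an hourly series into (history, target) pairs:
    `lookback` consecutive readings as the input window, and the
    reading `horizon` steps past the window's end as the label."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return X, y

series = list(range(200))  # stand-in for 200 hourly temperature readings
X, y = make_windows(series, lookback=120, horizon=12)
print(len(X), y[0])  # 69 131
```

Each `X[i]` would then be fed to the LSTM as a length-120 sequence, with `y[i]` as the regression target.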
Brief-details: Large English language model from spaCy with extensive word vectors (514K keys), high tagging accuracy (97.35% TAG), and a comprehensive NLP pipeline designed for CPU use.
Brief-details: Russian T5-based abstractive summarization model with 244M params, fine-tuned on 4 datasets, supporting configurable summary length and compression ratios.
Brief-details: BERT-based emotion classification model with 109M params, achieving 92.65% accuracy. Specializes in 6 emotion categories with high F1 scores.
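A classifier like this emits one logit per emotion class and picks the softmax argmax. A hypothetical sketch, assuming the six labels of the widely used `emotion` dataset (the model's actual label order may differ):

```python
import math

# assumed label set/order -- check the model's id2label mapping in practice
EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def classify(logits):
    """Softmax over the six class logits; return the most probable
    emotion label and its probability."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]

label, p = classify([0.1, 3.2, -1.0, 0.5, 0.0, -0.3])
print(label)  # joy
```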
Brief-details: Text-to-image SDXL v-prediction model trained on Danbooru/e621 datasets. Optimized for artistic generation with specific parameter requirements.
Brief-details: InternVL2 is a powerful multimodal model series (1B-108B parameters) for vision-language tasks, featuring 8k context window and advanced capabilities in document understanding, OCR, and scientific problem-solving.
Brief-details: UniFormer, a powerful vision transformer achieving 86.3% ImageNet accuracy by unifying convolution and self-attention for visual recognition tasks.
Brief-details: NaturalLM-GGUF is a 12.2B parameter quantized language model based on Mistral architecture, optimized for efficient text generation and inference.
Brief-details: A 7B-parameter GGUF-quantized model based on Mistral-7B-Instruct-v0.2, fine-tuned on Gutenberg datasets with ORPO, focusing on literary style and conversational ability.
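GGUF quantization shrinks these models by storing weights as small integers with a per-block floating-point scale. A simplified symmetric 4-bit sketch in the spirit of llama.cpp's Q4 formats (the real block layouts and rounding rules differ):

```python
def quantize_block(values, bits=4):
    """Symmetric block quantization: one fp scale per block, values
    stored as signed integers in [-2^(bits-1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    quants = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    """Recover approximate fp weights from the stored integers."""
    return [q * scale for q in quants]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.5]
scale, q = quantize_block(weights)
approx = dequantize_block(scale, q)
print(max(abs(a - b) for a, b in zip(weights, approx)))  # small error
```

Each weight now costs 4 bits plus a shared scale per block instead of 16 or 32 bits, which is what makes CPU inference on 7B-class models practical.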
Brief-details: Freeze-Omni is an Apache-2.0-licensed speech-to-speech dialogue model from VITA-MLLM, built around a frozen LLM backbone and shipped with comprehensive usage policies and ethical guidelines for safe deployment.