Brief-details: Facebook's Arabic text-to-speech model using the VITS architecture. 36.3M parameters, end-to-end speech synthesis. Part of the Massively Multilingual Speech (MMS) project.
Brief-details: Add-Detail-XL is an AI model by PvDeep whose architecture and specifications are undisclosed. Currently experimental, with 21 likes; its model card includes environmental-impact considerations.
Brief-details: WavLM-based emotion diarization model achieving 29.7% EDER (Emotion Diarization Error Rate), trained on 6 datasets to detect and time-stamp emotional segments in speech.
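The EDER figure above measures the fraction of audio time whose emotion label is wrong. A minimal frame-level sketch of such a metric (the actual recipe's definition may differ in details such as boundary tolerance):

```python
def eder(reference, hypothesis, step=0.01):
    """Simplified frame-level Emotion Diarization Error Rate:
    fraction of the recording whose hypothesised emotion label
    disagrees with the reference. Each input is a list of
    (start, end, label) segments covering the same duration."""
    def label_at(segments, t):
        for start, end, label in segments:
            if start <= t < end:
                return label
        return None

    total = max(end for _, end, _ in reference)
    n_frames = int(total / step)
    errors = sum(
        1 for i in range(n_frames)
        if label_at(reference, i * step) != label_at(hypothesis, i * step)
    )
    return errors / n_frames

# reference: neutral for 2 s then angry for 2 s; hypothesis flips 1 s late
ref = [(0.0, 2.0, "neutral"), (2.0, 4.0, "angry")]
hyp = [(0.0, 3.0, "neutral"), (3.0, 4.0, "angry")]
print(eder(ref, hyp))  # 1 s mislabeled out of 4 s -> 0.25
```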
Brief-details: SALMONN is an LLM extended with speech, audio, and music processing capabilities, developed by Tsinghua University and ByteDance. Available in 7B and 13B versions.
Brief-details: BLIP-2 image-to-text model using OPT-2.7b LLM, capable of image captioning and visual QA. Features frozen image encoders for efficient vision-language tasks.
Brief-details: TANGO - A state-of-the-art text-to-audio generation model using latent diffusion and Flan-T5 encoder, capable of producing realistic sounds from text prompts.
Brief-details: TechGPT-7B is a domain-specialized LLM focused on technical fields, featuring enhanced capabilities in knowledge graph construction, reading comprehension, and text analysis tasks.
Brief-details: FastText word vector model for Bulgarian language, trained on Common Crawl and Wikipedia data, offering 300-dimensional word embeddings with character n-grams.
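fastText builds each word's vector from the vectors of its character n-grams, with `<` and `>` marking word boundaries (n from 3 to 6 by default), which is what lets it embed unseen words. A small sketch of the n-gram extraction step:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subword units: wrap the word in boundary
    markers, collect every n-gram with n_min <= n <= n_max,
    plus the whole wrapped word itself."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    grams.add(wrapped)
    return grams

print(sorted(char_ngrams("cat", 3, 4)))
# ['<ca', '<cat', '<cat>', 'at>', 'cat', 'cat>']
```

The word's embedding is then the sum of the vectors for these units, so rare and out-of-vocabulary Bulgarian word forms still get usable representations.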
Brief-details: NVIDIA FastPitch, a parallel transformer-based TTS model with 45M parameters, offering prosody control and English speech synthesis at 22,050 Hz.
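"Parallel" here means non-autoregressive: a duration predictor assigns each input token a frame count, and encoder outputs are repeated accordingly before decoding. A toy sketch of that length-regulation step (names and values are illustrative, not the NVIDIA implementation):

```python
def length_regulate(encoder_out, durations):
    """Repeat each token's hidden vector durations[i] times, so the
    sequence stretches from token rate to mel-frame rate
    (FastPitch-style length regulation)."""
    frames = []
    for vec, d in zip(encoder_out, durations):
        frames.extend([vec] * d)
    return frames

# three toy "hidden vectors" with predicted durations 2, 1, 3 frames
out = length_regulate(["h1", "h2", "h3"], [2, 1, 3])
print(out)  # ['h1', 'h1', 'h2', 'h3', 'h3', 'h3']
```

Because all frames are produced at once rather than one at a time, synthesis is fast; scaling the predicted durations and pitch values is what gives the prosody control mentioned above.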
Brief-details: A Japanese DistilBERT model pre-trained on 131GB of web text, featuring 6 layers, 768 hidden dimensions, and 66M parameters. Achieves strong JGLUE benchmark performance.
Brief-details: A comprehensive LoRA model collection focusing on artistic styles, expressions, and regional characteristics, featuring both SFW and NSFW content, with Indonesian-language support.
Brief-details: LSTM-based weather forecasting model using the Jena Climate dataset. Predicts temperature 12 hours ahead from 120 hours of historical data.
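A model like this maps a 120-hour history window to the temperature 12 hours past the window's end. A sketch of building such (window, target) training pairs from an hourly series (the function name and the one-reading-per-hour assumption are illustrative):

```python
def make_windows(series, lookback=120, horizon=12):
    """Slice an hourly series into (history, target) pairs:
    `lookback` consecutive readings as the input window, and the
    reading `horizon` steps past the window's end as the label."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return X, y

series = list(range(200))  # stand-in for 200 hourly temperature readings
X, y = make_windows(series, lookback=120, horizon=12)
print(len(X), y[0])  # 69 131
```

Each `X[i]` would then be fed to the LSTM as a length-120 sequence, with `y[i]` as the regression target.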
Brief-details: Large English language model from spaCy with extensive word vectors (514K keys), high tagging accuracy (97.35% TAG), and a comprehensive NLP pipeline designed for CPU use.
Brief-details: Russian T5-based abstractive summarization model with 244M params, fine-tuned on 4 datasets, supporting configurable summary length and compression ratios.
Brief-details: BERT-based emotion classification model with 109M params, achieving 92.65% accuracy. Specializes in 6 emotion categories with high F1 scores.
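A classifier like this emits one logit per emotion class and picks the softmax argmax. A hypothetical sketch, assuming the six labels of the widely used `emotion` dataset (the model's actual label order may differ):

```python
import math

# assumed label set/order -- check the model's id2label mapping in practice
EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def classify(logits):
    """Softmax over the six class logits; return the most probable
    emotion label and its probability."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]

label, p = classify([0.1, 3.2, -1.0, 0.5, 0.0, -0.3])
print(label)  # joy
```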
Brief-details: Text-to-image SDXL v-prediction model trained on Danbooru/e621 datasets. Optimized for artistic generation with specific parameter requirements.
Brief-details: InternVL2 is a powerful multimodal model series (1B-108B parameters) for vision-language tasks, featuring 8k context window and advanced capabilities in document understanding, OCR, and scientific problem-solving.
Brief-details: UniFormer, a powerful vision transformer achieving 86.3% ImageNet accuracy by unifying convolution and self-attention for visual recognition tasks.
Brief-details: NaturalLM-GGUF is a 12.2B parameter quantized language model based on Mistral architecture, optimized for efficient text generation and inference.
Brief-details: A 7B-parameter GGUF-quantized model based on Mistral-7B-Instruct-v0.2, fine-tuned on Gutenberg datasets with ORPO, focusing on literary style and conversational ability.
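GGUF quantization shrinks these models by storing weights as small integers with a per-block floating-point scale. A simplified symmetric 4-bit sketch in the spirit of llama.cpp's Q4 formats (the real block layouts and rounding rules differ):

```python
def quantize_block(values, bits=4):
    """Symmetric block quantization: one fp scale per block, values
    stored as signed integers in [-2^(bits-1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    quants = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    """Recover approximate fp weights from the stored integers."""
    return [q * scale for q in quants]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.5]
scale, q = quantize_block(weights)
approx = dequantize_block(scale, q)
print(max(abs(a - b) for a, b in zip(weights, approx)))  # small error
```

Each weight now costs 4 bits plus a shared scale per block instead of 16 or 32 bits, which is what makes CPU inference on 7B-class models practical.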
Brief-details: Freeze-Omni is an Apache-2.0-licensed speech-to-speech dialogue model from VITA-MLLM, built around a frozen LLM backbone and shipped with comprehensive usage policies and ethical guidelines for safe deployment.