Brief-details: Fashion-CLIP: a 151M-parameter vision-language model fine-tuned on 800K fashion products for zero-shot classification and general fashion-concept understanding.
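A minimal zero-shot classification sketch using the standard transformers CLIP API; the checkpoint id `patrickjohncyh/fashion-clip` and the local image path are assumptions:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("product.jpg")  # assumed local product photo
labels = ["a red dress", "a leather handbag", "running shoes"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Softmax over image-text similarity logits gives label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```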
Brief-details: NVIDIA's TitaNet-Large speaker verification model with 23M parameters, trained on 6 datasets. Achieves 0.66% EER on VoxCeleb1 for speaker verification.
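A sketch of pairwise verification via NVIDIA NeMo (`pip install nemo_toolkit[asr]`); the NGC model name "titanet_large" and the two WAV paths are assumptions:

```python
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    model_name="titanet_large"
)
# Returns True if both clips are judged to come from the same speaker
# at the model's decision threshold.
same_speaker = model.verify_speakers("enroll.wav", "test.wav")
print(same_speaker)
```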
Brief-details: BERT-based personality prediction model with 109M parameters, trained to analyze text and predict Big Five personality traits; MIT-licensed.
Brief-details: BLIP large-scale image captioning model (470M params) by Salesforce. Excels at both conditional and unconditional image captioning with state-of-the-art performance.
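A sketch of both captioning modes with the `Salesforce/blip-image-captioning-large` checkpoint; the local image path is an assumption:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
)

image = Image.open("photo.jpg")  # assumed local image

# Unconditional: caption generated from the image alone.
out = model.generate(**processor(image, return_tensors="pt"))
print(processor.decode(out[0], skip_special_tokens=True))

# Conditional: the model continues a text prompt grounded in the image.
out = model.generate(**processor(image, "a photograph of", return_tensors="pt"))
print(processor.decode(out[0], skip_special_tokens=True))
```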
Brief-details: FNet-base: a Transformer variant that replaces self-attention with Fourier transforms, trained on the C4 dataset for MLM and NSP. Reaches 93% of BERT's performance while training 32% faster.
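A quick MLM sketch with the `google/fnet-base` checkpoint via the fill-mask pipeline (the example sentence is arbitrary):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="google/fnet-base")
# Prints the top candidate tokens for the masked position with scores.
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```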
Brief-details: Multilingual sentence embedding model supporting 50+ languages, 278M parameters; maps text to 768-dimensional vectors for semantic search and clustering.
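An embedding sketch with the sentence-transformers library; the checkpoint id `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` is an assumption that matches the 50+ language / 768-dim description:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
# Cross-lingual pair: German and English sentences with the same meaning.
embeddings = model.encode(["Das ist ein Beispiel.", "This is an example."])
print(embeddings.shape)                             # (2, 768)
print(util.cos_sim(embeddings[0], embeddings[1]))   # high cross-lingual similarity
```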
Brief-details: 8B-parameter roleplay-focused Llama3-based model created via Model Stock merging of 12 fine-tuned models, optimized for FP16 text generation.
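A generic FP16 loading-and-generation sketch with transformers; `author/llama3-rp-merge-8b` is a hypothetical placeholder since the merge's published checkpoint id isn't given here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "author/llama3-rp-merge-8b"  # hypothetical placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # load in FP16
)

inputs = tokenizer("You enter the tavern and", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```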
Brief-details: Specialized model focused on NSFW activator triggers and poses. Contains multiple detailed activator prompts for various scenarios. Created by SecondDinner.
Brief-details: GatorTron-base: A 345M-parameter clinical language model trained on 82B+ words of medical data, developed by the University of Florida and NVIDIA for healthcare NLP tasks.
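A clinical-text encoding sketch; `UFNLP/gatortron-base` is assumed to be the Hugging Face id for this checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("UFNLP/gatortron-base")
model = AutoModel.from_pretrained("UFNLP/gatortron-base")

note = "Patient denies chest pain; EKG shows normal sinus rhythm."
outputs = model(**tokenizer(note, return_tensors="pt"))
# Contextual token embeddings for downstream clinical NLP tasks.
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```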
Brief-details: F5-TTS is a cutting-edge text-to-speech model using flow matching, focused on producing fluent and faithful speech. Licensed under CC-BY-NC-4.0.
Brief-details: Deprecated sentence embedding model with 66.4M params that maps text to 768-dim vectors. Built on DistilBERT, but not recommended because it produces low-quality embeddings.
Brief-details: Unsupervised dense information retrieval model by Facebook using contrastive learning, with 821K+ downloads and strong text embedding capabilities.
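An embedding sketch for `facebook/contriever`: the model has no pooler head, so token embeddings are pooled with an attention-masked mean, as in the model card:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1)
    # Masked mean pooling: average only over real (non-padding) tokens.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query, doc = embed(["where was marie curie born?",
                    "Marie Curie was born in Warsaw."])
print(torch.dot(query, doc))  # dot-product relevance score
```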
Brief-details: A multilingual sentiment analysis model supporting 12 languages, distilled from mDeBERTa-v3; 135M parameters with high accuracy (88.29% agreement with its teacher).
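A generic multilingual sentiment sketch via the text-classification pipeline; `author/multilingual-sentiment` is a hypothetical placeholder for the distilled checkpoint:

```python
from transformers import pipeline

classify = pipeline("text-classification",
                    model="author/multilingual-sentiment")  # hypothetical id
print(classify("Este producto es excelente."))      # Spanish, positive
print(classify("Dieses Produkt ist schrecklich."))  # German, negative
```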
Brief-details: ResNet-18 A1 model with 11.7M parameters, trained on ImageNet-1k using LAMB optimizer and BCE loss. Achieves 71.49% top-1 accuracy.
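An inference sketch via timm; the weight tag `resnet18.a1_in1k` is assumed to be the published name for the A1-recipe checkpoint, and the image path is an assumption:

```python
import timm
import torch
from PIL import Image

model = timm.create_model("resnet18.a1_in1k", pretrained=True).eval()
# Build the preprocessing pipeline the weights were trained with.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

image = Image.open("cat.jpg").convert("RGB")  # assumed local image
with torch.no_grad():
    logits = model(transform(image).unsqueeze(0))
print(logits.argmax(dim=-1))  # ImageNet-1k class index
```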
Brief-details: Cutting-edge diffusion model for surface-normal estimation, built on Stable Diffusion. Features a latent consistency model (LCM) for fast inference and zero-shot generalization.
Brief-details: LayoutLM base model (113M params) for document AI, combining text and layout pre-training. Microsoft-developed, MIT licensed, with 2.4M+ downloads.
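A sketch of LayoutLM's joint text-and-layout input, following the documented usage pattern for `microsoft/layoutlm-base-uncased`; the words and their 0-1000 normalized bounding boxes are invented examples:

```python
import torch
from transformers import LayoutLMModel, LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Invoice", "Total:", "$42.00"]                       # example OCR output
boxes = [[60, 40, 200, 60], [60, 80, 140, 100], [150, 80, 230, 100]]

# Repeat each word's box for every subword token it produces.
token_boxes = []
for word, box in zip(words, boxes):
    token_boxes.extend([box] * len(tokenizer.tokenize(word)))
# Add boxes for the [CLS] and [SEP] special tokens.
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
outputs = model(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    bbox=torch.tensor([token_boxes]),
)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```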
Brief-details: A powerful forced alignment model supporting 158 languages, based on the MMS-300M architecture with 315M parameters for precise audio-text synchronization.
Brief-details: Japanese BERT base model trained on Wikipedia, using IPA dictionary-based word tokenization. Features 12 layers, 768-dim hidden states, 32k vocab size.
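A fill-mask sketch with the `cl-tohoku/bert-base-japanese` checkpoint; note that the IPA-dictionary MeCab tokenizer needs the `fugashi` and `ipadic` packages installed:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="cl-tohoku/bert-base-japanese")
# "Tokyo is the [MASK] of Japan." -- top candidates for the masked word.
for pred in fill("東京は日本の[MASK]です。"):
    print(pred["token_str"], round(pred["score"], 3))
```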
Brief-details: Dutch speech recognition model based on XLSR-53, achieving 15.72% WER on Common Voice. Optimized for 16kHz audio with language model support.
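A transcription sketch via the ASR pipeline; `author/wav2vec2-large-xlsr-53-dutch` is a hypothetical placeholder for the fine-tuned checkpoint, and the audio file (16 kHz mono, per the brief above) is an assumption:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="author/wav2vec2-large-xlsr-53-dutch")  # hypothetical id
print(asr("dutch_sample.wav")["text"])
```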
Brief-details: T5-base is a versatile 223M-parameter text-to-text transformer capable of NLP tasks like translation, summarization, and question answering across multiple languages.
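A task-prefix sketch with the `t5-base` checkpoint: T5 frames every task as text-to-text, so the task is selected by a plain-text prefix rather than a separate head:

```python
from transformers import pipeline

t5 = pipeline("text2text-generation", model="t5-base")
# Translation and summarization use the same model, switched by prefix.
print(t5("translate English to German: The house is wonderful.")
      [0]["generated_text"])
print(t5("summarize: T5 casts translation, summarization, and question "
         "answering as text-to-text problems over a shared vocabulary.")
      [0]["generated_text"])
```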
Brief-details: Specialized RoBERTa model fine-tuned for sentiment analysis of central bank communications, achieving 88% accuracy in classifying positive/negative sentiment.
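A hedged classification sketch; `author/central-bank-roberta-sentiment` is a hypothetical placeholder for the fine-tuned checkpoint described above:

```python
from transformers import pipeline

classify = pipeline("text-classification",
                    model="author/central-bank-roberta-sentiment")  # hypothetical id
print(classify("The committee judges that risks to the outlook have diminished."))
```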