Brief-details: T5-based text summarization model with 60.5M parameters. Generates concise summaries; weights stored in F32 precision. Apache 2.0 licensed.
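A minimal sketch of how a T5-style summarizer like this might be called through the Hugging Face `transformers` pipeline. The checkpoint id `t5-small` is a stand-in (assumption), not the model this entry describes, and the word-based chunker is a hypothetical helper for inputs longer than the model's context window.

```python
def chunk_text(text: str, max_words: int = 400) -> list:
    """Split text into word-bounded chunks so each fits the model's input window."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def demo():
    # Requires `pip install transformers` and a model download;
    # "t5-small" is an assumed stand-in for the 60.5M-parameter checkpoint.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="t5-small")
    long_text = "Your long document goes here."
    for piece in chunk_text(long_text):
        print(summarizer(piece, max_length=60, min_length=10)[0]["summary_text"])
```

Call `demo()` after installing the dependencies; `chunk_text` can be reused with any sequence-to-sequence summarizer.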
Brief-details: Donut-base is a pre-trained document understanding transformer that combines Swin Transformer vision encoding with BART text decoding for OCR-free document processing.
Brief-details: 4-bit quantized version of Meta's Llama-3.2-3B model optimized for efficiency. Features 1.85B parameters, multi-language support, and Unsloth acceleration.
Brief-details: A powerful 11B parameter multimodal vision-language model from Meta's Llama 3.2 family, offering enhanced vision-text capabilities with optimized memory usage.
Brief-details: A fine-tuned DistilRoBERTa model for detecting offensive and hateful speech, achieving 94.50% accuracy. Released under the MIT license.
Brief-details: Large-scale document AI model from Microsoft with 343M parameters, optimized for text + layout understanding in documents, trained on 11M docs.
Brief-details: TAPAS base is a BERT-like transformer specialized in table reasoning, pre-trained on Wikipedia data with MLM and intermediate pre-training for numerical reasoning.
Brief-details: TANGO-full is a state-of-the-art text-to-audio generation model using latent diffusion and Flan-T5 LLM, capable of producing realistic sounds from textual prompts.
Brief-details: Fast English phrase chunking model using Flair embeddings & LSTM-CRF architecture. Achieves 96.22% F1-score on CoNLL-2000, identifies 10 chunk types.
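A hedged sketch of using a Flair chunking model like this one. The checkpoint id `flair/chunk-english-fast` is an assumption, and `bio_to_spans` is a hypothetical helper showing how BIO-style chunk tags (the scheme CoNLL-2000 chunkers emit) group into spans.

```python
def bio_to_spans(tags):
    """Group BIO chunk tags (e.g. B-NP, I-NP, O) into (chunk_type, start, end) spans."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and kind != tag[2:]):
            if kind is not None:
                spans.append((kind, start, i))
            start, kind = i, tag[2:]
        elif tag == "O":
            if kind is not None:
                spans.append((kind, start, i))
            start, kind = None, None
    if kind is not None:
        spans.append((kind, start, len(tags)))
    return spans


def demo():
    # Requires `pip install flair` and a model download; the checkpoint id
    # below is an assumption, not taken from this catalogue entry.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("flair/chunk-english-fast")
    sentence = Sentence("The happy man has been eating at the diner")
    tagger.predict(sentence)
    for span in sentence.get_spans("np"):
        print(span)
```

`bio_to_spans` works with any tagger that emits BIO labels, independent of Flair.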
Brief-details: Italian sentence similarity model based on BERTino, optimized for embedding sentences into 768-dimensional vectors with 67.6M parameters and MIT license.
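A sentence-embedding model like this is typically used by encoding two sentences and comparing their 768-dimensional vectors with cosine similarity. The sketch below assumes the `sentence-transformers` library and an illustrative checkpoint id; the cosine helper is plain Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def demo():
    # Requires `pip install sentence-transformers` and a model download;
    # the model id below is an assumption, not confirmed by this entry.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("efederici/sentence-BERTino")
    emb = model.encode(["Una frase di esempio", "Un'altra frase simile"])
    print(cosine_similarity(emb[0].tolist(), emb[1].tolist()))
```

Scores near 1.0 indicate semantically similar sentences; near 0.0, unrelated ones.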
Brief-details: Meta's Llama 3.2 11B vision-language model optimized for 4-bit inference. Features multimodal capabilities with 6.05B parameters, supporting English and visual understanding.
Brief-details: A powerful CLIP vision-language model with 428M parameters, trained on the LAION-2B dataset. Achieves 75.3% zero-shot accuracy on ImageNet-1k.
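Zero-shot classification with a CLIP model of this kind works by scoring an image against a set of text prompts and normalizing the similarity logits with a softmax. The checkpoint id below is an assumption standing in for the LAION-2B model; the softmax helper is plain Python.

```python
import math


def softmax(logits):
    """Convert raw image-text similarity logits into zero-shot class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def demo():
    # Requires `pip install transformers torch pillow` and a model download;
    # the checkpoint id is an assumed stand-in for this catalogue entry.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model_id = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"
    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    labels = ["a photo of a cat", "a photo of a dog"]
    inputs = processor(text=labels, images=Image.open("photo.jpg"),
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image[0].tolist()
    for label, p in zip(labels, softmax(logits)):
        print(label, round(p, 3))
```

The prompt wording ("a photo of a ...") meaningfully affects zero-shot accuracy.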
Brief-details: A specialized LoRA adapter for Animagine XL 2.0 that enhances anime-style image generation with improved quality and detail, supporting Danbooru tags and high-resolution outputs.
Brief-details: Indonesian emotion prediction model using IndoBERT, capable of classifying 6 emotions (anger, sadness, happiness, love, fear, disgust) in Indonesian text.
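A text-classification model like this is usually queried through the `transformers` pipeline, which returns per-label scores. The model id in the sketch is a guess at an Indonesian emotion checkpoint, not confirmed by this entry; `top_emotion` is a hypothetical helper for picking the winning label.

```python
def top_emotion(scores):
    """Return the highest-scoring label from a list of {'label', 'score'} dicts."""
    return max(scores, key=lambda s: s["score"])["label"]


def demo():
    # Requires `pip install transformers` and a model download; the model id
    # below is an assumption, not taken from this catalogue entry.
    from transformers import pipeline

    clf = pipeline(
        "text-classification",
        model="StevenLimcorn/indonesian-roberta-base-emotion-classifier",
        top_k=None,  # return scores for all 6 emotion labels
    )
    scores = clf("Aku sangat senang hari ini!")[0]
    print(top_emotion(scores))
```

With `top_k=None` the pipeline returns every label's score, which also lets you inspect near-ties between related emotions such as happiness and love.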
Brief-details: SegFormer b0 encoder for image classification, pre-trained on ImageNet-1k. Lightweight transformer architecture by NVIDIA.
Brief-details: A Hindi to English neural machine translation model by Helsinki-NLP, achieving BLEU scores up to 40.4 on the Tatoeba test set, built on the transformer architecture.
Brief-details: LLaVA-Video-7B-Qwen2 is an 8.03B parameter multimodal model for video understanding, supporting up to 64 frames with strong performance across multiple video-text benchmarks.
Brief-details: ESM-1b: A 650M parameter transformer protein language model by Facebook, trained on UniRef50 for protein sequence prediction and analysis. MIT licensed.
Brief-details: Neural machine translation model for Indonesian to English conversion, based on the Marian transformer architecture, with a 47.7 BLEU score on the Tatoeba test set.
Brief-details: BART-based text summarization model with 406M parameters, fine-tuned on the XSum dataset. Optimized for generating concise, accurate summaries of longer texts.
Brief-details: Norwegian large-scale speech recognition model (1.54B params) fine-tuned from Whisper, optimized for Norwegian and trained on 20k hours of speech data.