Brief-details: Swin Transformer base model with 87.8M parameters for image classification, using hierarchical vision transformer architecture with shifted windows for efficient processing.
Brief-details: A 1.7B parameter text-to-video diffusion model that generates videos from English text descriptions, utilizing a UNet3D architecture and multi-stage generation process.
Brief-details: InLegalBERT - A specialized BERT model pre-trained on 5.4M Indian legal documents, featuring 110M parameters and optimized for legal domain tasks.
Brief-details: ONNX-converted RoBERTa model for toxicity detection with bias mitigation, supporting multiple classification tasks and identity-aware analysis.
Brief-details: Highly efficient text-to-image model using cascade architecture with 42x compression factor, offering faster inference and cheaper training than traditional models like Stable Diffusion.
Brief-details: Vision Transformer model trained with the DINO self-supervised method, featuring 85.8M parameters and an 8x8 patch resolution. Apache-2.0 licensed.
Brief-details: MiniCPM3-4B is a powerful 4B parameter bilingual LLM that outperforms GPT-3.5-Turbo-0125 on several benchmarks, featuring a 32k context window and function-calling capabilities.
Brief-details: LLaVA-OneVision 0.5B multimodal model based on Qwen2, supporting English and Chinese image/video interaction with 894M total parameters. Shows strong performance on visual tasks.
Brief-details: SAM2's tiny variant for image/video segmentation - offers efficient mask generation with a lightweight architecture. Apache 2.0 licensed, 29K+ downloads.
Brief-details: An 8B parameter Japanese-enhanced LLaMA 3.1 model, fine-tuned for instruction following with improved bilingual capabilities and strong performance on Japanese NLP tasks.
Brief-details: A powerful ColBERT reranking model with 335M parameters, built on mixedbread-ai's embedding architecture, optimized for search and retrieval tasks.
Brief-details: A lightweight DeBERTa-v3 model fine-tuned for Natural Language Inference tasks, achieving 91.64% accuracy on SNLI with cross-encoder architecture.
Brief-details: Microsoft's 3.8B parameter multilingual model optimized for 4-bit inference, supporting a 128K context window with strong reasoning capabilities.
Brief-details: A 7.47B parameter multimodal model capable of understanding both images and videos through unified visual representations, offering high-performance visual reasoning capabilities.
Brief-details: Qwen2.5-32B-Instruct-GGUF is a high-performance 32.8B parameter language model with multiple quantization options for efficient deployment, optimized for chat applications.
Brief-details: A ControlNet model for SDXL that uses Canny edge detection for precise control over image generation. 30K+ downloads, OpenRAIL++ license.
Brief-details: Llama-3.2-3B is Meta's latest 3.21B parameter multilingual LLM, optimized for dialogue and supporting 8+ languages with enhanced efficiency and reduced memory usage.
Brief-details: A 1.1B parameter language model trained on 41B tokens, featuring flash attention and enhanced MLP layers. Optimized for text generation and edge deployment.
Brief-details: A specialized ControlNet model for depth-aware image generation, built on Flux.1-dev. Enables precise depth map-guided image creation with 3.3K+ downloads.
Brief-details: CLIP ViT-B/16 model trained on the DataComp-1B dataset, achieving 73.5% ImageNet accuracy. Specialized for zero-shot image classification and retrieval tasks.
Brief-details: A Korean-English instruction-tuned 9B parameter LLM based on Gemma 2, optimized for conversational AI and detailed explanations, with strong performance on Korean language tasks.
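The entries above quote parameter counts in the usual shorthand (87.8M, 1.7B, 32.8B). If you want to compare or sort entries programmatically, a small helper can normalize those strings to integers — a minimal sketch, where the function name and suffix table are my own assumptions, not part of any entry above:

```python
def parse_param_count(text: str) -> int:
    """Convert a shorthand parameter count like '87.8M' or '1.7B' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in multipliers:
        # Round before truncating so e.g. 87.8 * 1e6 doesn't lose a unit to float error.
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(round(float(text)))  # plain number with no suffix

# Examples drawn from the entries above:
print(parse_param_count("87.8M"))  # 87800000
print(parse_param_count("1.7B"))   # 1700000000
```

This makes it straightforward to, say, sort the catalog by model size or filter to models under a given parameter budget.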