Brief Details: VideoMAE base model with 94.2M params for self-supervised video pre-training: a ViT architecture trained with masked autoencoding on the Kinetics-400 dataset.
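A minimal pre-training-style sketch with the Transformers library; the MCG-NJU/videomae-base checkpoint id is an assumption (the entry names no repo), and random frames stand in for a real clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForPreTraining

repo = "MCG-NJU/videomae-base"  # assumed checkpoint id, not stated in the entry
processor = VideoMAEImageProcessor.from_pretrained(repo)
model = VideoMAEForPreTraining.from_pretrained(repo)

# 16 dummy RGB frames standing in for a real video clip
num_frames = 16
video = list(np.random.randint(0, 255, (num_frames, 3, 224, 224)))
pixel_values = processor(video, return_tensors="pt").pixel_values

# One boolean flag per space-time tube patch: (16 / 2) * (224 / 16)^2 = 1568
seq_len = (num_frames // model.config.tubelet_size) * (model.config.image_size // model.config.patch_size) ** 2
bool_masked_pos = torch.randint(0, 2, (1, seq_len)).bool()

loss = model(pixel_values, bool_masked_pos=bool_masked_pos).loss  # reconstruction loss on masked patches
```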
Brief-details: A powerful 47.2M parameter SegFormer model fine-tuned for clothing segmentation, achieving 0.80 mean accuracy across 18 clothing categories with MIT license.
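A per-pixel clothing-segmentation sketch with Transformers, assuming the mattmdjaga/segformer_b2_clothes checkpoint and a local person.jpg (both assumptions, not named in the entry):

```python
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

repo = "mattmdjaga/segformer_b2_clothes"  # assumed checkpoint id
processor = SegformerImageProcessor.from_pretrained(repo)
model = SegformerForSemanticSegmentation.from_pretrained(repo)

image = Image.open("person.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, H/4, W/4)

# Upsample to the input resolution, then take the per-pixel argmax over categories
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
mask = upsampled.argmax(dim=1)[0]  # integer class id per pixel
```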
BRIEF DETAILS: Hermes-3-Llama-3.1-8B: Advanced 8B parameter LLM with enhanced conversational abilities, function calling, and JSON mode support. Built on Llama 3.1.
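A chat-style generation sketch, assuming the NousResearch/Hermes-3-Llama-3.1-8B repo id and a GPU with bfloat16 support:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "NousResearch/Hermes-3-Llama-3.1-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are Hermes, a helpful assistant."},
    {"role": "user", "content": "Explain in two sentences what JSON mode is useful for."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```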
Brief-details: Compact pre-trained time series forecasting model with only 805k parameters, delivering state-of-the-art accuracy on minute- and hour-level forecasts with minimal computational requirements.
Brief-details: Advanced depth estimation model trained on 657M+ images, offering 10x faster inference and finer detail than previous versions, with robust real-world performance.
BRIEF-DETAILS: MobileNet V2 is a lightweight vision model optimized for mobile devices with 3.54M parameters, offering efficient image classification on the ImageNet-1k dataset.
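A quick classification sketch via the Transformers pipeline, assuming the google/mobilenet_v2_1.0_224 checkpoint and a local cat.jpg:

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="google/mobilenet_v2_1.0_224")  # assumed checkpoint id
for pred in classifier("cat.jpg", top_k=3):
    print(f"{pred['label']}: {pred['score']:.3f}")
```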
BRIEF DETAILS: LLaVA-OneVision is a 73.2B parameter multimodal chat model combining advanced vision-language capabilities and DPO training, supporting English and Chinese interaction with images and videos.
Brief-details: Qwen2.5 14B uncensored instruction model with multiple GGUF quantizations (2.9-29GB), optimized for various hardware configurations and RAM constraints.
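GGUF files are typically run through llama.cpp bindings rather than Transformers; a sketch with llama-cpp-python, where the local filename below is a placeholder for whichever quantization fits your hardware:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-14b-uncensored.Q4_K_M.gguf",  # placeholder path to one of the GGUF quantizations
    n_ctx=8192,        # context window; raise if RAM/VRAM allows
    n_gpu_layers=-1,   # offload all layers to GPU when one is available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-off between Q2 and Q8 quantization."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```

Smaller quantizations (around 2.9GB) trade accuracy for fitting into limited RAM; the largest (around 29GB) stay closest to the original weights.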
BRIEF-DETAILS: MT5-small model specialized in Persian to English translation, built on multilingual T5 architecture. Popular with 79.5K downloads, licensed under CC-BY-NC-SA-4.0.
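A translation sketch with the seq2seq API; the repo id below is an assumption, since the entry only says "MT5-small, Persian to English":

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

repo = "persiannlp/mt5-small-parsinlu-opus-translation_fa_en"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = MT5ForConditionalGeneration.from_pretrained(repo)

inputs = tokenizer("او به دانشگاه می‌رود.", return_tensors="pt")  # Persian input sentence
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected along the lines of "She goes to university."
```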
Brief Details: TrOCR base printed model (333M params) for OCR tasks. Vision-language model combining a BEiT image encoder with a RoBERTa text decoder for accurate text extraction from printed documents.
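A text-extraction sketch, assuming the microsoft/trocr-base-printed checkpoint and an image crop containing a single line of printed text (TrOCR works line by line):

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

repo = "microsoft/trocr-base-printed"  # assumed checkpoint id for the printed-text variant
processor = TrOCRProcessor.from_pretrained(repo)
model = VisionEncoderDecoderModel.from_pretrained(repo)

line = Image.open("receipt_line.png").convert("RGB")  # crop containing one printed text line
pixel_values = processor(images=line, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```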
Brief Details: Fine-tuned Pegasus model optimized for conversation summarization on the SAMSum dataset, trained with a linear learning-rate schedule; 79.6K downloads.
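A dialogue-summarization sketch with the Transformers pipeline; the repo id below is a placeholder, since the entry doesn't name one:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="your-org/pegasus-samsum")  # placeholder repo id

dialogue = """Anna: Are we still on for lunch tomorrow?
Ben: Yes, 12:30 at the usual place.
Anna: Perfect, see you there."""
print(summarizer(dialogue, max_length=40, min_length=5)[0]["summary_text"])
```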
Brief Details: DenseNet121 with RandAugment training - 8.06M params, an ImageNet-1k classification model balancing accuracy and efficiency.
Brief-details: Quantized 72B-parameter multimodal model with state-of-the-art visual understanding, supporting 20min+ video analysis and multilingual capabilities at 4-bit precision.
Brief Details: A Vision Transformer model using SigLIP (sigmoid loss for language-image pre-training), trained on the WebLI dataset for zero-shot image classification.
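A zero-shot classification sketch, assuming the google/siglip-base-patch16-224 checkpoint (several SigLIP sizes exist on the Hub):

```python
from transformers import pipeline

clf = pipeline("zero-shot-image-classification", model="google/siglip-base-patch16-224")  # assumed checkpoint id
preds = clf("photo.jpg", candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"])
for p in preds:
    print(f"{p['label']}: {p['score']:.3f}")  # SigLIP is trained with a sigmoid objective, so scores need not sum to 1
```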
Brief Details: Compact Vision Transformer (5.7M params) pretrained on ImageNet-21k and fine-tuned on ImageNet-1k, optimized for efficient image classification.
Brief-details: Llama-3.2-3B-Instruct: A 3.2B parameter multilingual LLM from Meta, optimized for dialogue tasks with 2.4x faster training and 58% less memory usage.
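A dialogue sketch using the chat-aware text-generation pipeline, assuming the meta-llama/Llama-3.2-3B-Instruct repo id (access to the official weights is gated on the Hub):

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You answer concisely."},
    {"role": "user", "content": "Name three multilingual use cases for a small instruct model."},
]
out = pipe(messages, max_new_tokens=150)
print(out[0]["generated_text"][-1]["content"])  # last message in the returned chat is the assistant reply
```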
Brief-details: Tokenizer for an 8B-parameter LLaVA-based conversational AI and text-generation model, built on a transformer architecture.
Brief-details: InternVL2-2B is a 2.21B parameter multimodal LLM combining InternViT-300M vision model and InternLM2-chat language model, offering strong performance in image, video, and text understanding.
Brief Details: A bilingual LLaMA 3.2 variant (3B params) optimized for Korean and English, fully fine-tuned on 150GB of Korean data with state-of-the-art performance on the LogicKor benchmark.
Brief Details: A powerful 14B parameter code-focused LLM with 128K context length, optimized for code generation, code reasoning, and code fixing. Built on Qwen2.5 with 5.5T training tokens.
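A code-generation sketch, assuming the Qwen/Qwen2.5-Coder-14B-Instruct repo id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen2.5-Coder-14B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```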
Brief-details: A powerful ConvNeXt vision model with 88.6M parameters, pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k, achieving 85.8% top-1 accuracy.
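An inference sketch with timm, assuming the convnext_base.fb_in22k_ft_in1k weight tag (the 88.6M-parameter, 22k-pretrained, 1k-fine-tuned base model) and a local dog.jpg:

```python
import timm
import torch
from PIL import Image

model = timm.create_model("convnext_base.fb_in22k_ft_in1k", pretrained=True).eval()  # assumed weight tag
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

image = Image.open("dog.jpg").convert("RGB")
with torch.no_grad():
    probs = model(transform(image).unsqueeze(0)).softmax(dim=-1)
top5 = probs.topk(5)
print(top5.indices[0].tolist(), top5.values[0].tolist())  # ImageNet-1k class ids and probabilities
```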