Brief-details: CogVLM2 is a powerful 19B parameter multimodal model supporting 8K text length and 1344x1344 image resolution, built on LLaMA-3 for English chat & vision tasks.
Brief-details: Latest 32B parameter instruction-tuned LLM from Qwen featuring 128K context length, multi-language support, and enhanced capabilities in coding, math, and long-text generation.
Brief-details: English Universal Part-of-Speech tagger using Flair embeddings and LSTM-CRF architecture. Achieves 98.6% F1-score on Ontonotes dataset.
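A minimal Flair sketch, assuming the `flair/upos-english` Hub id:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the universal POS tagger and annotate a sentence.
tagger = SequenceTagger.load("flair/upos-english")
sentence = Sentence("I love Berlin.")
tagger.predict(sentence)

# Prints each token annotated with its predicted universal POS tag.
print(sentence)
```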
Brief-details: Qwen2-0.5B-Instruct is a compact 494M parameter instruction-tuned language model that improves on its predecessor in instruction following, reasoning, and generation.
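A minimal chat sketch with transformers, assuming the `Qwen/Qwen2-0.5B-Instruct` checkpoint and a recent transformers release with chat-template support:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the transformer architecture in one sentence."},
]
# Build the prompt with the model's chat template, then generate.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```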
Brief-details: BiomedCLIP model for biomedical image-text processing, trained on 15M figure-caption pairs from PubMed Central. Combines PubMedBERT and ViT.
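A zero-shot sketch with open_clip, assuming the `microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224` Hub id; the image path is a placeholder:

```python
import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(hub_id)
tokenizer = get_tokenizer(hub_id)

image = preprocess(Image.open("scan.png")).unsqueeze(0)  # placeholder image path
texts = tokenizer(["chest X-ray", "histopathology slide", "brain MRI"], context_length=256)

with torch.no_grad():
    image_features, text_features, logit_scale = model(image, texts)
    probs = (logit_scale * image_features @ text_features.t()).softmax(dim=-1)
print(probs)
```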
Brief-details: CodeSage-Large is a 1.3B parameter code embedding model trained on The Stack, supporting 9 programming languages with 2048-dimensional embeddings.
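A minimal embedding sketch, assuming the `codesage/codesage-large` checkpoint (`trust_remote_code` pulls in the repo's custom model and tokenizer code); mean pooling here is just one simple way to collapse token states into a single vector:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "codesage/codesage-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True)

code = "def add(a, b):\n    return a + b"
inputs = tokenizer.encode(code, return_tensors="pt")
with torch.no_grad():
    hidden = model(inputs)[0]      # token-level states, hidden size 2048
embedding = hidden.mean(dim=1)     # simple mean pooling to one 2048-dim vector
print(embedding.shape)
```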
Brief-details: DPT-Large is a 342M-parameter Vision Transformer model for monocular depth estimation, trained on 1.4M images with state-of-the-art zero-shot transfer capabilities.
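A minimal sketch with the transformers depth-estimation pipeline, assuming the `Intel/dpt-large` checkpoint and a placeholder image path:

```python
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(Image.open("street.jpg"))  # placeholder image path

# "depth" is a PIL image of the predicted depth map;
# "predicted_depth" holds the raw tensor.
result["depth"].save("street_depth.png")
```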
Brief-details: Advanced text-to-image model focused on realistic image generation, featuring specialized VAE integration and an optimized negative-prompting setup for high-quality outputs.
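A hypothetical diffusers sketch of the pattern described above; the repo ids below are placeholders, not this model's actual checkpoints:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Placeholder ids: swap in the model's published checkpoint and VAE.
vae = AutoencoderKL.from_pretrained("some-org/some-vae", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "some-org/realistic-model", vae=vae, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="photo of a lighthouse at dawn, natural lighting",
    negative_prompt="lowres, blurry, bad anatomy, watermark",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("lighthouse.png")
```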
Brief-details: English Part-of-Speech tagger using Flair embeddings and LSTM-CRF architecture. Achieves 98.19% F1-score on Ontonotes dataset. Supports 36 POS tags.
Brief-details: XLNet base-cased model - A powerful pre-trained transformer model for language understanding tasks, trained on BookCorpus and Wikipedia, featuring a permutation-based language modeling objective.
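A minimal feature-extraction sketch, assuming the `xlnet-base-cased` checkpoint; the hidden states would normally feed a task-specific head during fine-tuning:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("XLNet uses a permutation language modeling objective.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, seq_len, 768) contextual token representations.
print(outputs.last_hidden_state.shape)
```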
Brief-details: A 3.09B parameter GGUF-formatted language model optimized for text generation with multiple quantization options (2- to 8-bit precision).
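A hypothetical llama-cpp-python sketch; the GGUF filename is a placeholder for whichever quantization you download:

```python
from llama_cpp import Llama

llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=4096)  # placeholder file name
out = llm("Write a haiku about autumn.", max_tokens=64)
print(out["choices"][0]["text"])
```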
Brief-details: Japanese speech recognition model with 619M parameters using Fast Conformer architecture. Supports long-form audio and achieves high accuracy with Longformer attention.
Brief-details: Efficient semantic segmentation model with 3.75M params, fine-tuned on ADE20k dataset. Uses hierarchical Transformer encoder and MLP decoder for image segmentation tasks.
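A minimal sketch, assuming the `nvidia/segformer-b0-finetuned-ade-512-512` checkpoint and a placeholder image path:

```python
import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

model_id = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(model_id)
model = SegformerForSemanticSegmentation.from_pretrained(model_id)

inputs = processor(images=Image.open("room.jpg"), return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, 150 ADE20k classes, H/4, W/4)

# Per-pixel class ids at reduced resolution; upsample to the input size
# before visualizing.
pred = logits.argmax(dim=1)[0]
print(pred.shape)
```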
Brief-details: Advanced text-to-image model optimized for photorealistic generation with 166K+ downloads. Features detailed skin rendering and high-quality photo simulation.
Brief-details: DziriBERT is a pioneering 124M-parameter BERT model for Algerian dialect, supporting both Arabic and Latin scripts, trained on ~1M tweets.
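A minimal fill-mask sketch, assuming the `alger-ia/dziribert` Hub id; the input string is a placeholder to replace with an Algerian-dialect sentence containing `[MASK]`:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="alger-ia/dziribert")
for pred in fill_mask("... [MASK] ..."):  # placeholder masked sentence
    print(pred["token_str"], round(pred["score"], 3))
```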
Brief-details: Open-source 3B parameter LLaMA reproduction trained on the RedPajama-1T dataset. Apache 2.0 licensed with strong performance across NLP tasks.
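A minimal generation sketch, assuming the `openlm-research/open_llama_3b` checkpoint; `use_fast=False` follows the repo's advice to avoid the auto-converted fast tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```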
Brief-details: Lightning-fast SDXL model by ByteDance capable of generating 1024px images in 1-8 steps, with various checkpoint options including UNet and LoRA variants.
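A 4-step sketch with diffusers using the LoRA variant; the checkpoint filename is assumed from the repo's naming scheme:

```python
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors"  # assumed 4-step LoRA filename

pipe = StableDiffusionXLPipeline.from_pretrained(
    base, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()
# Lightning checkpoints expect trailing timestep spacing and no CFG.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe("a portrait photo, studio lighting", num_inference_steps=4,
     guidance_scale=0).images[0].save("out.png")
```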
Brief-details: A Chinese text embedding model (326M params) using Matryoshka Representation Learning, offering flexible embedding dimensions (1024/1792) with strong performance on C-MTEB benchmark (69.07% average score).
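A hypothetical Matryoshka-truncation sketch with sentence-transformers; the model id is a placeholder:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("some-org/chinese-mrl-embedding")  # placeholder id
emb = model.encode(["今天天气很好", "天气不错"])

# Keep only the first 1024 of the full 1792 dimensions, then re-normalize
# before computing cosine similarity.
emb_1024 = emb[:, :1024]
emb_1024 = emb_1024 / np.linalg.norm(emb_1024, axis=1, keepdims=True)
print(emb_1024 @ emb_1024.T)
```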
Brief-details: ResNet-ViT hybrid model with 99M params, trained on ImageNet-21k & fine-tuned on ImageNet-1k. Optimized for 384x384 images, ideal for high-res classification.
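A hedged timm sketch; the exact model name/tag varies by timm version (`vit_base_r50_s16_384.orig_in21k_ft_in1k` is assumed here), and the image path is a placeholder:

```python
import timm
import torch
from PIL import Image

model = timm.create_model(
    "vit_base_r50_s16_384.orig_in21k_ft_in1k", pretrained=True
).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

x = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
with torch.no_grad():
    probs = model(x).softmax(dim=-1)
print(probs.topk(5))
```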
Brief-details: Advanced CLIP model trained on the DataComp-1B dataset, achieving 79.2% zero-shot accuracy on ImageNet-1k. Optimized for research and zero-shot image classification.
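A zero-shot classification sketch with open_clip, assuming the `laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K` Hub id and a placeholder image path:

```python
import torch
import open_clip
from PIL import Image

hub_id = "hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K"
model, _, preprocess = open_clip.create_model_and_transforms(hub_id)
tokenizer = open_clip.get_tokenizer(hub_id)

image = preprocess(Image.open("dog.jpg")).unsqueeze(0)  # placeholder path
text = tokenizer(["a photo of a dog", "a photo of a cat", "a diagram"])

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print(probs)
```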
Brief-details: CogVideoX-5b is a 5B parameter text-to-video generation model supporting high-quality video synthesis with BF16 precision and optimized VRAM usage starting from 5GB.
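A minimal sketch, assuming a diffusers release with CogVideoX support and the `THUDM/CogVideoX-5b` checkpoint; offloading plus VAE slicing/tiling is what keeps peak VRAM low:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

video = pipe(
    prompt="A panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```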