Brief Details: High-performance text-to-image model based on Flux.1, optimized for 4-8 step generation with improved quality and reality compared to other Flux variants. 11.9B parameters.
Brief-details: A specialized LoRA fine-tuning model for FLUX.1-dev focusing on photorealistic image generation with high detail and natural textures, trained on 27 curated images using constant LR scheduling and AdamW optimization.
Brief Details: A lightweight image captioning model with 271M parameters, offering multiple caption styles and efficient VRAM usage (~1GB). Features improved tag generation and detailed analysis capabilities.
Brief Details: LLM2CLIP-EVA02-L-14-336: Advanced vision-language model leveraging LLMs to enhance CLIP capabilities, supporting zero-shot classification and cross-lingual tasks.
Brief Details: A 10.2B parameter multimodal LLM built on Gemma2-9B, featuring SigLIP-400M vision encoder and state-of-the-art performance in image-text tasks.
BRIEF DETAILS: Apple's AIMv2-huge vision model with 681M parameters, achieving 87.5% ImageNet accuracy. Supports PyTorch/JAX for image feature extraction.
Brief Details: A specialized LoRA model for clothing image generation, built on FLUX.1-dev with Florence-2-large captioning. Optimized for detailed clothing renders with 64 network dimensions.
Brief Details: A specialized LoRA model for FLUX.1-dev that generates 2.5D cartoon-style images with 64 network dimensions and 32 alpha, optimized for 768x1024 resolution.
Brief Details: A specialized LoRA model for The Walking Dead-themed image generation, built on SDXL base 1.0, featuring zombie and post-apocalyptic scene creation capabilities.
Brief-details: Armenian language text embedding model with 278M parameters, based on multilingual-e5-base. Optimized for semantic search and RAG applications.
Brief-details: Llama-3.1-Tulu-3-70B-DPO is an advanced instruction-following model built on Llama 3.1, optimized through DPO training for enhanced performance across diverse tasks.
Brief Details: AIMv2-3B: A 2.72B parameter vision model achieving 89.5% ImageNet accuracy with frozen trunk, supporting PyTorch/JAX, optimized for image feature extraction.
Brief Details: A 435M parameter DeBERTa-v3-based safety classification model designed to protect against LLM jailbreak attacks using HarmAug data augmentation.
Brief-details: A powerful 3D VAE model for video compression, achieving 4096x downsampling while maintaining quality. Enables efficient video generation through latent space encoding.
Brief Details: A specialized ControlNet depth model for FLUX.1-dev, trained on real and synthetic data using Depth-Anything-V2, offering precise depth-aware image generation capabilities.
BRIEF DETAILS: Visual language model for efficient document retrieval, combining SmolVLM with ColBERT strategy for multi-vector text/image representations.
Brief-details: A specialized LoRA model trained on FLUX.1-dev for realistic image generation, featuring 64 network dimensions and optimized for 768x1024 resolution outputs.
Brief-details: Compact multimodal model combining vision and language capabilities, optimized with DPO training. Efficient architecture for image+text tasks with Apache 2.0 license.
Brief-details: A LoRA model for FLUX.1-dev specialized in generating cartoon-style capybara artwork, featuring 64 network dimensions and 15 training epochs
Brief-details: LLM2CLIP vision model that combines CLIP with LLM capabilities for improved visual-text understanding. 579M params, supports zero-shot classification and cross-lingual tasks.
Brief Details: A 12.2B parameter Mistral-based merged model optimized for creative writing and worldbuilding, combining multiple specialized models for enhanced capabilities.