Brief-details: 8B parameter imatrix-quantized GGUF model with multiple compression variants (2.1GB-6.7GB), optimized for efficient deployment and conversational tasks.
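Several of the GGUF entries in this list follow the same deployment pattern: pick one quant file from the repo and load it with llama.cpp or its Python bindings. A minimal sketch using llama-cpp-python, where the repo ID and quant filename are hypothetical placeholders rather than any model listed here:

```python
# Minimal sketch: load one GGUF quant variant and run a single chat turn.
# repo_id and filename are placeholders, not a real model reference.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="example-org/example-8b-GGUF",  # hypothetical GGUF repo
    filename="*Q4_K_M.gguf",                # select one quant variant by glob
    n_ctx=4096,                             # context window; raise if memory allows
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```

Smaller quants (Q2/Q3) trade answer quality for footprint, while Q4_K_M is a common middle ground, which is why these repos ship the whole range of variants.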
Brief-details: RQwen-v0.1-GGUF is a 14.8B parameter bilingual (English/Russian) language model with multiple GGUF quantization options for efficient deployment.
Brief-details: An 8B parameter GGUF-quantized language model featuring multiple compression variants (2.1GB-6.7GB), with imatrix quantization for a better quality/size balance.
Brief-details: 8B parameter GGUF-quantized language model offering multiple compression variants (Q2_K to f16), optimized for efficient deployment and inference.
Brief-details: 8B parameter GGUF-quantized language model with multiple compression variants (Q2-Q8), optimized for efficient deployment and inference.
Brief-details: An 8B parameter GGUF-quantized language model available in multiple quantization variants ranging from 3.3GB to 16.2GB in file size.
Brief-details: 7B parameter GGUF-quantized conversational model offering multiple quantization variants from 3.1GB to 15.3GB, optimized for efficient deployment and released under the Apache 2.0 license.
Brief-details: 8B parameter GGUF-quantized language model with multiple compression variants (Q2-Q8), optimized for efficient deployment and memory usage.
Brief-details: A 9.24B parameter GGUF-quantized Gemma model offering multiple compression variants, optimized for conversational AI with strong performance/size trade-offs.
Brief-details: 8B parameter GGUF model optimized for inference with various quantization options, featuring an extended 1024k-token context length and imatrix quantization.
Brief-details: 7B parameter GGUF model optimized for efficient inference with multiple quantization options (Q2_K to f16), offering a range of size/quality trade-offs.
Brief-details: Thor v1.1e, an 8B parameter GGUF model with various quantization options, optimized for inference with a 1024k context window.
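For long-context builds like the two 1024k entries above, the usable window is set at load time and is bounded by memory, since the KV cache grows with context length. A sketch assuming huggingface_hub and llama-cpp-python, with hypothetical repo and file names:

```python
# Sketch: fetch one specific quant file, then load it with an enlarged context.
# repo_id and filename are placeholders; n_ctx must fit in RAM/VRAM, so most
# machines will run far below the advertised 1024k maximum.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="example-org/example-8b-1024k-GGUF",  # hypothetical repo
    filename="example-8b.Q5_K_M.gguf",            # hypothetical quant file
)
llm = Llama(model_path=path, n_ctx=131072)        # 128k here, not the full 1024k
```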
Brief-details: A 1.24B parameter LLaMA-based model fine-tuned with LoRA on NVIDIA's ChatQA dataset, optimized for conversational AI and QA tasks with a 1024-token context.
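When a card like this one ships its LoRA weights as an adapter rather than merged into the base model, loading follows the standard PEFT pattern. A sketch assuming transformers and peft, with both repo IDs as hypothetical placeholders:

```python
# Sketch: attach a LoRA adapter (e.g., a ChatQA-style fine-tune) to a small base.
# Both repo IDs are placeholders, not the actual model described above.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "example-org/llama-1b-base"            # hypothetical ~1.24B base model
adapter_id = "example-org/chatqa-lora-adapter"   # hypothetical LoRA adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # applies adapter on top of base
```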
Brief-details: A 3.21B parameter multilingual model supporting 12 Indian languages, offered in various GGUF quantizations for efficient deployment.
Brief-details: Qwen2-VL-2B-Instruct is an ONNX-compatible vision-language model optimized for Transformers.js, enabling image+text to text generation with 2B parameters.
Brief-details: Behemoth 123B v2.2, an advanced Largestral 2411 finetune with system-prompt support, optimized for creative tasks; uses the Metharme format with Mistral tokens at 5-bit precision.
Brief-details: A quantized 7.62B parameter conversational AI model offering multiple GGUF variants optimized for different performance/quality trade-offs and hardware configurations.
Brief-details: RQwen-v0.1, a 14.8B parameter bilingual (EN/RU) instruction-tuned model based on Qwen2.5, featuring reflection tuning and strong logical reasoning capabilities.
Brief-details: A 1.24B parameter LLaMA-based conversational AI model optimized for mobile devices, featuring offline capabilities and instruction-following abilities.
Brief-details: A quantized version of the Sue Ann 11B model available in multiple GGUF formats, optimized for different performance/quality trade-offs and ranging from 4.1GB to 21.6GB.
Brief-details: 8B parameter merged language model built with the Model Stock method, combining multiple LLaMA-based models for text generation. Created by MrRobotoAI and released in FP16 precision.