Brief-details: Mistral-7B variant using the Quiet-STaR technique to generate 8 internal thought tokens before each output token, improving prediction quality through continued pretraining
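A minimal generation sketch for such a checkpoint, assuming it is published under a hypothetical Hugging Face ID and ships custom Quiet-STaR modeling code (hence `trust_remote_code=True`); from the caller's side the thought tokens are handled internally:

```python
# Sketch: generating with a Quiet-STaR Mistral-7B checkpoint.
# The model ID below is a placeholder (hypothetical), not the entry's actual repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/quiet-star-mistral-7b"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # assumed: loads the custom thought-token generation logic
)

prompt = "Q: A train travels 60 miles in 1.5 hours. What is its average speed?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The 8 intermediate "thought" tokens are produced internally before each output
# token; this call looks like ordinary causal-LM generation.
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```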
Brief Details: RoBERTa-base model (~125M parameters) trained on 124M tweets posted through the end of 2021, optimized for Twitter text analysis and masked language modeling.
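A minimal masked-language-modeling sketch; the model ID below is inferred from the description (CardiffNLP's 2021 Twitter RoBERTa) and should be verified against the actual model card:

```python
# Sketch: masked-token prediction on tweet-style text.
from transformers import pipeline

# Assumed ID based on the entry; confirm on the hub before use.
fill_mask = pipeline("fill-mask", model="cardiffnlp/twitter-roberta-base-2021-124m")
for pred in fill_mask("I can't believe the <mask> is trending again!"):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```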
Brief Details: BERTweet-based model fine-tuned for stance detection on climate change discussions, developed by CardiffNLP for social media analysis.
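A short usage sketch for stance classification; the model ID follows CardiffNLP's TweetEval naming convention and is an assumption, as is the exact label set (typically none/favor/against):

```python
# Sketch: stance detection on climate-related tweets.
from transformers import pipeline

# Assumed ID in the TweetEval naming style; check the model card for the real name.
stance = pipeline("text-classification", model="cardiffnlp/bertweet-base-stance-climate")
print(stance("Climate change is the biggest threat we face this century."))
# e.g. [{'label': 'favor', 'score': ...}] -- label names depend on the checkpoint
```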
Brief-details: Fine-tuned DistilBART-CNN model optimized for text summarization, achieving a 25.92 ROUGE-1 score after weakly supervised training on 1,000 samples
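A summarization sketch; the fine-tuned checkpoint's ID is not given in the entry, so the widely used base checkpoint `sshleifer/distilbart-cnn-12-6` stands in as a placeholder and the fine-tuned model would be called the same way:

```python
# Sketch: abstractive summarization with a DistilBART-CNN checkpoint.
from transformers import pipeline

# Placeholder base model; swap in the fine-tuned checkpoint's ID.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
article = (
    "The city council voted on Tuesday to approve a new transit plan that "
    "expands bus service to the suburbs and adds two light-rail lines over the next decade."
)
print(summarizer(article, max_length=60, min_length=15, do_sample=False)[0]["summary_text"])
```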
Brief Details: A distilled BERT model for Indonesian language tasks, trained on 1.5GB of Wikipedia and newspaper data. Optimized for masked language modeling and text analysis.
Brief Details: BERT base model pre-trained on Indonesian Wikipedia (522MB), uncased, with 32k vocabulary size. Specialized for Indonesian language tasks using masked language modeling.
BRIEF DETAILS: BioBERT-based clinical outcome prediction model, pre-trained on medical texts and clinical notes. Specializes in forecasting patient outcomes from clinical narratives.
Brief Details: BioBERT-based clinical mortality prediction model that analyzes admission notes to estimate in-hospital mortality risk.
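Both clinical entries above follow the same sequence-classification pattern; a minimal sketch, with a hypothetical model ID since the real checkpoints define their own label mappings (e.g. deceased vs. survived):

```python
# Sketch: scoring an admission note with a BioBERT-based clinical outcome classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "your-org/clinical-mortality-biobert"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

note = "Patient admitted with acute decompensated heart failure, elevated BNP, reduced ejection fraction."
inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # interpret columns via the checkpoint's id2label mapping
```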
Brief Details: BlueBERT - A specialized BERT model pre-trained on PubMed and MIMIC-III data, optimized for biomedical NLP tasks and clinical text analysis.
Brief-details: A cased BERT-based language model created by shaktiman404, likely for experimental or educational purposes. Available on HuggingFace hub.
Brief Details: AraBERT v0.2 base model - Arabic BERT variant trained on 77GB of text with 136M parameters, optimized for Arabic NLP tasks without pre-segmentation.
BRIEF DETAILS: AraBERTv0.2-Twitter: Arabic BERT model fine-tuned on 60M tweets, optimized for dialectal Arabic with emoji support, 136M parameters, 64-token sequence length
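The two AraBERT entries above share the same usage pattern; a minimal fill-mask sketch with the base v0.2 model, where the `aubmindlab` ID and the `ArabertPreprocessor` helper come from the AraBERT project and should be checked against its README (the Twitter variant swaps in its own ID and a 64-token maximum length):

```python
# Sketch: text cleaning plus masked-token prediction with AraBERTv0.2.
from arabert.preprocess import ArabertPreprocessor  # helper from the arabert package
from transformers import pipeline

model_id = "aubmindlab/bert-base-arabertv02"
prep = ArabertPreprocessor(model_name=model_id)  # cleaning/normalization; v0.2 needs no Farasa segmentation

fill_mask = pipeline("fill-mask", model=model_id)
# Preprocess the raw text, then append the mask token for prediction.
clean = prep.preprocess("عاصمة فرنسا هي")
for pred in fill_mask(clean + " [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```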
Brief-details: A specialized BERT-large model fine-tuned for legal question answering tasks, optimized for extracting precise answers from legal documents and texts.
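An extractive question-answering sketch; the model ID is a placeholder for the fine-tuned legal BERT-large checkpoint described above:

```python
# Sketch: extracting an answer span from a legal clause.
from transformers import pipeline

qa = pipeline("question-answering", model="your-org/bert-large-legal-qa")  # hypothetical ID
result = qa(
    question="How long is the notice period for terminating the lease?",
    context="Either party may terminate this lease by providing sixty (60) days' written notice to the other party.",
)
print(result["answer"], result["score"])
```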
Brief Details: DNABERT-S is a specialized transformer-based model for DNA sequence analysis, producing sequence embeddings suited to downstream tasks such as species clustering and classification.
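A sketch of extracting a single embedding per DNA sequence; the model ID and the `trust_remote_code` requirement are assumptions based on how the DNABERT family is typically published, so verify them on the model card:

```python
# Sketch: mean-pooled DNA sequence embedding with DNABERT-S.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "zhihan1996/DNABERT-S"  # assumed ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

seq = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs)[0]           # token-level hidden states: (1, seq_len, hidden_dim)
embedding = hidden.mean(dim=1).squeeze()  # mean-pool to one vector per sequence
print(embedding.shape)
```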
BRIEF-DETAILS: Quantized version of phi-4 with 99.68% accuracy recovery, optimized for high throughput (4623 tokens/sec on NVIDIA L40S), supporting multiple languages
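The entry does not name a serving stack; one common choice for throughput-oriented quantized checkpoints is vLLM, so the following is a sketch under that assumption, with a placeholder model ID:

```python
# Sketch: batched inference for a quantized phi-4 checkpoint via vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/phi-4-quantized")  # hypothetical ID; vLLM detects the quantization scheme
params = SamplingParams(temperature=0.2, max_tokens=128)

prompts = ["Explain the difference between INT8 and FP8 quantization in two sentences."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```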
Brief-details: DeepSeek-R1-Distill-Llama-70B-AWQ is an AWQ 4-bit quantization of DeepSeek-R1-Distill-Llama-70B, a 70B-parameter Llama model distilled from DeepSeek-R1 for enhanced reasoning capabilities.
Brief-details: A 4-bit quantized MNN version of DeepSeek-R1-1.5B-Qwen, optimized for efficient inference using MNN framework. Features low memory usage and transformer fusion support.
Brief Details: A dual-purpose Vision Transformer model trained on LAION-400M dataset, compatible with both OpenCLIP (ViT-B-16-plus-240) and timm frameworks
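A zero-shot classification sketch using the OpenCLIP path mentioned in the entry; the pretrained tag `laion400m_e32` is an assumption (available tags can be listed with `open_clip.list_pretrained()`), and `cat.jpg` is a placeholder image:

```python
# Sketch: zero-shot image classification with ViT-B-16-plus-240 LAION-400M weights.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-plus-240", pretrained="laion400m_e32"  # tag assumed; check list_pretrained()
)
tokenizer = open_clip.get_tokenizer("ViT-B-16-plus-240")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)   # placeholder image path
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)
```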
Brief Details: Vision Transformer (ViT) base model pre-trained on ImageNet-21k with 14M images. Uses 32x32 pixel patches for 224x224 image processing. Ideal for computer vision tasks.
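A feature-extraction sketch; the `google/vit-base-patch32-224-in21k` ID matches the description (patch 32, 224x224, ImageNet-21k) but should be confirmed on the hub, and the image path is a placeholder:

```python
# Sketch: extracting patch-level features with the ViT base patch-32 model.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

model_id = "google/vit-base-patch32-224-in21k"  # inferred from the entry
processor = ViTImageProcessor.from_pretrained(model_id)
model = ViTModel.from_pretrained(model_id)

image = Image.open("cat.jpg")  # placeholder image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, 50, 768): 49 patches + [CLS]
```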
BRIEF-DETAILS: An uncensored 8B-parameter Hermes-3 variant built on the Llama 3.1 architecture using the lorablation technique, aimed at reducing refusals while maintaining overall performance.
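A chat-style generation sketch, assuming the checkpoint exposes a standard Llama 3.1 chat template; the model ID is a placeholder:

```python
# Sketch: chat generation with a Llama-3.1-based Hermes-3 variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/hermes-3-llama-3.1-8b-lorablated"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize the plot of Hamlet in three sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```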
BRIEF-DETAILS: Natural language processing model specialized in estimating the level of certainty expressed in sentences, developed by pedropei on HuggingFace