Brief-details: SPLADE passage retrieval model trained with knowledge distillation, achieving 38.3 MRR@10 on the MS MARCO dev set. Performs learned sparse query/document expansion.
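A minimal sketch of SPLADE-style sparse scoring via Hugging Face transformers; the checkpoint name naver/splade-cocondenser-ensembledistil is an assumption (any SPLADE MLM checkpoint follows the same pattern):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Checkpoint name assumed; verify against the model card you are using.
name = "naver/splade-cocondenser-ensembledistil"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_vector(text: str) -> torch.Tensor:
    """Sparse lexical vector: max over tokens of log(1 + ReLU(MLM logits))."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab)
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)  # (vocab,)

q = splade_vector("what is splade")
d = splade_vector("SPLADE learns sparse lexical expansions for retrieval.")
print(float(q @ d))  # dot-product relevance score
```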
Brief-details: A Spanish-to-English translation model by Helsinki-NLP achieving a 59.6 BLEU score on the Tatoeba test set, using a transformer architecture with SentencePiece tokenization.
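A quick usage sketch with the transformers pipeline; the checkpoint id Helsinki-NLP/opus-mt-es-en is assumed from the OPUS-MT naming scheme:

```python
from transformers import pipeline

# Checkpoint id assumed (OPUS-MT naming: opus-mt-<src>-<tgt>).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
result = translator("La inteligencia artificial está transformando el mundo.")
print(result[0]["translation_text"])
```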
Brief-details: Multilingual punctuation restoration model supporting 12 languages, built on XLM-RoBERTa-base with a reported ~98% overall accuracy.
Brief-details: SAM 2 large model for promptable segmentation (segment anything) in images and videos. A FAIR (Meta AI) foundation model with advanced mask generation. Apache 2.0 licensed.
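A sketch of point-prompted image segmentation, assuming the facebook/sam2-hiera-large checkpoint and the sam2 package's SAM2ImagePredictor API (verify against the SAM 2 repository README):

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint id and predictor API assumed from the SAM 2 model card.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("photo.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    # One foreground click at pixel (x=500, y=300).
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 300]]),
        point_labels=np.array([1]),
    )
print(masks.shape, scores)  # candidate masks with confidence scores
```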
Brief-details: Multilingual sentiment analysis model trained on ~198M tweets, supporting 8+ languages. Built on the XLM-RoBERTa architecture and fine-tuned on an extensive Twitter dataset.
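A usage sketch via the transformers pipeline; the model id cardiffnlp/twitter-xlm-roberta-base-sentiment is an assumption that matches this description:

```python
from transformers import pipeline

# Model id assumed; swap in the checkpoint you actually use.
clf = pipeline("sentiment-analysis",
               model="cardiffnlp/twitter-xlm-roberta-base-sentiment")
print(clf("J'adore ce film !"))       # e.g. [{'label': 'positive', 'score': ...}]
print(clf("Das war enttäuschend."))   # works across supported languages
```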
Brief-details: Multilingual sentence embedding model supporting 14 languages, with 135M parameters. Maps sentences to 512-dimensional vectors for semantic search and clustering.
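A minimal sentence-transformers sketch; the model id distiluse-base-multilingual-cased-v1 is an assumption that matches the 512-dimensional description:

```python
from sentence_transformers import SentenceTransformer

# Model id assumed; any sentence-transformers checkpoint loads the same way.
model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v1")
embeddings = model.encode(["Hello world", "Hallo Welt", "Bonjour le monde"])
print(embeddings.shape)  # (3, 512): one 512-dim vector per sentence
```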
Brief-details: Multilingual DeBERTa-V3 base model with 86M parameters, supporting 16 languages. Improves on BERT with disentangled attention and ELECTRA-style pre-training.
Brief-details: LanguageBind_Video_FT is a fully fine-tuned video-language model that achieves state-of-the-art performance in video-text alignment through language-based semantic binding.
Brief-details: Multilingual reranker model (568M params) built on the BGE-M3 architecture, optimized for fast inference, with strong cross-language capabilities and direct relevance scoring.
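A reranking sketch with plain transformers; the model id BAAI/bge-reranker-v2-m3 is an assumption matching the 568M, BGE-M3-based description:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model id assumed; cross-encoder rerankers score (query, passage) pairs directly.
name = "BAAI/bge-reranker-v2-m3"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

pairs = [
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
]
with torch.no_grad():
    inputs = tok(pairs, padding=True, truncation=True, return_tensors="pt")
    scores = model(**inputs).logits.view(-1)  # higher = more relevant
print(scores.tolist())
```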
Brief-details: Large-scale speech recognition model supporting 100+ languages. Optimized version of Whisper using CTranslate2 for faster inference. MIT licensed with 700K+ downloads.
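A transcription sketch with the faster-whisper package; the "large-v3" model size is an assumption:

```python
from faster_whisper import WhisperModel

# Model size assumed; faster-whisper fetches the CTranslate2 conversion automatically.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```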
Brief-details: IndoBERT base model for Indonesian language processing - 124.5M params, trained on the 23.43 GB Indo4B dataset, MIT licensed.
Brief-details: A 125M-parameter RoBERTa-based model for scoring sentence well-formedness, evaluating grammar and casing, with practical applications in content validation.
Brief-details: Small but powerful English embedding model (33.4M params) optimized for text similarity and retrieval tasks, achieving strong performance on the MTEB benchmark.
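A retrieval sketch assuming the BAAI/bge-small-en-v1.5 checkpoint (the id is an assumption matching the 33.4M-param description); BGE models recommend an instruction prefix on queries:

```python
from sentence_transformers import SentenceTransformer, util

# Model id assumed; the query prefix follows the BGE model card recommendation.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

query = "Represent this sentence for searching relevant passages: how do transformers work?"
docs = [
    "Transformers use self-attention over token sequences.",
    "The weather is sunny today.",
]
q_emb = model.encode(query)
d_emb = model.encode(docs)
print(util.cos_sim(q_emb, d_emb))  # similarity of the query to each document
```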
Brief-details: Multilingual sentence embedding model based on XLM-RoBERTa, maps sentences to 768D vectors, optimized for semantic similarity tasks, 278M parameters.
Brief-details: Multilingual biomedical entity representation model based on XLM-RoBERTa, trained on the UMLS 2020AB dataset, optimized for cross-lingual semantic similarity tasks.
Brief-details: Sentence embedding model with 768-dimensional vectors, 109M parameters, built on MPNet architecture. Popular for semantic search and clustering tasks.
Brief-details: Deprecated sentence embedding model based on RoBERTa, maps text to 768D vectors. 125M params, Apache 2.0 licensed. Not recommended for new projects.
Brief-details: TrOCR base handwritten model (333M params) for OCR tasks. Microsoft-developed transformer architecture pairing a BEiT-initialized image encoder with a RoBERTa-initialized text decoder, fine-tuned on the IAM handwriting dataset.
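A recognition sketch following the standard transformers usage for microsoft/trocr-base-handwritten; the input image path is a placeholder:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# TrOCR expects a single line of text per image.
image = Image.open("handwritten_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```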
Brief-details: BGE Large English embedding model (335M params) optimized for semantic search and text similarity, achieving SOTA performance on the MTEB benchmark.
Brief-details: SmolLM2-135M-Instruct-GGUF is a lightweight 135M-parameter language model distributed in GGUF format, offering quantization options from 2-bit to 8-bit precision.
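A chat sketch with llama-cpp-python; the repo id and quantization filename are assumptions (pick the quant file that fits your memory budget):

```python
from llama_cpp import Llama

# Repo id and filename glob are assumptions; list the repo's files to pick a quant.
llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-135M-Instruct-GGUF",
    filename="*q8_0.gguf",  # glob over the repo's quantized files
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```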
Brief-details: Optimized 8B-parameter Llama 3 model using 4-bit quantization, offering 2.4x faster fine-tuning with 58% less memory usage per Unsloth's benchmarks. Built for efficient deployment.
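A 4-bit loading sketch with transformers and bitsandbytes; both model ids below are assumptions (the Unsloth repo ships pre-quantized weights, while this config shows the equivalent NF4 load for a full-precision Llama 3 checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization config; model id below is an assumption (gated on the Hub).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
name = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, quantization_config=bnb, device_map="auto"
)

ids = tok("The capital of France is", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=8)[0], skip_special_tokens=True))
```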