Brief-details: Russian language model for detecting 18 sensitive topics in text, including crime, discrimination, and social issues. Trained on manually and semi-automatically labeled data.
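A minimal sketch of how a multi-label topic detector like this is typically queried through the transformers text-classification pipeline; the model ID, example text, and 0.5 threshold below are placeholders, not details taken from this entry.

```python
from transformers import pipeline

# Placeholder checkpoint name; substitute the actual model for this entry.
clf = pipeline("text-classification", model="org/russian-sensitive-topics", top_k=None)

# top_k=None returns a score for every topic label, so each of the 18 topics
# can be thresholded independently (multi-label setup).
scores = clf(["Пример русского текста для проверки."])[0]
flagged = [s for s in scores if s["score"] > 0.5]
print(flagged)
```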
Brief Details: Russian text detoxification model based on ruT5-base (223M params) that converts toxic Russian text into neutral language while preserving its meaning.
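Since ruT5 is a T5-style encoder-decoder, detoxification is a plain sequence-to-sequence rewrite. A hedged sketch, assuming a hypothetical checkpoint name and standard generation settings:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint name for the ruT5-based detoxifier described above.
model_name = "org/ruT5-base-detox"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

toxic = "Пример токсичного русского предложения."
inputs = tokenizer(toxic, return_tensors="pt")
# The model rewrites the input as a neutral paraphrase with the same meaning.
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```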
Brief Details: Russian language model for detecting inappropriate messages that could harm reputation; focuses on sensitive topics beyond toxicity and achieves 89% accuracy.
Brief-details: Specialized speech recognition model for Quran recitation, built on the wav2vec2 architecture. Fine-tuned on Arabic speech data for accurate Quranic verse identification.
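wav2vec2 checkpoints are usually exposed through the automatic-speech-recognition pipeline. A sketch under that assumption; the model ID and audio path are placeholders:

```python
from transformers import pipeline

# Placeholder checkpoint name for the wav2vec2 recitation model above.
asr = pipeline("automatic-speech-recognition", model="org/wav2vec2-quran-arabic")

# Expects an audio file (typically 16 kHz mono); the path is a placeholder.
result = asr("recitation_sample.wav")
print(result["text"])
```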
Brief Details: A Spanish biomedical RoBERTa model trained on 1B+ tokens of clinical text, achieving SOTA results on medical NER tasks with a 90.04% F1 score.
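For a NER fine-tune like this, the token-classification pipeline with span aggregation is the usual entry point. A sketch with a placeholder model ID and invented example sentence:

```python
from transformers import pipeline

# Placeholder checkpoint name for the Spanish clinical NER model above.
ner = pipeline("token-classification",
               model="org/roberta-es-clinical-ner",
               aggregation_strategy="simple")  # merge sub-word pieces into entity spans

text = "Paciente con diabetes mellitus tipo 2 tratado con metformina."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```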
Brief Details: Japanese language model based on GPT-J 6B, specialized in storytelling. Features 6B parameters, 28 layers, and supports both Japanese and English text generation.
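A GPT-J-style storyteller is used through ordinary causal-LM sampling. A hedged sketch with a placeholder checkpoint name and prompt; loading a 6B model this way assumes accelerate is installed and enough GPU memory is available:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint name for the 6B Japanese storytelling model above.
model_name = "org/gpt-j-6b-japanese-story"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" (requires accelerate) spreads the 6B weights across available devices.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "昔々、あるところに"  # "Once upon a time, in a certain place..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120, do_sample=True,
                         temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```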
Brief Details: RoBERTa-based depression detection model achieving 97.45% accuracy. Fine-tuned for identifying depressive content in text; MIT licensed.
Brief-details: SGPT-125M is a lightweight GPT-based sentence embedding model optimized for semantic search, featuring weighted-mean pooling and BitFit training.
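The distinctive piece here is position-weighted mean pooling over the hidden states of a GPT-style encoder. A sketch of that pooling and a toy semantic-search comparison, assuming a placeholder checkpoint name (the published model may also use query/document markers not shown here):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint name for the SGPT-125M model above.
model_name = "org/sgpt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
if tokenizer.pad_token is None:          # GPT tokenizers often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq, dim)
    # Position-weighted mean pooling: later tokens get larger weights,
    # which suits autoregressive (left-to-right) encoders.
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq, 1)
    weights = torch.arange(1, hidden.size(1) + 1, dtype=torch.float32)
    weights = weights.view(1, -1, 1) * mask
    return (hidden * weights).sum(dim=1) / weights.sum(dim=1)

query = embed(["How do I reset my password?"])
docs = embed(["Password reset instructions", "Office opening hours"])
print(torch.nn.functional.cosine_similarity(query, docs))
```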
Brief Details: Norwegian BERT-base model (179M params) trained on 200 years of Norwegian text, supporting both Bokmål and Nynorsk variants for masked language modeling.
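Masked language modeling maps directly onto the fill-mask pipeline. A sketch with a placeholder model ID and example sentence:

```python
from transformers import pipeline

# Placeholder checkpoint name for the Norwegian BERT model above.
fill = pipeline("fill-mask", model="org/norwegian-bert-base")

# The mask token depends on the tokenizer; BERT-style models use [MASK].
for pred in fill("Oslo er hovedstaden i [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```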
Brief Details: A multilingual translation model fine-tuned on the OPUS-100 dataset, optimized for English-to-Portuguese translation with a BLEU score of 20.61.
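A hedged usage sketch via the translation pipeline; the checkpoint name is a placeholder, and some multilingual checkpoints additionally need source/target language arguments not shown here:

```python
from transformers import pipeline

# Placeholder checkpoint name for the English-to-Portuguese model above.
translator = pipeline("translation", model="org/opus100-en-pt")

result = translator("The report will be published next week.", max_length=128)
print(result[0]["translation_text"])
```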
BRIEF-DETAILS: Hungarian GPT-2 model specialized in news generation, trained on Wikipedia and news sites. Achieves a perplexity of 22.06; MIT licensed.
Brief Details: A fine-tuned DistilRoBERTa model achieving 98.9% accuracy for stereotype detection, particularly focusing on gender bias identification in text.
BRIEF DETAILS: Multilingual question-answering model supporting English, Spanish & Basque, fine-tuned on the SQuAD dataset. Optimized for extractive QA tasks with high accuracy.
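Extractive QA models like this answer by selecting a span from a supplied context. A sketch with a placeholder model ID and an invented question/context pair:

```python
from transformers import pipeline

# Placeholder checkpoint name for the English/Spanish/Basque QA model above.
qa = pipeline("question-answering", model="org/multilingual-squad-qa")

result = qa(
    question="¿Dónde se celebró la conferencia?",
    context="La conferencia se celebró en Bilbao en octubre de 2021.",
)
print(result["answer"], round(result["score"], 3))  # answer is a span copied from the context
```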
Brief-details: Italian sentiment analysis model for text classification, fine-tuned on the FEEL-IT dataset. Achieves 0.84 accuracy on SENTIPOLC16. Built on the UmBERTo architecture.
Brief Details: M-BERT-Distil-40 is a multilingual BERT model supporting 38 languages, fine-tuned to match CLIP's embedding space. Optimized for cross-lingual text understanding and feature extraction.
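A rough illustration of the cross-lingual setup this entry describes: sentences in different languages are embedded into a shared space aligned with CLIP, so semantically equivalent captions end up close together. The model ID is a placeholder, the pooling is plain mean pooling, and the published checkpoint ships its own loading wrapper (including a learned projection into CLIP's space), so treat this only as a sketch of the idea:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint name; the real model is normally loaded via its own wrapper.
model_name = "org/m-bert-distil-40"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def text_embedding(sentence: str) -> torch.Tensor:
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)   # mean-pooled sentence vector

# Captions in different languages should land near each other in the shared space.
emb_en = text_embedding("a dog playing in the snow")
emb_de = text_embedding("ein Hund, der im Schnee spielt")
print(torch.nn.functional.cosine_similarity(emb_en, emb_de))
```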
Brief-details: A Chinese-language BigBird model optimized for 1024-token sequences, featuring Jieba tokenization and an Apache 2.0 license. Ideal for Chinese text processing and feature extraction.
Brief Details: A German-language Longformer model with 153M parameters, trained on the OSCAR corpus. Features an 8192-token sequence length and 512-token attention windows.
Brief-details: Powerful Chinese BERT model trained on a 300GB corpus, achieving SOTA on 9 NLP tasks. Features MLM, POS tagging & SOP training objectives.
Brief Details: DarijaBERT is a pioneering BERT model for Moroccan Arabic (Darija) with 209M parameters, trained on 3M sequences and specializing in dialectal understanding.
Brief Details: BERT-based Chinese POS tagger and dependency parser, pre-trained on Wikipedia texts, supporting Universal Part-Of-Speech tagging; Apache 2.0 licensed.
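Universal POS tagging with a BERT backbone is usually surfaced as token classification; dependency parsing needs the model's own head and is not shown here. A sketch with a placeholder checkpoint name:

```python
from transformers import pipeline

# Placeholder checkpoint name for the Chinese UPOS tagger above.
tagger = pipeline("token-classification", model="org/bert-chinese-upos")

# Each token is assigned a Universal POS tag such as NOUN, VERB, or PRON.
for tok in tagger("我喜欢读书。"):
    print(tok["word"], tok["entity"])
```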
Brief-details: A specialized BERT model for Persian language understanding, featuring zero-width non-joiner character handling and trained on diverse Persian corpora.