Brief-details: Romanian speech recognition model with 315M parameters, achieving top performance in HuggingFace's Robust Speech Challenge. WER: 7.31% with LM.
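A minimal usage sketch via the transformers ASR pipeline; the checkpoint id `gigant/romanian-wav2vec2` is an assumption, as the repo name is not stated above:

```python
from transformers import pipeline

# Assumed Hub repo id -- substitute the actual checkpoint for this model.
asr = pipeline("automatic-speech-recognition", model="gigant/romanian-wav2vec2")

# Expects 16kHz mono audio; a file path or a raw numpy array both work.
print(asr("speech_ro.wav")["text"])
```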
Brief-details: SlovakBERT - A 125M parameter RoBERTa-based language model trained on 19.35GB of Slovak text data, optimized for masked language modeling tasks.
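A fill-mask sketch, assuming the usual Hub id `gerulata/slovakbert`:

```python
from transformers import pipeline

# RoBERTa-style models use <mask> as the mask token.
fill = pipeline("fill-mask", model="gerulata/slovakbert")

for pred in fill("Slovensko je krásna <mask>."):
    print(pred["token_str"], round(pred["score"], 3))
```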
Brief-details: Robust sentence embedding model built on RoBERTa-large, trained on 1B+ sentence pairs across 24 datasets for semantic similarity and retrieval tasks.
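A similarity sketch with the sentence-transformers library; that this is `sentence-transformers/all-roberta-large-v1` is an inference from the description, not stated above:

```python
from sentence_transformers import SentenceTransformer, util

# Assumed checkpoint id matching the description (RoBERTa-large, 1B+ pairs).
model = SentenceTransformer("sentence-transformers/all-roberta-large-v1")

emb = model.encode(["A man is eating food.", "Someone is having a meal."])
print(util.cos_sim(emb[0], emb[1]))  # high cosine similarity for paraphrases
```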
Brief-details: Chinese BART-Large (407M params) - Advanced text-to-text generation model with extended vocabulary of 51,271 tokens, optimized for Chinese language tasks including summarization and generation.
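A generation sketch assuming the Hub id `fnlp/bart-large-chinese`; note this checkpoint pairs a BERT tokenizer with BART weights:

```python
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

# The Chinese BART checkpoints ship with a BERT-style tokenizer.
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-large-chinese")

generator = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(generator("北京是[MASK]的首都", max_length=50, do_sample=False))
```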
Brief-details: k2t is a T5-based text generation model that converts keywords into coherent sentences, featuring multiple variants (base, tiny) and integration with HuggingFace's ecosystem.
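A sketch using the companion keytotext library, which wraps these checkpoints; the variant names below follow that project and are assumptions:

```python
from keytotext import pipeline  # pip install keytotext

# "k2t" loads the base variant; "k2t-tiny" selects the smaller one (per the keytotext project).
nlp = pipeline("k2t")

# Keywords in, a coherent sentence out.
print(nlp(["India", "capital", "Delhi"]))
```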
Brief-details: Polish GPT-2 language model trained on the Oscar corpus for text generation, reaching a perplexity of 21.79. Supports zero/few-shot learning.
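A generation sketch; that this is `flax-community/papuGaPT2` is an inference from the description:

```python
from transformers import pipeline

# Assumed Hub repo id for the Polish GPT-2 described above.
generator = pipeline("text-generation", model="flax-community/papuGaPT2")

out = generator("Najpiękniejsze miasto w Polsce to", max_length=30, do_sample=True, top_k=50)
print(out[0]["generated_text"])
```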
Brief-details: MedCLIP is a specialized CLIP model fine-tuned on the ROCO medical dataset for radiology image-text understanding, supporting medical image classification and caption matching.
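A zero-shot image-text matching sketch, assuming a CLIP-compatible checkpoint; the repo path below is a placeholder, not the model's actual id:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder repo id -- substitute the actual MedCLIP checkpoint.
model = CLIPModel.from_pretrained("path/to/medclip-roco")
processor = CLIPProcessor.from_pretrained("path/to/medclip-roco")

image = Image.open("chest_xray.png")
texts = ["chest x-ray showing pneumonia", "normal chest radiograph"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

logits = model(**inputs).logits_per_image  # image-text match scores
print(logits.softmax(dim=-1))
```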
Brief-details: A fine-tuned CLIP model optimized for remote sensing image classification and retrieval, achieving 88.3% retrieval accuracy at k=1, specialized for aerial imagery analysis.
Brief-details: ALBERTI is a multilingual BERT-based model specialized in poetry analysis, featuring 178M parameters and trained on the PULPO corpus across multiple languages.
Brief-details: FlauBERT Large Cased - A 373M parameter French language model trained on diverse French corpora, featuring 24 layers and 16 attention heads for advanced NLP tasks.
Brief-details: A powerful English chunking model achieving 96.48% F1-score on CoNLL-2000, using Flair embeddings and LSTM-CRF for phrase identification.
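A sketch with the flair library, assuming the Hub id `flair/chunk-english` (which reports the same F1-score):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/chunk-english")

sentence = Sentence("The happy man has been eating at the diner.")
tagger.predict(sentence)

# Chunks are exposed as 'np' spans (noun, verb, prepositional phrases, etc.).
for chunk in sentence.get_spans("np"):
    print(chunk)
```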
Brief-details: XGLM-564M is a multilingual autoregressive language model with 564M parameters, supporting 30 languages and trained on 500B tokens for few-shot learning tasks.
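A few-shot prompting sketch with the dedicated XGLM classes in transformers:

```python
import torch
from transformers import XGLMTokenizer, XGLMForCausalLM

tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")

# Few-shot prompting: prepend labeled examples and let the model continue the pattern.
prompt = "English: hello Spanish: hola\nEnglish: thank you Spanish:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```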
Brief-details: German legal NER model using Flair embeddings & LSTM-CRF, achieving 96.35% F1-score on the German LER (Legal Entity Recognition) dataset. Identifies 19 legal entity types.
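A tagging sketch, assuming the Hub id `flair/ner-german-legal`:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-german-legal")

sentence = Sentence("Der Bundesgerichtshof hat am 1. Januar 2020 entschieden.")
tagger.predict(sentence)

# Legal entities (courts, statutes, etc.) come back as 'ner' spans.
for entity in sentence.get_spans("ner"):
    print(entity)
```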
Brief-details: A powerful 270M parameter German-to-English translation model by Facebook, achieving 41.35 BLEU score on WMT19, based on FSMT architecture.
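A translation sketch with the dedicated FSMT classes, using the `facebook/wmt19-de-en` checkpoint:

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-de-en"  # Hub id for the WMT19 de-en checkpoint
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

inputs = tokenizer("Maschinelles Lernen ist großartig!", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Machine learning is great!"
```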
Brief-details: XGLM-1.7B is a multilingual language model supporting 30 languages with 1.7B parameters, trained on 500B tokens for few-shot learning.
Brief-details: Facebook's advanced multilingual speech model with 2B parameters, supporting 128 languages. Pretrained on 436K hours of speech data. State-of-the-art performance in speech recognition and translation.
Brief-details: Facebook's XLS-R 2B parameter model for multilingual speech translation, supporting 21 languages to English translation with state-of-the-art performance.
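A speech-translation sketch via the ASR pipeline, assuming the Hub id `facebook/wav2vec2-xls-r-2b-21-to-en`:

```python
from transformers import pipeline

# Assumed checkpoint id for the 21-to-English speech translation model.
translator = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xls-r-2b-21-to-en")

# Input: 16kHz speech in any of the 21 source languages; output: English text.
print(translator("speech_de.wav")["text"])
```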
Brief-details: Wav2vec2-large is the large variant of Facebook's wav2vec 2.0 speech model, pretrained on 16kHz audio; fine-tuned, it achieves state-of-the-art WER of 1.8/3.3 on LibriSpeech clean/other test sets.
Brief-details: Facebook's Spanish speech recognition model based on wav2vec2-large-xlsr-53 architecture. Achieves 17.6% WER on Common Voice ES test set. Apache 2.0 licensed.
Brief-details: A Portuguese speech recognition model based on wav2vec2-large-xlsr-53 architecture, achieving 27.1% WER on Common Voice PT test set, suitable for ASR tasks.
Brief-details: Wav2Vec2 Spanish ASR model fine-tuned on the VoxPopuli corpus. Built on the pretrained base architecture and optimized for Spanish speech recognition with 16kHz audio.