Brief Details: GPyT - a GPT-2 model trained on 200GB of Python code from GitHub, designed for code generation and completion. Apache 2.0 licensed; supports a 1024-token context length.
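A minimal usage sketch with the Transformers API, assuming the checkpoint is published as Sentdex/GPyT and, per its model card, represents newlines as <N>; verify both before relying on them:

```python
# Sketch only: the repo id and the <N> newline convention are assumptions
# taken from the GPyT model card.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
model = AutoModelForCausalLM.from_pretrained("Sentdex/GPyT")

code = "def fibonacci(n):".replace("\n", "<N>")   # encode newlines as the model expects
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]).replace("<N>", "\n"))  # decode back to real newlines
```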
Brief Details: A specialized Stable Diffusion text-to-image model trained on simple icons, optimized for generating clean, minimal icon designs on white backgrounds.
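A minimal generation sketch with the diffusers library; the repo id below is a hypothetical placeholder, since the entry does not name the checkpoint:

```python
# Sketch only: "your-org/icon-diffusion" is a placeholder - substitute the real repo id.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "your-org/icon-diffusion", torch_dtype=torch.float16
).to("cuda")

image = pipe("a minimal flat icon of a paper airplane, white background").images[0]
image.save("icon.png")
```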
Brief-details: A fine-tuned Whisper ASR model specialized for Quranic Arabic, trained on 8 GPUs and converging from 13.39% to a final 5.75% WER.
Brief-details: A fine-tuned Whisper-tiny model specialized for Arabic Quran speech recognition, achieving 7.05% WER after progressive improvement over 5000 training steps.
Brief-details: A fine-tuned Whisper medium model for Urdu speech recognition, achieving 26.98% WER on Common Voice, with 764M parameters and Apache 2.0 license.
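The three Whisper fine-tunes above are drop-in checkpoints for the Transformers ASR pipeline. A minimal sketch, assuming a recent Transformers version; the repo id is a hypothetical placeholder for whichever checkpoint you choose:

```python
# Sketch only: replace the placeholder id with the actual fine-tuned checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/whisper-medium-urdu",  # hypothetical id
)
# Whisper resamples input to 16 kHz internally; pass a local audio file.
result = asr("sample.wav", generate_kwargs={"language": "urdu", "task": "transcribe"})
print(result["text"])
```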
Brief Details: Japanese RoBERTa-based question-answering model with 110M parameters. Fine-tuned on JaQuAD dataset for extractive QA tasks. MIT licensed, optimized for Japanese text.
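A minimal extractive-QA sketch; the repo id is a hypothetical placeholder for the JaQuAD fine-tune:

```python
# Sketch only: "your-org/roberta-base-japanese-jaquad" is a placeholder id.
from transformers import pipeline

qa = pipeline("question-answering", model="your-org/roberta-base-japanese-jaquad")
answer = qa(
    question="富士山の高さは?",  # "How tall is Mt. Fuji?"
    context="富士山は標高3776メートルの日本最高峰の山である。",
)
print(answer["answer"], answer["score"])
```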
Brief-details: An optimized Korean speech recognition model based on Whisper-medium, achieving 3.64% WER and 1.48% CER on the Zeroth Korean dataset.
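The WER and CER figures quoted throughout these entries come from comparing reference transcripts against model output; a minimal sketch with the jiwer library:

```python
# jiwer computes word error rate (WER) and character error rate (CER)
# from reference/hypothesis pairs.
import jiwer

reference = "안녕하세요 반갑습니다"
hypothesis = "안녕하세요 반갑습니다"

print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))
```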
Brief-details: Conformer-CTC Large model for Esperanto speech recognition, trained on Mozilla Common Voice. 120M params; achieves 4.8% WER on the test set.
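NeMo checkpoints are driven through NVIDIA's NeMo toolkit rather than Transformers. A minimal sketch, assuming the pretrained name stt_eo_conformer_ctc_large (verify against the model card):

```python
# Sketch only: the pretrained name is hedged - check the model card for the exact id.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_eo_conformer_ctc_large")
transcripts = model.transcribe(["esperanto_sample.wav"])  # list of audio paths in
print(transcripts[0])                                     # list of transcripts out
```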
BRIEF DETAILS: UIE-base is a 118M parameter Chinese information extraction model based on ERNIE 3.0, capable of entity, relation, and event extraction with zero-shot learning capabilities.
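UIE checkpoints are typically driven through PaddleNLP's Taskflow, where you declare an extraction schema and the model fills it, including for types it was never explicitly trained on (the zero-shot aspect):

```python
# Schema-driven extraction via PaddleNLP's documented Taskflow API.
from paddlenlp import Taskflow

schema = ["时间", "地点", "人物"]  # time, location, person
ie = Taskflow("information_extraction", schema=schema, model="uie-base")
print(ie("2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中谷爱凌夺冠!"))
```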
Brief-details: T5-XXL-based NLI model for binary entailment prediction, trained on 6 major datasets including SNLI, MNLI, and FEVER. Apache 2.0 licensed.
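A minimal scoring sketch, assuming this is the TRUE-style NLI checkpoint (google/t5_xxl_true_nli_mixture) that takes "premise: ... hypothesis: ..." input and emits "1" for entailment or "0" otherwise; verify both assumptions against the model card:

```python
# Sketch only: repo id and I/O format are assumptions to confirm.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/t5_xxl_true_nli_mixture"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

text = "premise: A man is playing a guitar. hypothesis: A person is making music."
inputs = tokenizer(text, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=3)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # "1" = entailed, "0" = not
```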
Brief Details: BiT-50 is an image classification model built on the ResNet-v2 architecture, achieving 87.5% top-1 accuracy on ImageNet and transferring well to downstream tasks.
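A minimal inference sketch using the Transformers BiT classes with the google/bit-50 checkpoint from the library's documentation:

```python
# Standard image-classification inference with BiT.
from PIL import Image
from transformers import BitImageProcessor, BitForImageClassification

processor = BitImageProcessor.from_pretrained("google/bit-50")
model = BitForImageClassification.from_pretrained("google/bit-50")

image = Image.open("cat.jpg")
logits = model(**processor(image, return_tensors="pt")).logits
print(model.config.id2label[logits.argmax(-1).item()])
```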
Brief-details: A fine-tuned Whisper medium model for French speech recognition, achieving 11.14% WER with text normalization and surpassing the original OpenAI checkpoint.
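The "with text normalization" qualifier means references and hypotheses are normalized (casing, punctuation) before scoring; a minimal sketch using Whisper's basic normalizer from Transformers:

```python
# Normalize both sides, then score - this is what "WER with normalization" refers to.
import jiwer
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()
reference = normalizer("Bonjour, comment allez-vous ?")
hypothesis = normalizer("bonjour comment allez-vous")
print("normalized WER:", jiwer.wer(reference, hypothesis))
```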
Brief-details: GIT-base model fine-tuned on the VQAv2 dataset for visual question answering. 177M params, MIT license; pairs a CLIP image encoder with a text decoder for image-text tasks.
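A minimal VQA sketch following the Transformers GIT documentation, with the microsoft/git-base-vqav2 checkpoint:

```python
# GIT conditions generation on image features plus a CLS-prefixed question.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base-vqav2")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-vqav2")

image = Image.open("scene.jpg")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

question_ids = processor(text="what is in the picture?", add_special_tokens=False).input_ids
input_ids = torch.tensor([[processor.tokenizer.cls_token_id] + question_ids])

out = model.generate(pixel_values=pixel_values, input_ids=input_ids, max_length=50)
print(processor.batch_decode(out, skip_special_tokens=True))
```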
Brief Details: A Russian speech-to-text model combining Wav2Vec2 and mBART-50 architectures, achieving 13-32% WER across multiple datasets. Output retains punctuation and capitalization.
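Wav2Vec2-encoder/mBART-decoder combinations ship as SpeechEncoderDecoder checkpoints, which the standard ASR pipeline can usually drive; a minimal sketch with a hypothetical repo id:

```python
# Sketch only: placeholder id, and pipeline support for this checkpoint is assumed.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/wav2vec2-mbart50-ru",  # hypothetical id
)
print(asr("russian_sample.wav")["text"])  # seq2seq decoding restores punctuation/casing
```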
Brief-details: MaxViT small variant optimized for 512x512 images with 69.1M parameters, achieving 86.1% top-1 accuracy on ImageNet, combining convolution and attention mechanisms.
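A minimal inference sketch with timm; the exact checkpoint tag is an assumption (512x512 MaxViT-Small weights are published under names like the one below):

```python
# Sketch only: confirm the checkpoint tag on the model page.
import timm
import torch
from PIL import Image

model = timm.create_model("maxvit_small_tf_512.in1k", pretrained=True).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)  # matches the model's 512x512 input

x = transform(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    print(model(x).softmax(-1).topk(5))
```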
Brief-details: A fine-tuned Whisper Large model optimized for Czech speech recognition, achieving 10.83% WER on Common Voice 11.0, trained with a linear learning-rate schedule and mixed-precision training.
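A minimal sketch of the training configuration described above (linear learning-rate schedule, mixed precision) using Transformers' Seq2SeqTrainingArguments; the specific values are illustrative, not the ones actually used:

```python
# Illustrative hyperparameters only - the schedule type and fp16 flag are
# the two settings the entry above actually mentions.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-cs",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    lr_scheduler_type="linear",  # linear decay after warmup
    warmup_steps=500,
    max_steps=5000,
    fp16=True,                   # mixed-precision training
    evaluation_strategy="steps",
    predict_with_generate=True,
)
```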
Brief-details: Pre-trained speech model using clustering and a cross-contrastive loss, achieving a 15.6% WER improvement on LibriSpeech. Built for 16kHz audio processing.
Brief-details: ONNX-exported ESPnet JETS TTS model for English speech synthesis, based on LJSpeech dataset. Features easy integration with txtai and direct ONNX runtime support.
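A minimal sketch of the txtai integration the entry highlights, assuming the repo id NeuML/ljspeech-jets-onnx (verify against the model card):

```python
# Sketch only: repo id hedged; output is a waveform array at LJSpeech's 22050 Hz.
import soundfile as sf
from txtai.pipeline import TextToSpeech

tts = TextToSpeech("NeuML/ljspeech-jets-onnx")
speech = tts("Say something here")
sf.write("out.wav", speech, 22050)
```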
Brief Details: Japanese text summarization model based on mt5-small with 300M params, fine-tuned on BBC news articles and achieving a ROUGE-1 score of 46.25.
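A minimal sketch with the summarization pipeline; the repo id is a hypothetical placeholder for the mt5-small fine-tune:

```python
# Sketch only: placeholder id - substitute the actual checkpoint.
from transformers import pipeline

summarizer = pipeline("summarization", model="your-org/mt5-small-japanese-news")
article = "..."  # Japanese news article text
print(summarizer(article, max_length=60)[0]["summary_text"])
```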
Brief Details: A high-quality English text-to-speech ONNX model based on ESPnet VITS architecture, optimized for LJSpeech dataset with Apache 2.0 license.
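For exported models like this one and the JETS entry above, ONNX Runtime can also be driven directly; input and output tensor names vary per export, so inspect the session before wiring up inference:

```python
# Generic ONNX Runtime pattern - the actual input names (e.g. token ids)
# depend on how the model was exported.
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
print([i.name for i in session.get_inputs()])   # e.g. a token-id input
print([o.name for o in session.get_outputs()])  # e.g. a waveform output
# outputs = session.run(None, {"text": token_ids})  # feed ids under the real input name
```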
Brief Details: French speech recognition model using the wav2vec2 architecture. 315M params, trained on 2200+ hours of French audio; achieves 9.66% WER with a language model on Common Voice.
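A minimal sketch; the repo id is a hypothetical placeholder. If the repo ships KenLM decoder files and pyctcdecode is installed, the pipeline applies the LM-boosted decoding that the 9.66% WER figure reflects:

```python
# Sketch only: placeholder id; LM decoding is picked up automatically when available.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/wav2vec2-french",  # hypothetical id
    chunk_length_s=30,                 # chunk long audio for streaming inference
)
print(asr("french_sample.wav")["text"])
```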