Brief-details: State-of-the-art Vietnamese-to-English neural machine translation model developed by VinAI, built on the mBART architecture and released under the AGPL-3.0 license
Brief-details: Advanced vision-encoder-decoder model (202M params) for converting invoice/receipt images to structured JSON/XML without OCR, built on the Donut architecture.
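A minimal translation sketch with the Hugging Face transformers API. The hub id vinai/vinai-translate-vi2en-v2 and the mBART-style decoder-start token are assumptions based on VinAI's public release; verify against the actual model card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub id assumed for illustration; check the hub for the exact checkpoint.
model_id = "vinai/vinai-translate-vi2en-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="vi_VN")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Xin chào thế giới", return_tensors="pt")
# mBART-style models start decoding from the target-language token.
output_ids = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
    num_beams=5,
    early_stopping=True,
)
english = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(english)
```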
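A hedged usage sketch for OCR-free document parsing with the transformers Donut classes. The naver-clova-ix/donut-base-finetuned-cord-v2 receipt checkpoint is an assumption (the exact checkpoint behind this entry is not stated):

```python
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

ckpt = "naver-clova-ix/donut-base-finetuned-cord-v2"  # assumed receipt-parsing checkpoint
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)

# Blank placeholder image; replace with a real receipt scan.
image = Image.new("RGB", (640, 960), "white")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut is steered with a task prompt token instead of a separate OCR stage.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=512)
parsed = processor.token2json(processor.batch_decode(outputs)[0])
print(parsed)  # nested dict mirroring the document's fields
```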
Brief-details: A specialized 7B parameter LLM based on llama-2-7b-32k, fine-tuned for web search and information extraction with extended context window and no data logging
Brief-details: DISC-MedLLM is a Chinese medical domain LLM based on Baichuan-13B, trained on 470k+ medical examples, specializing in conversational healthcare and medical consultations.
Brief-details: YOLOv8-based object detection model for web form UI element detection. Achieves 0.52 mAP@0.95. Trained on 600 images to identify form fields such as input boxes and buttons.
Brief-details: Arabic text-to-speech model from Facebook's MMS project. 36.3M parameters, VITS architecture, supports end-to-end speech synthesis with stochastic duration prediction.
Brief-details: Add-Detail-XL is an experimental AI model by PvDeep; its architecture and license are undocumented. Has 21 community likes.
Brief-details: SALMONN is a groundbreaking LLM enabling speech, audio, and music understanding, developed by Tsinghua University and ByteDance, featuring multimodal audio perception capabilities.
Brief-details: WavLM-based emotion diarization model trained on 6 datasets, achieving a 29.7% Emotion Diarization Error Rate (EDER). Identifies emotion segments in speech with temporal boundaries.
Brief-details: TANGO - A state-of-the-art text-to-audio generation model using latent diffusion and Flan-T5 encoder, capable of creating realistic sounds from text prompts.
Brief-details: BLIP-2 image-to-text model pairing a frozen image encoder with the OPT-2.7b language model. Specializes in image captioning and visual QA. MIT licensed.
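A captioning sketch with the transformers BLIP-2 classes, assuming the Salesforce/blip2-opt-2.7b checkpoint (note the full model is several GB to download):

```python
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

ckpt = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(ckpt)
model = Blip2ForConditionalGeneration.from_pretrained(ckpt)

# Placeholder image; use a real photo for meaningful captions.
image = Image.new("RGB", (384, 384), "blue")
inputs = processor(images=image, return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
print(caption)
```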
Brief-details: Technology-oriented 7B parameter LLM specialized in knowledge graph construction, relation extraction, and technical domain Q&A with Chinese-English capabilities
Brief-details: Collection of LoRA models focused on art styles, character expressions, and body features. Includes Indonesian language support & tarot card styles. Has 24 community likes and diverse implementations.
Brief-details: LINE's Japanese DistilBERT model trained on 131GB web text. 6-layer architecture with 68M params. Strong JGLUE benchmark performance. Apache 2.0 licensed.
Brief-details: Bulgarian word vectors trained by Facebook using fastText on Common Crawl & Wikipedia. Supports efficient word embeddings & text classification in Bulgarian.
Brief-details: NVIDIA FastPitch is a parallel transformer-based TTS model with 45M parameters, offering prosody control and English speech synthesis using LJSpeech dataset.
Brief-details: LSTM-based weather forecasting model that predicts temperature using 6 years of climate data from Jena, Germany. Processes 14 weather features for 12-hour predictions.
Brief-details: Large English language model from spaCy with 514K word vectors, 97.3% POS accuracy, and strong NER (85.4% F-score) capabilities for NLP tasks
Brief-details: Advanced SDXL v-prediction model trained on Danbooru/e621 datasets. Optimized for high-quality image generation with specific parameter requirements and extensive documentation.
Brief-details: BERT-based emotion detection model with 109M parameters, achieving 93.8% accuracy on emotion classification from text, supporting 6 emotion categories.
Brief-details: Russian T5-based abstractive summarization model with 244M parameters, fine-tuned on 4 datasets for generating concise Russian text summaries