wav2vec2-large-xlsr-japanese

vumichien

A Japanese speech recognition model based on wav2vec2-large-xlsr-53 and fine-tuned on the Common Voice and JSUT corpora. It reports a word error rate (WER) of 30.84% for Japanese ASR.

Property      Value
Model Author  vumichien
Base Model    facebook/wav2vec2-large-xlsr-53
Task          Japanese Speech Recognition
Performance   WER: 30.84%, CER: 17.85%
Model Hub     Hugging Face
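
WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below shows that arithmetic. The function names are illustrative, and stripping spaces before computing CER is an assumption; the scoring script behind the figures above is not given here. Because Japanese text has no spaces, WER presupposes text that has already been segmented into words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


# Made-up example sentences, already segmented into words.
reference = "今日 は 晴れ です"
hypothesis = "今日 は 雨 です"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

In this example one of four reference words is wrong, while two of seven reference characters need editing, so the two metrics can diverge noticeably on the same sentence.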

What is wav2vec2-large-xlsr-japanese?

wav2vec2-large-xlsr-japanese is a speech recognition model fine-tuned for Japanese. It builds on Facebook's wav2vec2-large-xlsr-53 checkpoint and was fine-tuned on a combination of the Common Voice dataset and the JSUT Japanese speech corpus from Saruwatari-lab, University of Tokyo.

Implementation Details

The model expects audio sampled at 16kHz and uses a CTC (Connectionist Temporal Classification) output layer for speech recognition: it predicts a label for each audio frame, and the frame labels are then collapsed into text. It uses MeCab for Japanese text tokenization and includes specialized preprocessing for Japanese characters.

  • Requires audio input sampled at 16kHz
  • Uses the MeCab tokenizer in wakati (word segmentation) mode
  • Includes custom character filtering for Japanese text
  • Built on the wav2vec2 architecture with XLSR pre-training

Core Capabilities

  • Direct speech-to-text transcription without a language model
  • Effective handling of Japanese phonetic structures
  • Batch processing support for multiple audio files
  • Attention mask support, so padded positions in a batch are ignored
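
Two of the capabilities above are easy to illustrate in isolation: padding a batch of variable-length clips with a matching attention mask, and decoding CTC output greedily without a language model. The following is a minimal pure-Python sketch, not the model's own implementation. The toy vocabulary and `blank_id=0` are assumptions; in practice the blank index comes from the model's tokenizer.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Right-pad variable-length waveforms and build a matching attention mask."""
    max_len = max(len(w) for w in waveforms)
    padded, mask = [], []
    for w in waveforms:
        n_pad = max_len - len(w)
        padded.append(list(w) + [pad_value] * n_pad)
        mask.append([1] * len(w) + [0] * n_pad)  # 1 = real sample, 0 = padding
    return padded, mask


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated labels, then drop blanks."""
    chars = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


# Toy vocabulary for illustration only.
vocab = {0: "<blank>", 1: "は", 2: "い"}
frames = [0, 1, 1, 0, 2, 2, 2, 0]  # per-frame argmax label ids
print(ctc_greedy_decode(frames, vocab))

padded, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]])
print(mask)
```

The blank label is what lets CTC emit a genuinely doubled character: the frame sequence `[1, 0, 1]` decodes to two characters, whereas `[1, 1]` collapses to one.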

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the wav2vec2-large-xlsr-53 architecture with fine-tuning on Japanese speech, giving a dedicated Japanese ASR model that does not require an additional language model.

Q: What are the recommended use cases?

The model suits Japanese speech recognition tasks where audio is available at, or can be resampled to, 16kHz. Typical applications include voice transcription, subtitle generation, and voice command systems in Japanese.
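
For a transcription workflow, a minimal sketch using the Hugging Face `transformers` and `torchaudio` libraries might look like the following. The Hub identifier in `MODEL_ID` is assumed from the author and model name above, and `transcribe` is an illustrative helper rather than part of any library. Heavy dependencies are imported inside the function, so defining it has no side effects; calling it downloads the model weights.

```python
MODEL_ID = "vumichien/wav2vec2-large-xlsr-japanese"  # assumed Hub identifier
TARGET_SAMPLE_RATE = 16_000  # the model expects 16kHz input


def transcribe(paths, model_id=MODEL_ID):
    """Transcribe a list of audio files with greedy CTC decoding (no language model)."""
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    speech = []
    for path in paths:
        waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
        waveform = waveform.mean(dim=0)                # downmix to mono
        if sample_rate != TARGET_SAMPLE_RATE:
            waveform = torchaudio.functional.resample(
                waveform, sample_rate, TARGET_SAMPLE_RATE
            )
        speech.append(waveform.numpy())

    # Pad the batch; the processor also returns an attention mask when configured to.
    inputs = processor(
        speech,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)
```

A call such as `transcribe(["clip1.wav", "clip2.wav"])` (hypothetical file names) would return one transcript string per file.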
