🎙️ Qwen3 ASR 0.6B Demo

This demo uses the Qwen3-ASR-0.6B model for automatic speech recognition on CPU.

Note: Running on CPU. Inference may take 10-30 seconds depending on audio length.

🎤 Input

Language

Convert spelled-out numbers to digits and apply similar text normalization (Chinese & Japanese only)

📄 Output

💡 Keyboard Shortcuts

  • Ctrl/Cmd + Enter: Start transcription
  • Esc: Clear audio input

📋 Tips for Best Results

  • Use clear audio recordings
  • Supported audio formats: WAV, MP3, FLAC, OGG
  • Mono, 16 kHz sample rate recommended
  • For long audio, consider splitting into shorter segments
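
The last two tips can be sketched in code. The following is a minimal example, not part of the demo itself, that checks whether a WAV file matches the recommended mono, 16 kHz layout and splits a long recording into shorter segments using only Python's standard `wave` module. The function names `matches_recommendation` and `split_wav` and the 30-second default segment length are assumptions chosen for illustration. The sketch cuts at fixed time offsets, so a word may be split across two segments.

```python
import os
import wave


def matches_recommendation(path):
    """Return True if the WAV file is mono with a 16 kHz sample rate."""
    with wave.open(path, "rb") as src:
        return src.getnchannels() == 1 and src.getframerate() == 16000


def split_wav(path, segment_seconds=30.0):
    """Split a WAV file into consecutive segments of at most segment_seconds.

    Segments are written next to the input as <name>_part000.wav, and so on.
    The 30-second default is an illustrative assumption, not a model limit.
    Returns the list of segment paths in playback order.
    """
    stem, _ = os.path.splitext(path)
    out_paths = []
    with wave.open(path, "rb") as src:
        channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames_per_segment = int(segment_seconds * frame_rate)
        if frames_per_segment <= 0:
            raise ValueError("segment_seconds is too small for this sample rate")
        index = 0
        while True:
            # readframes returns fewer frames (or none) at the end of the file.
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = f"{stem}_part{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Keep the source format so each segment stays a valid WAV.
                dst.setnchannels(channels)
                dst.setsampwidth(sample_width)
                dst.setframerate(frame_rate)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Calling `split_wav("recording.wav")` on a hypothetical file would write `recording_part000.wav`, `recording_part001.wav`, and so on, each of which can be uploaded separately. MP3, FLAC, and OGG files would need to be decoded to WAV with another tool first, since the `wave` module reads only uncompressed WAV.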

🌍 Supported Languages

This model supports 90+ languages including:

  • English, Chinese, Spanish, French, German
  • Japanese, Korean, Arabic, Hindi, Russian
  • And many more...

ℹ️ Model Details

  • Architecture: Qwen3-ASR (automatic speech recognition)
  • Parameters: 0.6 Billion
  • License: Apache 2.0
  • Model Card: Qwen/Qwen3-ASR-0.6B