Bhasha ASR demo — Hugging Face / faster-whisper / Alignment mode
Instructions:
Hugging Face backend: Use full model IDs like vasista22/whisper-hindi-small
faster-whisper backend: Use model sizes only: tiny, base, small, medium, or large
Alignment mode: Provide both audio and the original transcript to align words to audio.
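The backend rules above can be checked before any model is loaded. The following is a minimal sketch, not the demo's actual code: the function name check_model_choice and the "owner/name" test for Hugging Face IDs are assumptions made for illustration. The backend strings and the size list come from the instructions.

```python
# Sizes the instructions list for the faster-whisper backend.
FASTER_WHISPER_SIZES = {"tiny", "base", "small", "medium", "large"}


def check_model_choice(backend: str, model: str) -> str:
    """Return the cleaned model string, or raise ValueError if it does not fit the backend.

    Hypothetical helper: the "owner/name" rule for Hugging Face IDs is an
    assumption based on the example vasista22/whisper-hindi-small.
    """
    model = model.strip()
    if backend == "huggingface":
        owner, sep, name = model.partition("/")
        if not (sep and owner and name):
            raise ValueError(f"expected a full model ID like 'owner/name', got {model!r}")
    elif backend == "faster-whisper":
        if model not in FASTER_WHISPER_SIZES:
            allowed = ", ".join(sorted(FASTER_WHISPER_SIZES))
            raise ValueError(f"expected one of: {allowed}; got {model!r}")
    else:
        raise ValueError(f"unknown backend {backend!r}")
    return model
```

For example, check_model_choice("faster-whisper", "vasista22/whisper-hindi-small") raises ValueError, which catches the common mistake of pasting a Hugging Face ID into the size field.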
Mode: STT or Align?
stt
align
Upload audio
Transcript (for alignment mode)
Backend (STT mode only)
huggingface
faster-whisper
HF model id
vasista22/whisper-hindi-small
faster-whisper model size
small
Chunk length (s)
Range: 5 to 60
Language code (hi, en, etc.)
hi
Request word-level timestamps (if model supports it)
Force Hindi decoding tokens
Transcribe / Align
Transcription (plain text)
Segments / chunks (if available)
Raw result (JSON-ish)
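For readers who want to reproduce STT mode outside the page, here is a rough sketch of how the form options might map onto the two backends. It is an assumption about the wiring, not the demo's source. The function names, the 30-second default chunk length, the CPU/int8 settings, and the reading of "Force Hindi decoding tokens" as passing language and task to generation are all illustrative. The library imports are deferred so the helper can be used without either package installed.

```python
from typing import Any


def hf_call_kwargs(language: str, word_timestamps: bool, force_language: bool) -> dict[str, Any]:
    """Build call-time arguments for a transformers ASR pipeline (hypothetical helper)."""
    # "word" requests word-level timestamps; True requests segment-level ones.
    kwargs: dict[str, Any] = {"return_timestamps": "word" if word_timestamps else True}
    if force_language:
        # Assumed meaning of the "Force Hindi decoding tokens" checkbox.
        kwargs["generate_kwargs"] = {"language": language, "task": "transcribe"}
    return kwargs


def transcribe_hf(audio_path: str, model_id: str, language: str = "hi",
                  chunk_length_s: int = 30, word_timestamps: bool = False,
                  force_language: bool = True) -> dict[str, Any]:
    """Hugging Face backend. Needs transformers, torch, and ffmpeg for file input."""
    from transformers import pipeline  # deferred import

    asr = pipeline("automatic-speech-recognition", model=model_id,
                   chunk_length_s=chunk_length_s)
    raw = asr(audio_path, **hf_call_kwargs(language, word_timestamps, force_language))
    return {"text": raw["text"], "segments": raw.get("chunks", []), "raw": raw}


def transcribe_faster_whisper(audio_path: str, model_size: str, language: str = "hi",
                              word_timestamps: bool = False) -> dict[str, Any]:
    """faster-whisper backend. Needs the faster-whisper package."""
    from faster_whisper import WhisperModel  # deferred import

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, language=language,
                                      word_timestamps=word_timestamps)
    # `segments` is a generator; iterating it runs the actual decoding.
    segs = [{"start": s.start, "end": s.end, "text": s.text} for s in segments]
    text = "".join(s["text"] for s in segs).strip()
    return {"text": text, "segments": segs,
            "raw": {"language": info.language, "duration": info.duration}}
```

Each function returns the same three keys, mirroring the plain-text, segments, and raw-result panes. A call such as transcribe_hf("sample.wav", "vasista22/whisper-hindi-small") downloads the model on first use; "sample.wav" is a placeholder path.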