BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
We introduce BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan. The corpus addresses the lack of publicly available Automatic Speech Recognition (ASR) resources for the language. It comprises 10,060 validated utterances transcribed in the native Nastaliq script and sourced from Mozilla Common Voice recordings.
Fine-tuning OpenAI’s Whisper-small model on this dataset yields a Word Error Rate (WER) of 30.07% on a held-out validation set of 538 utterances. The zero-shot baseline, Whisper-small applied directly to Balti without fine-tuning, yields a WER of 182.18%, so fine-tuning reduces WER by 152.11 percentage points. A WER above 100% is possible because inserted words count as errors: a hypothesis much longer than the reference can contain more errors than the reference has words. The dataset, the fine-tuned model, and a live transcription demo are publicly available on HuggingFace.
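To make the WER figures concrete, the following is a minimal sketch of how a word error rate can be computed for a single utterance and how it can exceed 100%. It assumes the standard definition (substitutions, deletions, and insertions divided by the number of reference words) with whitespace tokenization. The function name `word_error_rate` and the example strings are illustrative only, and the abstract does not state which tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not from the paper); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder English tokens, not Balti data.
perfect = word_error_rate("a b c", "a b c")
# One substitution plus three insertions against a two-word reference.
over_generated = word_error_rate("a b", "a x y z w")
```

Here `perfect` is 0.0 and `over_generated` is 2.0, that is, a WER of 200%. Corpus-level WER is usually computed by summing errors and reference words over all utterances before dividing, rather than averaging per-utterance rates.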
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



