arXiv

WAXAL-NET: Finetuned Edge ASR Across 19 African Languages

June 2, 2026 · Victor Tolulope Olufemi, Oreoluwa Babatunde, Ramsey Njema, Bolarinwa Gbotemi, Wanchi Lucia Yen, John Uzodinma, Sunday Ajayi, Oluwademilade Williams, Kausar Moshood, Innocent Elendu Anyaele, Akebert Arefaine, Candace Hunzwi, Wongel Dawit Daniel, Emmilly Na

Title: WAXAL-NET: Optimizing Edge-Based Automatic Speech Recognition for 19 African Languages

Abstract:

This study compares compact, domain-specific Automatic Speech Recognition (ASR) models with large multilingual foundation models on conversational speech in African languages. Using the WAXAL corpus, we evaluate performance across 19 languages. Our findings indicate that fine-tuned models designed for edge deployment substantially outperform zero-shot baselines, achieving a macro-averaged Word Error Rate (WER) of $38.0\%$. This is a $26.9$ percentage-point improvement over the best zero-shot baseline, which recorded a WER of $64.9\%$, even though the fine-tuned models are $3$ to $40$ times smaller. These results indicate that domain specialization matters more than model scale for spontaneous African speech.
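As a rough illustration of the headline metric, the sketch below computes a macro-averaged WER: one WER per language, then an unweighted mean across languages. The abstract does not describe the scoring procedure, so the whitespace tokenization, the corpus-level per-language WER, the function names, and the toy transcripts are all assumptions made for illustration, not the authors' evaluation scripts.

```python
from statistics import mean


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER for one language: total word edits / total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


def macro_wer(per_language_pairs):
    """Unweighted mean of per-language WERs (every language counts equally)."""
    return mean(corpus_wer(pairs) for pairs in per_language_pairs.values())


# Hypothetical toy data: placeholder tokens, not WAXAL transcripts.
toy = {
    "lang_a": [("a b c d", "a b x d")],  # 1 substitution over 4 words
    "lang_b": [("e f", "e f")],          # exact match
}
toy_macro = macro_wer(toy)

# The reported gap follows directly from the two figures in the abstract.
reported_gap = round(64.9 - 38.0, 1)
```

Macro-averaging gives each language the same weight regardless of how much test audio it has, so a low-resource language with a high WER moves the headline number as much as a well-resourced one.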

Cross-domain evaluation further shows that fine-tuned models remain robust on out-of-distribution (OOD) speech, whereas zero-shot models become competitive again when the test data matches their pretraining distribution. To better understand error patterns, we conducted a distributed audit with native speakers of every surveyed language. The audit yielded a linguistically grounded error taxonomy and revealed distinct behavioral differences between Connectionist Temporal Classification (CTC) and autoregressive architectures across language families.

Additionally, the study demonstrates that WER alone is an insufficient metric for languages written in syllabaries. In such cases, the ratio of Character Error Rate (CER) to WER shows that character-level accuracy is considerably higher than the headline WER figures suggest. To support ongoing research in African ASR, we have publicly released the cleaned WAXAL subset covering all 19 languages, along with the corresponding model weights, fine-tuning procedures, and evaluation scripts.
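To make the CER-to-WER comparison concrete, here is a minimal sketch that scores the same hypothesis at both the word and the character level. It assumes characters are Unicode code points with spaces removed before scoring; the abstract does not say how characters are segmented, and the helper names and placeholder strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of units."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edits divided by the number of reference units."""
    if not ref_units:
        raise ValueError("empty reference")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer_and_wer(ref, hyp):
    """Return (CER, WER, CER/WER) for one reference/hypothesis pair."""
    wer = error_rate(ref.split(), hyp.split())
    # Assumption: characters are code points, with spaces stripped first.
    cer = error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
    ratio = cer / wer if wer > 0 else float("nan")
    return cer, wer, ratio


# Hypothetical placeholder strings: each word has one wrong character.
cer, wer, ratio = cer_and_wer("abcde fghij", "abcdx fghiy")
```

In this toy case every word counts as an error at the word level even though only 2 of 10 characters are wrong, so a low CER/WER ratio signals that most of each misrecognized word is still correct.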
