arXiv

WAXAL-NET: Finetuned Edge ASR Across 19 African Languages

Title: WAXAL-NET: Optimizing Edge-Based Automatic Speech Recognition for 19 African Languages

Abstract:

This study compares compact, domain-specific Automatic Speech Recognition (ASR) models with large-scale multilingual foundation models on conversational speech in African languages. Using the WAXAL corpus, we evaluated performance across 19 languages. Fine-tuned models designed for edge deployment significantly outperform zero-shot baselines, achieving a macro-averaged Word Error Rate (WER) of $38.0\%$. This is a $26.9$ percentage-point improvement over the best zero-shot baseline, which recorded a WER of $64.9\%$, even though the fine-tuned models are $3$ to $40$ times smaller. These results indicate that domain specialization matters more than model scale for spontaneous African speech.
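To make the headline metric concrete, the following is a minimal sketch of how a macro-averaged WER can be computed: WER is calculated per language, then averaged without weighting so that every language counts equally. The function names, the whitespace tokenization, and the toy transcripts are assumptions made for illustration; they are not taken from the released evaluation scripts.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """WER for one language: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def macro_wer(per_language: Dict[str, List[Tuple[str, str]]]) -> float:
    """Unweighted mean of per-language WER."""
    scores = [corpus_wer(pairs) for pairs in per_language.values()]
    return sum(scores) / len(scores)


# Hypothetical (reference, hypothesis) pairs; not drawn from WAXAL.
toy = {
    "lang_a": [("a b c d", "a b x d")],  # 1 substitution over 4 words
    "lang_b": [("g h", "g")],            # 1 deletion over 2 words
}

for lang, pairs in toy.items():
    print(lang, corpus_wer(pairs))
print("macro WER:", macro_wer(toy))
```

A macro average treats a low-resource language with few test utterances the same as a language with many, which is why it can differ from a WER pooled over all utterances.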

Further analysis through cross-domain evaluation reveals that fine-tuned models maintain robust performance on out-of-distribution (OOD) speech, whereas zero-shot models regain their competitive edge when the testing data aligns with their pretraining distribution. To deepen our understanding of error patterns, we conducted a distributed audit involving native speakers across all surveyed languages. This process yielded a linguistically grounded error taxonomy, highlighting distinct behavioral differences between Connectionist Temporal Classification (CTC) and autoregressive architectures across various language families.

Additionally, the study demonstrates that WER alone is an insufficient metric for languages written in syllabary scripts. For these languages, the ratio of Character Error Rate (CER) to WER shows that character-level accuracy is significantly higher than the headline WER figures suggest. To support ongoing research in African ASR, we have publicly released the cleaned WAXAL subset covering all 19 languages, along with the corresponding model weights, fine-tuning procedures, and evaluation scripts.
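The gap between word-level and character-level error can be illustrated with a small sketch. It computes WER and CER for one utterance and reports their ratio: a single wrong character makes a whole word count as an error, so WER can be high while most characters are correct. The function names and the example strings are hypothetical, and counting spaces as characters is an assumption; the released evaluation scripts may normalize text differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of units (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: edit distance over whitespace-separated words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance over characters (spaces included, by assumption)."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def cer_wer_ratio(ref: str, hyp: str) -> float:
    """Ratio of CER to WER; values well below 1 mean errors are concentrated in few characters."""
    return cer(ref, hyp) / wer(ref, hyp)


# Hypothetical utterance with one wrong character in one of two words.
reference = "selam new"
hypothesis = "selam naw"

print("WER:", wer(reference, hypothesis))
print("CER:", cer(reference, hypothesis))
print("CER/WER:", cer_wer_ratio(reference, hypothesis))
```

Here half of the words are wrong, but only one of nine characters is, so the ratio is small. In a script where each character encodes a syllable, this effect can make WER look much worse than the character-level accuracy.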


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
