Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels
Title: Isolating Regional Dialect Markers in Spoken Brazilian Portuguese Without Sociolinguistic Categorization
Abstract: Classifying regional accents in Brazilian Portuguese (pt-BR) is hindered by a lack of dependable labels. Large self-supervised learning (SSL) speech models are powerful, but their training tends to obscure sociophonetic nuances because accent annotations are often unreliable or absent from the training objectives. This study proposes a feature extraction methodology that relies exclusively on acoustic labels. By using a phoneme-based forced aligner (ZIPA) to locate distinct regional accent markers, our specialized feature set captures dialectal variation more efficiently than standard utterance embeddings. The results indicate that localized features can surpass general-purpose architectures on accent-related tasks while requiring only minimal, objective data labeling.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC


