Algorithmic Fragility and Persona Bias in LLM-Generated Autistic Communication
Title: The Fragility of Algorithms and the Bias of Personas in LLM-Produced Autistic Discourse
Abstract:
While safety alignment effectively curtails overtly harmful content, it also imposes a sanitized, neurotypical framework on the communication styles of marginalized groups. This study examines that encoding through a dual-persona rewriting approach, in which ten large language models (LLMs) were instructed to rephrase authentic autistic discourse while adopting either an autistic or a neurotypical identity. The analysis reveals that rewrites generated under an autistic persona diverge substantially in both lexical structure and affective tone from those generated under a neurotypical persona, even though their semantic similarity remains comparable. Additionally, the majority of models fail to maintain distinct outputs across personas, collapsing their generations into nearly indistinguishable text.
To identify the underlying causes of this generative failure, we developed a multi-agent qualitative analysis framework. The findings show that systemic output erasure, stereotyped hallucinations, and meta-commentary designed to evade the task are widespread. These failure modes correlate more strongly with alignment strategy than with model parameter count. Moreover, a direct comparison with annotations from autistic human raters shows that insider community knowledge leads to systematic reversals in classification labels relative to those produced by LLMs. Ultimately, these results suggest that current alignment protocols induce persona-specific generative breakdowns, a phenomenon detectable only through qualitative assessment. This confirms a profound representational deficit that cannot be remedied through prompt engineering alone.
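
The persona-collapse check described in the abstract can be illustrated with a minimal sketch. The abstract does not state which lexical metric or cutoff the study uses, so the token-level Jaccard overlap, the function names (`tokens`, `jaccard`, `is_collapsed`), and the 0.9 threshold below are all assumptions chosen for illustration only.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap between two texts (1.0 = identical vocabularies)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_collapsed(autistic_rewrite: str, neurotypical_rewrite: str,
                 threshold: float = 0.9) -> bool:
    """Flag a pair of persona rewrites as collapsed when their lexical
    overlap meets a hypothetical threshold (not taken from the paper)."""
    return jaccard(autistic_rewrite, neurotypical_rewrite) >= threshold


if __name__ == "__main__":
    # Hypothetical rewrites of one source utterance under the two personas.
    a = "I need the plan written down before the meeting."
    n = "I need the plan written down before the meeting!"
    print(jaccard(a, n), is_collapsed(a, n))
```

A set-based overlap is deliberately crude: it ignores word order and affective tone, both of which the abstract reports as separate axes of divergence, so it could serve only as a first-pass filter ahead of qualitative review.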
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





