AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
Abstract:
As modern language models achieve fluency comparable to humans and successfully bypass detectors that depend on likelihood-based cues or surface-level statistics, identifying AI-generated text has grown significantly more difficult. To address this, we introduce \textsc{AEyeDE}, a novel attribution-driven framework for distinguishing human from AI authorship. This method utilizes model attention as a key discriminative feature. By employing a \emph{proxy} Transformer model with white-box access, we extract attention-based attribution matrices from both human-written and AI-generated texts. A lightweight Convolutional Neural Network is then trained to derive representations from these attribution maps.
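As a rough illustration of the extraction step, the Python sketch below computes scaled dot-product attention weights for two toy heads and averages them into a single token-by-token matrix. Head averaging is an assumption made here for simplicity: the abstract does not say how \textsc{AEyeDE} aggregates attention into attribution matrices. The function names (`attention_weights`, `head_averaged_map`) and the toy vectors are hypothetical. The proxy Transformer itself and the CNN classifier are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights for one head (no masking)."""
    dim = len(keys[0])
    scores = [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(dim) for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


def head_averaged_map(per_head_weights):
    """Average attention over heads (an assumed aggregation, see lead-in)."""
    n_heads = len(per_head_weights)
    n_rows = len(per_head_weights[0])
    n_cols = len(per_head_weights[0][0])
    return [
        [sum(head[i][j] for head in per_head_weights) / n_heads for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy query/key vectors for a 3-token input; these are made-up values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_a = attention_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], keys)
head_b = attention_weights([[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]], keys)

attribution_map = head_averaged_map([head_a, head_b])
```

Because each head's rows are probability distributions, each row of the averaged map also sums to one. In a real setting the per-head weights would come from the proxy model's attention outputs rather than from hand-written vectors.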
Across encoder-decoder translation tasks, \textsc{AEyeDE} consistently outperforms a text-only baseline. In decoder-only settings, it excels at generator-specific detection, remains competitive on standard benchmarks, and shows strong robustness under cross-dataset transfer and to alternative-spelling perturbations. Our analysis further shows that attention maps contain recurring local structures whose relative frequencies diverge consistently between human and AI texts across datasets and proxy models. These results indicate that attention-based attribution maps offer a complementary and interpretable signal for AI-generated text detection. To facilitate further investigation, we will release the code publicly.
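One way to make the idea of recurring local structures concrete is to binarize an attention map, slide a small window over it, and record how often each local pattern occurs. The Python sketch below does this for 2x2 windows and compares two maps by the total variation distance between their pattern frequencies. The threshold, the window size, the distance measure, and both toy maps are assumptions chosen for illustration; they are not the analysis procedure or data of the paper.

```python
from collections import Counter


def binarize(matrix, threshold):
    """Mark each cell 1 if its weight reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]


def patch_frequencies(binary, size=2):
    """Relative frequency of every size x size pattern in a binary map."""
    counts = Counter()
    rows, cols = len(binary), len(binary[0])
    for i in range(rows - size + 1):
        for j in range(cols - size + 1):
            patch = tuple(tuple(binary[i + di][j:j + size]) for di in range(size))
            counts[patch] += 1
    total = sum(counts.values())
    return {patch: count / total for patch, count in counts.items()}


def total_variation(freq_a, freq_b):
    """Total variation distance between two pattern-frequency tables."""
    patterns = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(p, 0.0) - freq_b.get(p, 0.0)) for p in patterns)


# Two made-up 3x3 attention maps: one diagonal, one focused on the first token.
diagonal_map = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
first_token_map = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]

freq_diagonal = patch_frequencies(binarize(diagonal_map, 0.5))
freq_first_token = patch_frequencies(binarize(first_token_map, 0.5))
distance = total_variation(freq_diagonal, freq_first_token)
```

The two toy maps share no 2x2 pattern after binarization, so their frequency tables have disjoint support and the distance reaches its maximum of 1. Real attention maps would be expected to overlap far more, with differences showing up only in the relative frequencies.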
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




