Eyettention II: A Dual-Sequence Architecture for Modeling Fixation Location, Within-Word Landing Position, and Fixation Duration in Reading
Abstract:
Eye movements during reading offer insight into both the characteristics of the text and the cognitive processes of the reader. In particular, eye-tracking data collected during reading has proven useful in several technological applications, including the refinement and interpretation of language models and the inference of reader profiles. However, such applications typically depend on large, data-driven models that require substantial eye-tracking datasets, and these datasets are difficult to acquire because data collection is resource-intensive.
To mitigate this data scarcity, we introduce Eyettention II, a deep-learning framework trained end-to-end that generates realistic scanpaths. Each scanpath is an ordered sequence of fixations, and each fixation is described by its location, its within-word landing position, and its duration. The model is lightweight and efficient, so it can be trained effectively even with limited GPU resources, while remaining consistent with established cognitive theories. Our results indicate that Eyettention II outperforms current state-of-the-art models in scanpath prediction and reproduces human-like gaze patterns by accounting for essential psycholinguistic phenomena. Given this performance, Eyettention II has the potential to advance natural language processing, support the piloting of materials for psycholinguistic experiments, and reveal insights that extend beyond the explicit parameters of existing theoretical cognitive models.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





