Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms
Title: Evaluating Living-Screen-Native GUI Agents on Short-Video Platforms
Abstract: Current GUI agents operate under the premise of a static display, effectively pausing the digital world between operational steps. This assumption fails to account for the dynamic interfaces of short-video applications, where content streams continuously. In these environments, users must actively manage their attention, deciding both which videos to watch and how long to watch them. To address this gap, we define the challenge of "Living-Screen-Native" GUI agents and present LivingScreen, the first benchmark designed for short-video platforms. The benchmark features a realistic browser-based simulation, a three-tier task hierarchy, and evaluation metrics that balance accuracy with information efficiency. Our evaluation of numerous state-of-the-art models shows that none currently matches human performance on the cost-accuracy trade-off. Furthermore, we find that the primary limitation of these models is a tendency to observe either too much or too little, highlighting a critical need for better observation control in future GUI agents. The associated code and data will be publicly accessible at https://github.com/BITHLP/LivingScreen.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC



