arXiv

Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

June 4, 2026 · Jiashu Yao, Heyan Huang, Daiqing Wu, Wangke Chen, Huaxi Ai, Haoyu Wen, Zeming Liu, Yuhang Guo

Title: Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

Abstract: Current GUI agents operate under the premise of a static display, effectively pausing the digital world between operational steps. This model fails to account for dynamic interfaces found in short-video applications, where content streams continuously. In these environments, users must actively manage their attention, determining both which videos to view and the duration of that viewing. To address this gap, we define the challenge of "Living-Screen-Native" GUI agents and present LivingScreen, the inaugural benchmark designed for short-video platforms. This benchmark features a realistic browser-based simulation, a comprehensive three-tier task hierarchy, and evaluation metrics that balance accuracy with information efficiency. Our evaluation of numerous state-of-the-art models reveals that none currently match human performance in terms of the cost-accuracy trade-off. Furthermore, we identify that the primary limitation for these models is the tendency toward excessive or insufficient observation, highlighting a critical need for improved observation control capabilities in the development of future GUI agents. The associated code and data will be publicly accessible at https://github.com/BITHLP/LivingScreen.

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC

