arXiv

Attend to Anything: A Foundation Model for Unified Human Attention Modeling

Abstract

Current approaches to human attention (or saliency) modeling remain heavily fragmented, varying significantly across different modalities, environments, and task definitions. As a result, despite advancements in model capacity and the volume of training data, existing solutions are largely confined to specific scenes and tasks, lacking the practical generalization capabilities required for real-world deployment. To overcome these fundamental constraints, we introduce the Attend to Anything Model (AAM), a multi-modal foundation model designed to consolidate attention modeling across a wide spectrum of image, video, and audio-visual tasks and settings. AAM redefines attention as a cognitive entailment relationship structured within a general-to-specific hierarchy, a framework realized through language prompts and hierarchical embeddings situated in hyperbolic space. Additionally, to bridge the gap between static image attention and dynamic video attention, we employ a fluid-dynamics approach, modeling video-frame attention as a diffusive temporal evolution driven by the Fokker--Planck equation. Our extensive evaluation across 16 benchmarks reveals that AAM consistently surpasses state-of-the-art methods, delivering an average performance improvement of 6% across diverse scenarios and achieving roughly a 4$\times$ acceleration in video inference. These findings establish AAM as a robust and principled foundation for future investigations into attention and saliency-related tasks. The associated dataset and code are accessible at https://github.com/wz-zhao/Attend-to-Anything.
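
For reference, the textbook form of the Fokker--Planck equation for a density $p(\mathbf{x}, t)$ is sketched below. Reading $p$ as a normalized attention map over frame coordinates $\mathbf{x}$ is an assumption made here for illustration. The abstract does not specify the drift $\boldsymbol{\mu}$ or the diffusion tensor $D$ that AAM uses, so this should not be taken as the model's actual formulation.

```latex
% Generic Fokker--Planck equation (textbook form, not AAM-specific).
% Assumed symbols: p = density, \mu_i = drift components, D_{ij} = diffusion tensor.
\[
\frac{\partial p(\mathbf{x}, t)}{\partial t}
  = -\sum_{i} \frac{\partial}{\partial x_i}
      \bigl[ \mu_i(\mathbf{x}, t)\, p(\mathbf{x}, t) \bigr]
    + \sum_{i,j} \frac{\partial^2}{\partial x_i \, \partial x_j}
      \bigl[ D_{ij}(\mathbf{x}, t)\, p(\mathbf{x}, t) \bigr]
\]
```

The first term transports the density along the drift and the second spreads it out. With suitable boundary conditions the total mass $\int p \, d\mathbf{x}$ is conserved, which is consistent with treating $p$ as a normalized map that evolves from frame to frame.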


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
