Modeling Distinct Human Interaction in Web Agents
Title: Capturing Unique Human Dynamics in Web-Based Agents
Abstract
Although autonomous web agents have advanced significantly, human oversight remains crucial for conveying preferences and adjusting agent behavior during task execution. However, existing systems often fail to understand when and why humans intervene, so they either skip past critical decision points or request unnecessary confirmations. This study presents a framework for modeling human intervention to support more effective collaborative web task execution. We introduce CowCorpus, a new dataset of 400 real-world web navigation sessions containing more than 4,200 interleaved human and agent actions. Our analysis identifies four distinct patterns of user-agent interaction: hands-off supervision, hands-on oversight, collaborative problem-solving, and complete user takeover. Building on these findings, we trained language models (LMs) to predict when users are likely to intervene, conditioned on their interaction style. This approach improved intervention prediction accuracy by 61.4% to 63.4% over baseline LMs. Furthermore, a user study showed that integrating these intervention-aware models into live web navigation agents increased user-rated usefulness by 36.8%. Collectively, our findings indicate that structurally modeling human intervention enables more responsive and cooperative agents.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





