arXiv

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

Abstract:

Autonomous web agents are frequently manipulated by deceptive online content—commonly referred to as social-engineering attacks—into transmitting users' personally identifiable information (PII) to servers controlled by malicious actors. This study demonstrates that such social-engineering tactics are remarkably successful at extracting high-value PII from state-of-the-art web agents, presenting a significant threat to deployed agentic systems.

To measure this vulnerability, we present Scammer4U, a pre-registered benchmark comprising 91 attacker-controlled environments and 10 benign-twin baseline sites. This framework covers 8 distinct attack vectors and 16 site categories, organized within an 8-axis factorial taxonomy designed to isolate the causal impact of specific attack design elements.

Our analysis of frontier agents reveals that critical-tier PII leakage rates range from 54% to 93% in the absence of privacy guidance. In contrast, leakage remains at 0% for the benign-twin baselines. This stark disparity confirms that data leakage is directly attributable to the attacks rather than being a result of incidental form-filling behavior.

While escalating prompt-level mitigation strategies leads to model-dependent reductions across the four agent families studied, these measures prove insufficient at the pooled level to consistently prevent the submission of critical PII. Most importantly, we identify a critical "detection–action gap": even when an independent LLM judge verifies that the agent’s reasoning process has correctly identified the site as suspicious, the agent still submits critical PII in 35.9% of sessions. This compares to a 66.1% submission rate when no suspicion is verbalized, a gap of 30.2 percentage points that is consistent across all four model families.
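The size of the detection–action gap follows directly from the two submission rates quoted above. The short Python sketch below reproduces that arithmetic; the variable names are illustrative, and the relative reduction is a derived figure that the abstract itself does not report.

```python
# Critical-PII submission rates quoted in the abstract (percent of sessions).
rate_suspicion_verbalized = 35.9  # agent's reasoning flagged the site as suspicious
rate_no_suspicion = 66.1          # no suspicion verbalized

# Absolute gap, in percentage points (the figure the abstract reports).
gap_pp = round(rate_no_suspicion - rate_suspicion_verbalized, 1)

# Relative reduction in submissions when suspicion is verbalized.
# Derived here for illustration only; not a number stated in the abstract.
relative_reduction = round(gap_pp / rate_no_suspicion * 100, 1)

print(f"gap: {gap_pp} percentage points")
print(f"relative reduction: {relative_reduction}%")
```

Read this way, verbalized suspicion coincides with fewer submissions, yet more than a third of those sessions still end in a critical-PII submission.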

These results indicate that defenses relying on the agent’s own recognition of an attack are targeting the wrong signal. Consequently, we argue for the implementation of output-level interception mechanisms for outbound submissions, which function independently of the agent’s internal reasoning loop.
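To make the idea of output-level interception concrete, here is a minimal Python sketch of a check that inspects an outbound form submission without consulting the agent's reasoning. Everything in it is an assumption for illustration: the function `intercept_submission`, the `Verdict` type, the trusted-host allowlist, and the two regular expressions are hypothetical, and they are not the benchmark's PII tiers or a mechanism described in the abstract.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns standing in for "critical" PII.
# These are illustrative and are NOT the benchmark's definition.
CRITICAL_PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


@dataclass
class Verdict:
    allowed: bool
    matched: list = field(default_factory=list)


def intercept_submission(target_host, form_fields, trusted_hosts):
    """Decide whether an outbound form submission may leave the agent runtime.

    The decision uses only the destination and the payload, so it does not
    depend on whether the agent noticed anything suspicious.
    """
    # Assumption: the user maintains an allowlist of hosts that may receive PII.
    if target_host in trusted_hosts:
        return Verdict(allowed=True)

    # Collect the names of every PII pattern found in any field value.
    matched = sorted(
        name
        for name, pattern in CRITICAL_PII_PATTERNS.items()
        if any(pattern.search(str(value)) for value in form_fields.values())
    )
    return Verdict(allowed=not matched, matched=matched)


# Usage: a submission to an unknown host carrying an SSN-shaped value is blocked.
verdict = intercept_submission(
    "unknown.example",
    {"name": "Alex", "ssn": "123-45-6789"},
    trusted_hosts={"bank.example"},
)
print(verdict.allowed, verdict.matched)
```

A real deployment would need far more robust PII detection than two regular expressions; the point of the sketch is only that the check sits on the submission path rather than inside the agent's reasoning loop.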


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
