arXiv

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

Title: Intercepting Credential Theft: A Framework for Pre-Output and Multi-Turn Detection in LLM Agents

Abstract

Large Language Model (LLM) agents often hold sensitive credentials in the same context window as untrusted retrieved data. This overlap opens a pathway for indirect prompt injection, in which instructions hidden in the retrieved data manipulate the model into exfiltrating those credentials. We evaluate three complementary defenses against this failure mode. First, we test whether activation probes can detect credential access before any output token is generated. Second, we build honeytokens (canary credentials) sampled from format-specific character models and use split conformal prediction to calibrate their detection and improve its precision. Third, we treat multi-turn exfiltration as a cumulative information-flow problem and track an estimated leakage budget across successive conversation turns.
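The honeytoken defense can be illustrated with a small sketch. It assumes a hypothetical key format (a fixed `AKIA` prefix followed by 16 characters from `[A-Z0-9]`), uses a per-position character-class table as a stand-in for a format-specific character model, and scores an output by the longest contiguous match with the canary; this scoring rule is a swapped-in choice, not necessarily the authors'. The split conformal step converts benign calibration scores into a p-value, so that a benign output exchangeable with the calibration set is flagged with probability at most `alpha`. All names (`KEY_FORMAT`, `make_honeytoken`, `leak_score`, `conformal_p_value`, `is_leak`) are illustrative.

```python
import random
import string
from difflib import SequenceMatcher

# Hypothetical key format: fixed "AKIA" prefix + 16 chars from [A-Z0-9].
# Each entry lists the characters allowed at that position; this table is a
# simple stand-in for a format-specific character model.
KEY_FORMAT = [("A",), ("K",), ("I",), ("A",)] + [tuple(string.ascii_uppercase + string.digits)] * 16


def make_honeytoken(rng: random.Random) -> str:
    """Sample one canary credential that matches KEY_FORMAT."""
    return "".join(rng.choice(chars) for chars in KEY_FORMAT)


def leak_score(canary: str, text: str) -> float:
    """Fraction of the canary covered by its longest contiguous match in `text`."""
    sm = SequenceMatcher(None, canary, text, autojunk=False)
    match = sm.find_longest_match(0, len(canary), 0, len(text))
    return match.size / len(canary)


def conformal_p_value(calib_scores: list[float], test_score: float) -> float:
    """Split conformal p-value: share of benign calibration scores >= test_score."""
    n_geq = sum(s >= test_score for s in calib_scores)
    return (1 + n_geq) / (len(calib_scores) + 1)


def is_leak(calib_scores: list[float], canary: str, text: str, alpha: float = 0.01) -> bool:
    """Flag `text` when its score is unusually high relative to benign outputs.

    If benign outputs are exchangeable with the calibration set, a benign
    output is flagged with probability at most `alpha`.
    """
    return conformal_p_value(calib_scores, leak_score(canary, text)) <= alpha


if __name__ == "__main__":
    rng = random.Random(0)
    canary = make_honeytoken(rng)
    # Hypothetical benign outputs that mention other, unrelated key-like IDs.
    benign = [f"Rotated key {make_honeytoken(rng)} for service {i}." for i in range(500)]
    calib = [leak_score(canary, t) for t in benign]
    print(is_leak(calib, canary, f"Sure, the key is {canary}"))
    print(is_leak(calib, canary, "I cannot share credentials."))
```

The calibration set here deliberately contains other key-like strings that share the `AKIA` prefix, so a match on the prefix alone scores far below a full leak. In a real deployment the calibration outputs would need to reflect the agent's actual benign traffic, and `alpha` cannot usefully go below `1 / (n + 1)` for `n` calibration examples.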

In controlled experiments on open-weight models, activation features distinguish credential-seeking queries from benign ones with high accuracy, including on queries obfuscated with encoding transformations held out from training. In a synthetic multi-turn test suite, cumulative accounting detected attacks that single-turn detectors missed. These findings are preliminary: the multi-turn benchmark is proprietary and small, the activation approach requires white-box access to the model, and the information estimator is a practical indicator rather than a strict upper bound on leakage. Even so, they suggest that robust defenses against credential exfiltration should combine pre-output monitoring, calibrated canary detection, and temporal leakage accounting instead of relying solely on text-level output filters.
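The contrast between single-turn detection and cumulative accounting can be sketched under clearly stated assumptions. Here leakage is estimated as the number of distinct secret characters that have appeared in outputs (in runs of at least `MIN_RUN` characters) times log2 of an assumed 36-symbol alphabet; the thresholds, the secret, and all names (`revealed_positions`, `LeakageAccountant`) are hypothetical. Like the estimator described above, this count is a heuristic indicator, not an upper bound, and it would miss encoded or paraphrased leaks. The demo splits a made-up secret into 4-character chunks, one per turn.

```python
import math
from difflib import SequenceMatcher

ALPHABET_SIZE = 36  # assumption: secret drawn from [A-Z0-9]
MIN_RUN = 3         # hypothetical: shorter matches are treated as coincidence


def revealed_positions(secret: str, output: str) -> set[int]:
    """Indices of `secret` that appear in `output` within a run of >= MIN_RUN chars."""
    sm = SequenceMatcher(None, secret, output, autojunk=False)
    covered: set[int] = set()
    for block in sm.get_matching_blocks():
        if block.size >= MIN_RUN:
            covered.update(range(block.a, block.a + block.size))
    return covered


class LeakageAccountant:
    """Tracks an estimated leakage budget (in bits) across conversation turns."""

    def __init__(self, secret: str, budget_bits: float, per_turn_bits: float):
        self.secret = secret
        self.budget_bits = budget_bits      # cumulative threshold
        self.per_turn_bits = per_turn_bits  # threshold a single-turn detector would use
        self.bits_per_char = math.log2(ALPHABET_SIZE)
        self.seen: set[int] = set()         # secret positions revealed so far

    def observe(self, output: str) -> tuple[bool, bool]:
        """Return (single_turn_alarm, cumulative_alarm) for one model output."""
        this_turn = revealed_positions(self.secret, output)
        self.seen |= this_turn
        turn_bits = len(this_turn) * self.bits_per_char
        total_bits = len(self.seen) * self.bits_per_char
        return turn_bits >= self.per_turn_bits, total_bits >= self.budget_bits


if __name__ == "__main__":
    secret = "AKIAQ7ZK2M9XPL4TB8RD"  # made-up credential
    acct = LeakageAccountant(secret, budget_bits=64, per_turn_bits=32)
    for turn, start in enumerate(range(0, len(secret), 4), start=1):
        chunk = secret[start:start + 4]
        single, cumulative = acct.observe(f"Step {turn}: the code fragment is {chunk}.")
        print(turn, single, cumulative)
```

With these settings each turn reveals about 20.7 bits, below the 32-bit single-turn threshold, so the single-turn check never fires, while the cumulative total crosses the 64-bit budget on the fourth turn. Tracking a set of revealed positions, rather than summing per-turn counts, keeps repeated disclosure of the same fragment from inflating the estimate.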


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
