arXiv

PsychoPass: Geometric Profiling of Multi-Turn Adversarial LLM Conversations

Abstract

Multi-turn jailbreak attempts against large language models (LLMs) expose a critical weakness in existing safety mechanisms: defensive guardrails typically analyze isolated turns, whereas adversarial strategies unfold as continuous trajectories spanning entire dialogues. To address this, we advocate a shift from static content analysis to dynamic modeling, treating conversations as paths in representation space and asking whether adversarial intent is geometrically encoded from the outset. We present PsychoPass, a framework that derives geometric features from conversation trajectories in embedding space to forecast attacks before harmful material is generated. Our initial geometric features yield near-perfect accuracy in naive classifiers, but this performance is driven primarily by including the total number of turns as a feature. After controlling for this confound, a subtler but persistent geometric signal remains. Classification performance is stable regardless of the encoder used. This predictive signal also emerges early in the interaction: even short prefixes yield detection rates significantly above chance and detect attacks more reliably than standard baseline guardrails. Our theoretical analysis explains these observations through a length-shape decomposition, establishes a detection bound tied to prefix length, and confirms encoder invariance. Together, these findings show that adversarial exchanges leave an early, representation-robust geometric signature, making them viable targets for real-time online monitoring.
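To make the idea of "geometric features of a conversation trajectory" concrete, here is a minimal sketch that treats a dialogue as a sequence of per-turn embedding vectors and computes a few shape descriptors. The feature set (`mean_step_length`, `straightness`, `mean_turning_angle`) and the function name `trajectory_features` are assumptions chosen for illustration; the abstract does not specify which features PsychoPass uses. The turn count is reported separately from the length-normalized shape features, in the spirit of the length and shape decomposition the abstract mentions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def _norm(v: Vector) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def trajectory_features(turn_embeddings: List[Vector]) -> Dict[str, float]:
    """Hypothetical shape features for a conversation path in embedding space.

    turn_embeddings holds one vector per turn, in conversation order.
    How the vectors are produced (which encoder) is left to the caller.
    """
    if len(turn_embeddings) < 2:
        raise ValueError("need at least two turn embeddings")

    # Displacement between consecutive turns.
    steps = [_sub(b, a) for a, b in zip(turn_embeddings, turn_embeddings[1:])]
    step_lengths = [_norm(s) for s in steps]
    path_length = sum(step_lengths)
    net_displacement = _norm(_sub(turn_embeddings[-1], turn_embeddings[0]))

    # Angle between consecutive steps; zero-length steps are skipped.
    angles = []
    for s, t, ls, lt in zip(steps, steps[1:], step_lengths, step_lengths[1:]):
        if ls == 0.0 or lt == 0.0:
            continue
        cos = sum(x * y for x, y in zip(s, t)) / (ls * lt)
        angles.append(math.acos(max(-1.0, min(1.0, cos))))

    return {
        # Length component: the confound the abstract warns about.
        "n_turns": float(len(turn_embeddings)),
        # Shape components: normalized so they do not grow with turn count.
        "mean_step_length": path_length / len(steps),
        "straightness": net_displacement / path_length if path_length > 0 else 0.0,
        "mean_turning_angle": sum(angles) / len(angles) if angles else 0.0,
    }
```

Early detection on a prefix would then amount to calling `trajectory_features(turn_embeddings[:k])` for a small `k` and passing only the shape components to a classifier. Whether these particular descriptors carry the signal reported in the paper is not something this sketch can establish.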


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
