arXiv

GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization

Title: GEO-Bench: Establishing a Standard for Evaluating Ranking Manipulation in Generative Engine Optimization

Abstract:

As large language models (LLMs) become the primary arbiters of rankings for user queries, sorting products, documents, and recommendations, the manipulation of their outputs has emerged as a significant threat to information integrity and fairness. Although research on generative engine optimization (GEO) has produced numerous manipulation techniques, the lack of a standardized evaluation framework has left the relative efficacy and detectability of these methods largely unknown. Each study typically relies on its own datasets and metrics, which prevents meaningful comparison. To address this gap, we introduce GEO-Bench, a unified benchmark that assesses GEO ranking-manipulation attacks under a consistent protocol.

The benchmark integrates three families of techniques: black-box prompt-based attacks such as TAP and Zero-Shot; white-box gradient-based methods such as STS, RAF, and StealthRank; and ten white-hat C-SEO strategies. We test these methods on five distinct datasets using a fixed open-weight ranker, Llama-3.1-8B-Instruct. Our evaluation framework scores each method along two axes: effectiveness, measured by NRG, Success@α, and Promote@α, and stealth, quantified by keyword violation rates and perplexity ratios.
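To make the stealth axis concrete, the following is a minimal sketch of how a perplexity ratio and a keyword violation rate could be computed. It is not the GEO-Bench implementation: the abstract does not define these metrics, so the formulas here are assumptions (perplexity as the exponential of the negative mean token log-probability, the ratio as manipulated over original, and a violation as a case-insensitive substring match against a hypothetical banned-keyword list). The function names and inputs are likewise illustrative.

```python
import math
from typing import Iterable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_ratio(
    manipulated_logprobs: Sequence[float],
    original_logprobs: Sequence[float],
) -> float:
    """Assumed definition: perplexity of the manipulated text divided by
    perplexity of the original text. Values well above 1 suggest the
    manipulated text is less fluent under the scoring language model."""
    return perplexity(manipulated_logprobs) / perplexity(original_logprobs)


def keyword_violation_rate(
    texts: Iterable[str],
    banned_keywords: Iterable[str],
) -> float:
    """Assumed definition: fraction of texts containing at least one banned
    keyword (case-insensitive substring match)."""
    banned = [k.lower() for k in banned_keywords]
    docs = list(texts)
    if not docs:
        return 0.0
    flagged = sum(1 for d in docs if any(k in d.lower() for k in banned))
    return flagged / len(docs)
```

In practice the token log-probabilities would come from a language model scoring each text; they are passed in directly here so the sketch stays self-contained.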

The results reveal a distinct trade-off between effectiveness and stealth across adversarial attacks. Notably, black-box content-rewriting techniques match or surpass gradient-based attacks in rank promotion while generating more fluent text. In certain domains, these methods also evade detection based on both keyword violations and perplexity. Crucially, our findings indicate that an attacker's access model (black-box versus white-box) does not reliably predict the strength of the attack. By standardizing datasets, attack implementations, and evaluation metrics, GEO-Bench enables the first direct comparison across attack paradigms and thereby supports the development of robust detection methods.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
