Global News Digest

arXiv

Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs

Abstract

Evaluating large language models (LLMs) on extensive benchmark suites is often prohibitively costly and time-consuming. To address this, we introduce a graph-driven prompt-selection framework that represents each benchmark as a similarity graph: nodes correspond to prompts, and edges join pairs of prompts whose embeddings are too similar under a user-defined threshold, marking them as redundant. Maximum Independent Set (MIS) algorithms then identify a subset of prompts in which no two are connected, so the selection is both maximally diverse and free of redundancy.
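The selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes cosine similarity on precomputed embeddings, links two prompts (marking them redundant) whenever their similarity is at or above the p-th percentile of all pairwise similarities, and uses a simple minimum-degree greedy heuristic as a stand-in for the MIS solvers named in the paper. The function names `build_similarity_graph` and `greedy_mis` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def percentile(values, p):
    """Percentile p (0-100) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def build_similarity_graph(embeddings, p):
    """Return adjacency sets. ASSUMPTION: prompts i and j are linked
    (treated as redundant) when their cosine similarity is at or above the
    p-th percentile of all pairwise similarities. Needs at least 2 prompts."""
    n = len(embeddings)
    sims = {(i, j): cosine_similarity(embeddings[i], embeddings[j])
            for i, j in combinations(range(n), 2)}
    threshold = percentile(list(sims.values()), p)
    adj = {i: set() for i in range(n)}
    for (i, j), s in sims.items():
        if s >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_mis(adj):
    """Minimum-degree greedy heuristic: keep the vertex with the fewest
    remaining neighbours, delete it and its neighbours, and repeat."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    selected = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        selected.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed  # drop edges that point at deleted vertices
    return sorted(selected)


# Toy example: prompts 0/1 and 2/3 are near-duplicate pairs.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
adj = build_similarity_graph(embeddings, p=75)
selected = greedy_mis(adj)
print(selected)  # [0, 2]: one prompt kept from each redundant pair
```

Any independent set in this graph keeps at most one prompt from each redundant pair; the greedy heuristic is fast but not guaranteed optimal, whereas an exact solver searches for a provably maximum set at higher cost.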

We evaluated four MIS solvers (CPLEX, GREEDY, Online-MIS, and ReduMIS) across a comprehensive experimental matrix of six embedding models, three distance metrics, six percentile thresholds, and four major benchmarks (GPQA, IFEval, MMLU-Pro, and Omni-MATH), spanning 66 LLMs. Our primary hypothesis is that repeated selections with different random seeds produce consistent LLM rankings, even though these rankings may diverge from those obtained on the full benchmark. The results strongly support this hypothesis: Kendall’s $W$ exceeded 0.90 in 99.2% of stochastic configurations, with a mean of $0.997 \pm 0.008$. Furthermore, at higher percentile thresholds, the selected subsets reduced the number of prompts by 25–48% on average.
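Kendall’s $W$ quantifies agreement among several rankings of the same items, with $W = 1$ meaning perfect agreement. The sketch below applies the standard formula $W = 12S / (m^2(n^3 - n))$, where $m$ is the number of rankings (here, selection runs with different seeds), $n$ is the number of ranked items (LLMs), and $S$ is the sum of squared deviations of each item's rank sum from the mean rank sum. It assumes strict rankings without ties, so the tie-correction term is omitted; `kendalls_w` is a hypothetical helper name.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance (no tie correction).

    rankings[s][i] is the rank (1..n) that run s gives to item i;
    each run must be a strict permutation of 1..n.
    """
    m = len(rankings)          # number of runs (e.g. random seeds)
    n = len(rankings[0])       # number of ranked items (e.g. LLMs)
    rank_sums = [sum(run[i] for run in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2  # mean of the rank sums
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


identical = [[1, 2, 3, 4]] * 3                       # three seeds agree fully
one_swap = [[1, 2, 3, 4], [1, 3, 2, 4], [1, 2, 3, 4]]  # one seed swaps two LLMs
print(kendalls_w(identical))            # 1.0
print(round(kendalls_w(one_swap), 3))   # 0.911
```

A single swapped pair in one of three runs already pulls $W$ below 1, so a mean close to 1 indicates that the seeds almost never reorder the models.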

Rankings deviated from the full-benchmark baseline ($\rho < 0.95$) in only 15.95% of configurations. These deviations were concentrated at low thresholds ($p_{10}$–$p_{20}$) and on specific benchmarks (GPQA and IFEval), pointing to overly dense similarity graphs as the main cause of failure.
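The source does not name the coefficient behind $\rho$; assuming it is Spearman's rank correlation, the deviation check against the full-benchmark ranking can be sketched as follows. The closed form $\rho = 1 - 6\sum d_i^2 / (n(n^2 - 1))$ holds only for tie-free rankings, and the helper names `spearman_rho` and `deviates_from_baseline` are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def deviates_from_baseline(subset_ranks, full_ranks, cutoff=0.95):
    """True when the subset ranking correlates with the full benchmark below cutoff."""
    return spearman_rho(subset_ranks, full_ranks) < cutoff


full = [1, 2, 3, 4, 5]
subset = [1, 2, 3, 5, 4]  # the last two LLMs swap places
print(round(spearman_rho(subset, full), 3))  # 0.9
print(deviates_from_baseline(subset, full))  # True
```

The same single swap costs less as the number of ranked models grows (with 10 models it gives $\rho \approx 0.988$), so the 0.95 cutoff is stricter for small model pools.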


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators about easing ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.