Global News Digest

arXiv

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Title: Enhancing LLM Capability Assessment Through Evidence-Calibrated Query Clustering

Abstract: Query clustering supports capability-aware evaluation of large language models (LLMs) by grouping queries according to the capabilities they require. However, traditional clustering approaches, which rely on semantic taxonomies or embeddings, often fail to capture these latent requirements, because surface-level semantics do not reliably reflect how models actually perform. To address this, we introduce ECC, an algorithm that bridges the gap between surface semantics and latent capability requirements by refining initial semantic embeddings with a limited number of posterior model comparisons. ECC characterizes each cluster by a capability profile governed by a Bradley-Terry model and uses trainable mixture weights to handle queries that require multiple capabilities. The approach jointly learns a flexible, capability-aware clustering that enables query-specific inference of LLM capabilities. Quantitative and qualitative evaluations show that ECC substantially improves the quality of LLM capability rankings, surpassing human-labeled and embedding-based baselines by averages of 17.64 and 18.02 percentage points, respectively, and that it is useful in downstream applications such as query routing.
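
To make the role of the Bradley-Terry profiles and mixture weights concrete, here is a minimal sketch of how a per-query win probability could be computed. It is not the authors' implementation. The function names, the softmax parameterization of the mixture weights, and the choice to mix per-cluster win probabilities (rather than, say, mixing strengths) are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def bt_win_prob(strength_a, strength_b):
    # Bradley-Terry probability that A beats B, given log-strengths.
    return 1.0 / (1.0 + np.exp(-(strength_a - strength_b)))

def mixture_win_prob(mix_logits, profiles, a, b):
    """P(model a beats model b on one query) under a cluster mixture.

    mix_logits: (K,) unnormalized cluster weights for the query (assumed form)
    profiles:   (K, M) log-strength of each of M models in each of K clusters
    """
    weights = softmax(mix_logits)
    per_cluster = bt_win_prob(profiles[:, a], profiles[:, b])
    return float(weights @ per_cluster)

# Hypothetical profiles: model 0 is stronger in cluster 0, model 1 in cluster 1.
profiles = np.array([[2.0, 0.0],
                     [0.0, 1.0]])

# A query that belongs almost entirely to cluster 0.
p_single = mixture_win_prob(np.array([5.0, -5.0]), profiles, 0, 1)

# A query that draws equally on both clusters.
p_mixed = mixture_win_prob(np.array([0.0, 0.0]), profiles, 0, 1)
```

In this toy setup, a query concentrated in one cluster inherits that cluster's ranking of the two models, while a query split across clusters gets a blend of the two rankings.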


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
