arXiv

A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization

June 2, 2026 · Ke Chen, Yifeng Wang, Hassan Almosapeeh, Haohan Wang

Abstract:

Current methods for prompt optimization typically rely on refining a single, static template, which limits their effectiveness in complex and dynamic user environments. While existing query-dependent strategies attempt to address this, they often depend on unstable textual feedback or opaque black-box reward models, resulting in weak and uninterpretable optimization signals. More critically, the field lacks a unified and systematic definition of prompt quality, leading to fragmented and unreliable evaluation metrics. To address these challenges, our approach introduces a performance-oriented, comprehensive, and systematic framework for evaluating prompts. We further develop and fine-tune an execution-free evaluator capable of predicting multi-dimensional quality scores directly from textual input. This evaluator guides a metric-aware optimizer that diagnoses specific failure modes and rewrites prompts in an interpretable, query-dependent manner. Our evaluator demonstrates superior accuracy in predicting prompt performance, and the resulting evaluation-instructed optimization pipeline consistently outperforms both static-template and query-dependent baselines across eight datasets and three backbone models. Ultimately, we propose a unified, metric-grounded perspective on prompt quality, showing that our optimization pipeline delivers stable, interpretable, and model-agnostic improvements across a variety of tasks.
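
The evaluate, diagnose, rewrite loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the metric names (`clarity`, `specificity`, `format_guidance`), the heuristic `toy_evaluator` standing in for the fine-tuned execution-free evaluator, the fixed `REWRITES` table standing in for the metric-aware optimizer, and the `threshold` and `max_rounds` parameters.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def toy_evaluator(query: str, prompt: str) -> Scores:
    """Hypothetical stand-in for the learned evaluator.

    It scores the prompt from text alone (no model execution), returning
    one value in [0, 1] per assumed quality dimension.
    """
    text = prompt.lower()
    query_words = {w for w in query.lower().split() if len(w) > 3}
    overlap = sum(w in text for w in query_words) / max(len(query_words), 1)
    return {
        "clarity": 1.0 if len(prompt.split()) >= 8 else 0.3,
        "specificity": overlap,
        "format_guidance": 1.0 if "answer in" in text else 0.2,
    }


# Hypothetical metric-specific rewrites: each one targets a single failure mode.
REWRITES: Dict[str, Callable[[str, str], str]] = {
    "clarity": lambda q, p: p + " Work through the problem step by step.",
    "specificity": lambda q, p: p + f" The question to address is: {q}",
    "format_guidance": lambda q, p: p + " Answer in one short sentence.",
}


def optimize(
    query: str,
    prompt: str,
    evaluator: Callable[[str, str], Scores] = toy_evaluator,
    threshold: float = 0.5,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Repeatedly score the prompt, pick the weakest metric, and rewrite for it.

    Returns the rewritten prompt and the list of diagnosed failure modes,
    which serves as a readable trace of why each rewrite was applied.
    """
    diagnosed: List[str] = []
    for _ in range(max_rounds):
        scores = evaluator(query, prompt)
        weakest = min(scores, key=scores.get)
        if scores[weakest] >= threshold:
            break  # every dimension is acceptable for this query
        diagnosed.append(weakest)
        prompt = REWRITES[weakest](query, prompt)
    return prompt, diagnosed


if __name__ == "__main__":
    new_prompt, trace = optimize("What is 17 times 23?", "Solve it.")
    print(trace)
    print(new_prompt)
```

Because the rewrite is conditioned on both the query and the lowest-scoring dimension, two different queries sharing the same starting template can end up with different prompts, and the returned trace records which dimension triggered each change. In a real system the heuristic evaluator and the rewrite table would presumably be replaced by learned models.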

Source: arXiv