arXiv

FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement

June 3, 2026 · Yinsheng Yao, Hongxiang Zhang, Weixi Tong, Tianyi Zhang

Title: FLARE: Providing Fine-Grained Diagnostic Feedback to Enhance LLM Code Refinement

Large language models (LLMs) frequently produce code containing errors. Existing approaches typically rely on feedback signals, such as test failures and self-generated critiques, to refine the code iteratively. However, these signals are often too coarse or too abstract to pinpoint where the code needs to be fixed. To address this limitation, we introduce Flare, an iterative framework that uses a lightweight diagnostic model to assign a suspiciousness score to each line of code, enabling precise bug localization and targeted code refinement.
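To make line-level feedback concrete, here is a minimal Python sketch of how per-line suspiciousness scores might be surfaced to a refining LLM. It assumes the scores already exist (the hard-coded `example_scores` list stands in for the diagnostic model's output), and the function name `annotate_suspicious_lines` and the annotation format are hypothetical illustrations, not Flare's actual interface.

```python
from typing import List


def annotate_suspicious_lines(code: str, scores: List[float], top_n: int = 3) -> str:
    """Render code with its top_n most suspicious lines flagged.

    scores[i] is a hypothetical suspiciousness score for line i, standing in
    for the output of a line-level diagnostic model.
    """
    lines = code.splitlines()
    if len(scores) != len(lines):
        raise ValueError("need exactly one score per line")
    # Rank line indices by descending score; ties keep original line order.
    ranked = sorted(range(len(lines)), key=lambda i: (-scores[i], i))
    flagged = set(ranked[:top_n])
    out = []
    for i, line in enumerate(lines):
        marker = f"  # <- suspicious ({scores[i]:.2f})" if i in flagged else ""
        out.append(f"{i + 1:>3}: {line}{marker}")
    return "\n".join(out)


# Toy example: an off-by-one bug in the final line of a mean function.
example_code = (
    "def mean(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x\n"
    "    return total / (len(xs) - 1)"
)
example_scores = [0.05, 0.10, 0.08, 0.12, 0.91]  # made-up scores for illustration
print(annotate_suspicious_lines(example_code, example_scores, top_n=1))
```

Pointing the refining model at specific numbered lines, rather than at a failing test alone, is the kind of fine-grained signal the abstract contrasts with coarse feedback.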

Because the diagnostic model's predictions are inherently uncertain, Flare explores the top-k most suspicious code regions and selects the best candidate based on execution results. Experiments on LiveCodeBench and BigCodeBench with five base LLMs show that Flare outperforms the strongest existing baseline by absolute margins of 1.72% to 7.42%, even without candidate search (k=1). Expanding the search to 10 candidates yields an average gain of 8.50% over no candidate search. Evaluated on its own, the lightweight diagnostic model outperforms recent fault localization techniques, indicating that it provides reliable, fine-grained guidance for code refinement.
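The top-k exploration step can be sketched as a small search loop. This is an illustrative Python sketch under stated assumptions, not the paper's implementation: `refine` and `run_tests` are hypothetical callbacks standing in for an LLM rewrite of the code around one suspicious line and for test execution, and keeping the unmodified program as a fallback is an assumption of this sketch. With k=1 only the single most suspicious line is tried; a larger k lets execution results compensate for a misranked diagnosis.

```python
from typing import Callable, Sequence, Tuple


def top_k_refine(
    code: str,
    scores: Sequence[float],
    k: int,
    refine: Callable[[str, int], str],   # hypothetical: rewrite code around a line
    run_tests: Callable[[str], int],     # hypothetical: number of tests passed
) -> Tuple[str, int]:
    """Try refining each of the k most suspicious lines; keep the best by tests passed."""
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Assumption: the original program is the fallback if no candidate improves it.
    best_code, best_passed = code, run_tests(code)
    for idx in ranked[:k]:
        candidate = refine(code, idx)
        passed = run_tests(candidate)
        if passed > best_passed:
            best_code, best_passed = candidate, passed
    return best_code, best_passed


BUGGY = (
    "def mean(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x\n"
    "    return total / (len(xs) - 1)"
)


def toy_refine(code: str, idx: int) -> str:
    # Stand-in for an LLM edit: only fixes the bug if pointed at the right line.
    lines = code.splitlines()
    lines[idx] = lines[idx].replace("(len(xs) - 1)", "len(xs)")
    return "\n".join(lines)


def toy_run_tests(code: str) -> int:
    # Execute the candidate and count passing test cases.
    ns: dict = {}
    try:
        exec(code, ns)
    except Exception:
        return 0
    passed = 0
    for xs, expected in [([1, 2, 3], 2.0), ([4], 4.0), ([2, 2], 2.0)]:
        try:
            if ns["mean"](xs) == expected:
                passed += 1
        except Exception:
            pass
    return passed


# Made-up scores that rank the faulty return line (index 4) only third.
misranked = [0.05, 0.40, 0.08, 0.50, 0.30]
print(top_k_refine(BUGGY, misranked, 1, toy_refine, toy_run_tests)[1])  # → 0
print(top_k_refine(BUGGY, misranked, 3, toy_refine, toy_run_tests)[1])  # → 3
```

In this toy run, k=1 leaves the program unchanged because the top-ranked line is not the faulty one, while k=3 reaches the faulty line and the tests select the fix; this illustrates in miniature why candidate search can help when diagnostic predictions are uncertain.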

Source: arXiv Generated at: 2026-06-03 00:00:00 UTC