Global News Digest

arXiv

SDR: Set-Distance Rewards for Radiology Report Generation

Abstract:

Reinforcement learning with verifiable rewards has significantly improved the reasoning capabilities of vision-language models. Applying these methods to chest X-ray report generation is harder, however, because standard reward mechanisms such as exact-match accuracy or step-level process evaluation are ill-suited to the task. A radiology report comprises unordered and orthogonal findings rather than a linear causal reasoning chain.

To bridge this gap, we introduce a set-based approach. In this framework, each report is decomposed into individual sentences and processed by a frozen sentence transformer to create unordered sets of embeddings. We propose utilizing set-to-set distances between the embeddings of generated and reference reports as continuous, permutation-invariant rewards.
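The abstract does not say which set-to-set distance is used. As a minimal sketch, assuming a symmetric Chamfer distance built on cosine distance, and using hand-written toy vectors in place of the frozen sentence transformer's output, a permutation-invariant reward could look like the following. All function names here are illustrative and are not taken from the released code.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)

def chamfer_distance(generated, reference):
    """Symmetric Chamfer distance between two sets of sentence embeddings.

    Assumption: this particular distance is one plausible choice; the
    source only says "set-to-set distances".
    """
    if not generated or not reference:
        raise ValueError("both embedding sets must be non-empty")
    # Each generated sentence is matched to its closest reference sentence...
    gen_to_ref = sum(
        min(cosine_distance(g, r) for r in reference) for g in generated
    ) / len(generated)
    # ...and each reference sentence to its closest generated sentence.
    ref_to_gen = sum(
        min(cosine_distance(r, g) for g in generated) for r in reference
    ) / len(reference)
    return 0.5 * (gen_to_ref + ref_to_gen)

def set_distance_reward(generated, reference):
    """Continuous reward: smaller distance means larger reward."""
    return -chamfer_distance(generated, reference)
```

Because every term is a minimum or a mean over a set, reordering the sentences of either report leaves the reward unchanged, which is the property the paragraph above relies on.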

Our experiments across two datasets and three vision-language models (Qwen3-VL-2B, Qwen3-VL-4B, and Gemma3-4B) demonstrate that post-training with these set-to-set distance rewards, implemented via GRPO, consistently surpasses both supervised fine-tuning and exact-match GRPO. The average relative improvements are 6.80% in BERTScore, 7.82% in RadGraph F1, and 4.45% in CheXbert F1.
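One way to see why a continuous reward can help under GRPO is to look at the group-relative advantage, which standardizes each sampled report's reward against the other samples for the same image. The sketch below follows the commonly described form of that normalization. The use of the population standard deviation and the `eps` value are assumptions, and the reward numbers are made up for illustration.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt.

    Assumptions: population standard deviation and a small eps to avoid
    division by zero; implementations differ on both details.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Exact match on free-text reports: if no sample matches the reference
# verbatim, every reward is 0 and the group carries no learning signal.
exact_match_advantages = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# Continuous rewards (hypothetical negative set distances): samples that
# are closer to the reference receive positive advantages.
set_distance_advantages = group_relative_advantages([-0.4, -0.1, -0.3, -0.2])
```

With the all-zero exact-match rewards, every advantage is zero, whereas the continuous rewards still rank the four samples.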

Furthermore, these set distances facilitate test-time best-of-$N$ selection. By scoring candidate outputs based on their proximity to training-report embeddings, we achieved superior performance compared to random selection. This advantage extended to our trained models as well as three closed-source large language models (Mistral-Small, Gemini-2.5 Flash-Lite, and GPT-4o-mini), yielding an average relative improvement of 16.4% on BERTScore.
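At test time there is no reference report, so candidates have to be scored against training-report embeddings instead. The abstract does not give the exact scoring rule. The sketch below assumes each candidate is scored by its mean Chamfer distance to its `k` nearest training reports, and the candidate with the lowest score is selected. The function names, the choice of Chamfer distance, and the parameter `k` are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty embedding sets."""
    a_to_b = sum(min(cosine_distance(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(min(cosine_distance(y, x) for x in a) for y in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

def select_best_of_n(candidates, training_reports, k=1):
    """Return (index of selected candidate, per-candidate scores).

    candidates:        list of embedding sets, one per sampled report
    training_reports:  list of embedding sets from the training corpus
    k:                 number of nearest training reports to average over
                       (an assumption, not a detail from the source)
    """
    def score(candidate):
        dists = sorted(chamfer_distance(candidate, t) for t in training_reports)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    scores = [score(c) for c in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores
```

The same selection rule applies to any generator that can be sampled `N` times, which is consistent with the abstract reporting gains for both the trained models and the closed-source models.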

When used as a streaming signal, these rewards also enable more efficient test-time scaling. Pruning low-scoring candidates during generation reduces the total number of generated tokens by more than 50% while maintaining Findings quality equivalent to full best-of-$N$ selection. Collectively, these results position set-distance rewards as a unified signal for improving both post-training and test-time scaling in chest X-ray report generation. Our code is publicly available at https://anonymous.4open.science/r/Set-Distance-Rewards-CXR-BFDA.
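The pruning idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' procedure: candidates are scored after every sentence, the lower-scoring share of live candidates is dropped at each step, whitespace-separated words stand in for tokens, and `score_fn` is a placeholder for a set-distance score against training-report embeddings.

```python
import math

def streaming_prune(candidates, score_fn, keep_fraction=0.5):
    """Simulate sentence-by-sentence generation with pruning.

    candidates:    list of reports, each a list of sentence strings
                   (pre-generated here so the simulation is deterministic)
    score_fn:      maps a partial report (list of sentences) to a score,
                   higher is better (hypothetical stand-in)
    keep_fraction: share of live candidates kept after each step (assumed)
    Returns (index of the surviving best candidate, tokens generated).
    """
    live = list(range(len(candidates)))
    tokens_generated = 0
    max_len = max(len(c) for c in candidates)
    for step in range(max_len):
        # Only candidates that are still live spend tokens on this step.
        for i in live:
            if step < len(candidates[i]):
                tokens_generated += len(candidates[i][step].split())
        if len(live) > 1:
            ranked = sorted(
                live,
                key=lambda i: score_fn(candidates[i][: step + 1]),
                reverse=True,
            )
            keep = max(1, math.ceil(len(live) * keep_fraction))
            live = ranked[:keep]
    best = max(live, key=lambda i: score_fn(candidates[i]))
    return best, tokens_generated
```

With four candidates of three four-word sentences each and `keep_fraction=0.5`, the candidate pool shrinks from 4 to 2 to 1, so 28 words are generated instead of the 48 that full best-of-$N$ would need. The size of the saving in practice depends on the pruning schedule and on how early the score separates good candidates from poor ones.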


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
