Global News Digest

arXiv

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Abstract:

Real-world clinical practice involves far more than choosing from a fixed set of options; physicians must continuously integrate diverse information and make sequential, irreversible decisions amidst uncertainty. Traditional static benchmarks fail to capture this complexity, while current interactive medical benchmarks inherently sacrifice at least one critical dimension of realism. To address this, we introduce ClinEnv, an interactive benchmark designed to evaluate Large Language Models (LLMs) acting as attending physicians during real inpatient admissions. This framework operates under a novel paradigm we call Longitudinal Inpatient Simulation.

In ClinEnv, each clinical case is automatically organized into a structured sequence of decision stages. At each stage, the model is required to proactively query four specialized agents before finalizing its choices regarding medications, procedures, and diagnoses. The evaluation metrics assess both the outcome of the model’s decisions—verified through deterministic ontology-grounded matching—and the efficiency and quality of its information-gathering process.
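As a rough illustration of outcome scoring by deterministic code matching, the sketch below computes precision, recall, and F1 over sets of ontology codes. The abstract does not say which ontologies or normalization rules ClinEnv uses, so the `normalize` helper, the `set_f1` function, and the ICD-10-style codes are assumptions chosen only for the example.

```python
def normalize(code: str) -> str:
    """Canonicalize a code string (assumed rule: trim, drop dots, uppercase)."""
    return code.strip().replace(".", "").upper()


def set_f1(predicted, reference):
    """Return (precision, recall, F1) for exact matches between two code sets."""
    pred = {normalize(c) for c in predicted}
    ref = {normalize(c) for c in reference}
    if not pred and not ref:
        # Assumption: nothing predicted and nothing expected counts as perfect.
        return 1.0, 1.0, 1.0
    tp = len(pred & ref)  # codes present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical stage: the model lists two diagnoses, one of which is correct.
precision, recall, f1 = set_f1(["i10", "E11.9"], ["I10", "N17.9"])
print(precision, recall, f1)
```

Because matching is exact after normalization, the score is reproducible across runs, which is the property a deterministic matcher is meant to provide. A real ontology-grounded matcher would likely also handle synonyms or hierarchy, which this sketch does not attempt.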

Our evaluation of seven different models reveals that the best-performing model achieves only a 0.31 decision F1 score. Notably, there is a sharp disconnect between the quality of the final outcomes and the quality of the decision-making process. The primary difficulties lie in management decisions and later stages of patient care. While models are relatively reliable in recovering discharge diagnoses (F1 of 0.51), they struggle significantly with management actions (F1 of 0.17). Furthermore, as cases progress, models tend to issue redundant queries. ClinEnv renders this gap in information acquisition explicitly measurable, exposing flaws that remain hidden in evaluations focused solely on final outcomes.
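One simple way to make query redundancy measurable is to count how often a model repeats a request it has already made to the same agent. The sketch below is not the paper's metric. It assumes a query is an `(agent, request)` pair and that a repeat is an identical pair after case and whitespace normalization. The agent names are hypothetical.

```python
def redundant_query_rate(queries):
    """Fraction of queries that repeat an earlier (agent, request) pair."""
    seen = set()
    repeats = 0
    for agent, request in queries:
        # Assumed normalization: lowercase and collapse whitespace.
        key = (agent.strip().lower(), " ".join(request.lower().split()))
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(queries) if queries else 0.0


# Hypothetical episode log: two of the four queries repeat the first one.
log = [
    ("lab", "CBC"),
    ("lab", "cbc "),
    ("nurse", "vitals"),
    ("lab", "CBC"),
]
print(redundant_query_rate(log))
```

Tracking this rate per decision stage would show whether repetition grows as a case progresses, the pattern the abstract reports.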


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
