ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents
Abstract:
Real-world clinical practice involves far more than choosing from a fixed set of options: physicians must continuously integrate diverse information and make sequential, irreversible decisions under uncertainty. Traditional static benchmarks fail to capture this complexity, while current interactive medical benchmarks inherently sacrifice at least one critical dimension of realism. To address this, we introduce ClinEnv, an interactive benchmark designed to evaluate Large Language Models (LLMs) acting as attending physicians during real inpatient admissions. The framework operates under a novel paradigm we call Longitudinal Inpatient Simulation.
In ClinEnv, each clinical case is automatically organized into a structured sequence of decision stages. At each stage, the model must proactively query four specialized agents before finalizing its decisions on medications, procedures, and diagnoses. The evaluation metrics assess both the outcome of the model's decisions, verified through deterministic ontology-grounded matching, and the efficiency and quality of its information-gathering process.
Our evaluation of seven models shows that even the best-performing model achieves a decision F1 score of only 0.31. Notably, there is a sharp disconnect between the quality of the final outcomes and the quality of the decision-making process. The primary difficulties lie in management decisions and in the later stages of patient care: while models are relatively reliable at recovering discharge diagnoses (F1 of 0.51), they struggle significantly with management actions (F1 of 0.17). Furthermore, as cases progress, models increasingly issue redundant queries. ClinEnv makes this gap in information acquisition explicitly measurable, exposing flaws that remain hidden in evaluations focused solely on final outcomes.
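To make the decision F1 metric concrete, the sketch below computes a set-based F1 after mapping each decision string to a canonical concept, which is one simple way to realize deterministic, ontology-grounded matching. It is a minimal illustration under stated assumptions, not ClinEnv's implementation: the abstract does not specify the ontology, the matching rules, or how scores are aggregated across stages. `CONCEPT_MAP`, `normalize`, and `decision_f1` are hypothetical names, and the `CONCEPT:*` identifiers are placeholders rather than real ontology codes.

```python
# Minimal sketch of set-based decision F1 with ontology-grounded matching.
# ASSUMPTION: CONCEPT_MAP is a hypothetical stand-in for a real clinical
# ontology; the CONCEPT:* IDs are placeholders, not actual ontology codes.

# Hypothetical map from free-text decisions to canonical concept IDs.
CONCEPT_MAP = {
    "aspirin": "CONCEPT:ASPIRIN",
    "acetylsalicylic acid": "CONCEPT:ASPIRIN",
    "heart attack": "CONCEPT:MI",
    "myocardial infarction": "CONCEPT:MI",
    "cardiac catheterization": "CONCEPT:CATH",
}


def normalize(decisions):
    """Map each decision to a concept ID; unmapped strings are kept lowercased."""
    return {CONCEPT_MAP.get(d.strip().lower(), d.strip().lower()) for d in decisions}


def decision_f1(predicted, reference):
    """Deterministic F1 between predicted and reference decision sets."""
    pred, ref = normalize(predicted), normalize(reference)
    if not pred and not ref:
        return 1.0  # design choice: nothing required and nothing predicted
    tp = len(pred & ref)  # true positives: concepts present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = ["Acetylsalicylic acid", "heart attack", "chest X-ray"]
    reference = ["aspirin", "myocardial infarction", "cardiac catheterization"]
    print(round(decision_f1(predicted, reference), 3))  # → 0.667
```

Because synonyms collapse to the same concept before set comparison, "Acetylsalicylic acid" matches "aspirin" without any model-based judging. This keeps the score reproducible, at the cost of depending entirely on the coverage of the concept map.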
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




