Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA
Abstract:
As large language models (LLMs) are increasingly used to answer routine health questions, their ability to advise safely on whether a user may take another dose of an over-the-counter (OTC) medication is drawing attention. Despite its safety implications, this scenario has received little scrutiny in existing medical question-answering (QA) evaluations. Answering correctly requires several reasoning steps: tracking the minimum interval between doses, computing cumulative intake over a rolling 24-hour window, applying product-label restrictions, and handling incomplete medication histories.
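The interval and rolling-window checks described above can be sketched as a small rule evaluator. This is a hypothetical illustration, not DOSEBENCH's scoring code. The names `LabelRule`, `Dose`, and `can_take_another`, the three-way "yes"/"no"/"uncertain" outcome, and the convention that a dose taken exactly 24 hours earlier falls outside the window are all assumptions. The limits are parameters supplied by the caller, and any values used with it here are placeholders rather than label values taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class LabelRule:
    """Hypothetical label constraints; the caller supplies the values."""
    min_interval: timedelta  # minimum time required between consecutive doses
    max_mg_per_24h: int      # cap on total intake in any rolling 24-hour window


@dataclass(frozen=True)
class Dose:
    taken_at: Optional[datetime]  # None models a dose whose time the user cannot recall
    mg: int


def can_take_another(history: List[Dose], now: datetime, next_mg: int, rule: LabelRule) -> str:
    """Return "yes", "no", or "uncertain" for taking `next_mg` at `now`."""
    window_start = now - timedelta(hours=24)
    known = [d for d in history if d.taken_at is not None]
    unknown = [d for d in history if d.taken_at is None]

    # Interval check: only the most recent timed dose matters.
    if known:
        last = max(d.taken_at for d in known)
        if now - last < rule.min_interval:
            return "no"

    # Rolling-window check: sum timed doses in (now - 24 h, now]; a dose
    # taken exactly 24 hours ago is treated as outside the window (assumption).
    in_window = sum(d.mg for d in known if window_start < d.taken_at <= now)
    if in_window + next_mg > rule.max_mg_per_24h:
        return "no"

    # An untimed dose might be recent or inside the window, so the timed
    # history alone cannot confirm that another dose is within limits.
    if unknown:
        return "uncertain"
    return "yes"
```

For example, with a 6-hour interval and a 3,000 mg cap (placeholder values), 1,000 mg doses at 08:00 and 14:00 permit another 1,000 mg at 20:00 but not at 18:00, and adding one untimed dose to that history turns the 20:00 answer into "uncertain". Returning "uncertain" instead of guessing is one way to represent the partial-history cases the abstract mentions.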
To address this gap, we present DOSEBENCH, a benchmark of 81 curated OTC dosing scenarios covering adult use of acetaminophen and ibuprofen, each paired with a manually verified gold-standard reference. We evaluate four LLMs over repeated runs, yielding 1,620 responses in total. Our metrics cover decision accuracy, response consistency, explanation verifiability, failure modes, and confidence-related signals.
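The abstract does not define how decision accuracy and response consistency are computed. One plausible per-scenario sketch, assuming each run yields a single categorical decision and the gold reference is one such label, is shown below. `ScenarioScore` and `score_scenario` are hypothetical names, and modal-agreement consistency is a swapped-in definition, not necessarily the paper's.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioScore:
    accuracy: float     # share of runs whose decision matches the gold label
    consistency: float  # share of runs agreeing with the most common decision
    modal: str          # most common decision across runs


def score_scenario(decisions: List[str], gold: str) -> ScenarioScore:
    """Score repeated runs of one scenario against its gold decision."""
    if not decisions:
        raise ValueError("need at least one run")
    counts = Counter(decisions)
    # most_common(1) returns the single highest-count (decision, count) pair.
    modal, modal_count = counts.most_common(1)[0]
    n = len(decisions)
    return ScenarioScore(
        accuracy=counts[gold] / n,
        consistency=modal_count / n,
        modal=modal,
    )
```

A scenario answered "no" in every run when the gold label is "yes" scores consistency 1.0 but accuracy 0.0, so consistency on its own cannot distinguish a stable correct answer from a stable wrong one.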
Our analysis shows that LLMs often struggle with rolling-window logic and with ambiguous cases. It also reveals a concerning pattern: responses that are stable across runs or stated with high confidence may still violate dosing constraints. These results suggest that OTC dosing QA is a precise and practical testbed for temporal reasoning, constraint compliance, and the handling of safety-critical uncertainty in medical AI systems.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC