DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering

June 2, 2026 · Qing Wang, Bo Li, Jialu Liang, Daling Shi, Bob Zhang, Qianqian Song

Original: arXiv:2606.01434v1

Abstract: In the high-stakes domain of drug-information question answering, the provenance of cited facts is as critical as the facts themselves, because hallucinations can severely mislead clinical decision-making. To address this, we introduce DrugClaw, a multi-agent retrieval-augmented system. It uses a reflection-driven state-machine workflow to query a registry of drug and pharmacovigilance capabilities, delivering answers that are strictly grounded in primary regulatory documents or peer-reviewed records. We also present DrugAudit, an authority-aware benchmark of 3,772 items. Its evaluation panel assesses upstream-of-gold source matching, token-level semantic snippet overlap, and citation faithfulness. The evaluation employs a dual-judge LLM-as-judge protocol that achieves an inter-judge kappa coefficient of 0.88, indicating almost-perfect agreement.
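As a rough illustration of the inter-judge agreement statistic mentioned above, the sketch below computes Cohen's kappa for two judges who each assign one label per item. This assumes the reported kappa is Cohen's kappa over per-item verdicts, which the abstract does not confirm; the function name and the toy labels are hypothetical and are not DrugAudit data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items where the two labels match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy verdicts (hypothetical): 1 = answer judged correct, 0 = incorrect.
judge_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
judge_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(judge_a, judge_b)
print(f"kappa = {kappa:.2f}")
```

Under the commonly used Landis and Koch bands, values from 0.81 to 1.00 are labelled "almost perfect", which matches the wording used for the reported 0.88.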

In comparative analyses across DrugAudit and drug-specific subsets of MedQA (751 items) and PubMedQA (512 items), DrugClaw achieved the top rank in every metric of the primary results table. Specifically, it led in the composite Evidence Index under both judges, judge-mediated answer correctness, primary-source rate (0.918, a 10.1 percentage point improvement over the next-best model), and faithfulness (0.887, a 5.9 percentage point gain). DrugClaw also scored 0.920 on the MedQA subset and 0.693 on the PubMedQA subset.
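The percentage-point gains above can be read back into approximate runner-up scores. The short sketch below does only that arithmetic; the implied values are derived here from the stated numbers, are not figures reported in the abstract, and assume the gains are absolute differences on a 0 to 1 scale.

```python
# Reported DrugClaw scores and stated gains over the next-best system,
# with gains converted from percentage points to the 0-1 scale.
reported = {
    "primary_source_rate": (0.918, 10.1 / 100),
    "faithfulness": (0.887, 5.9 / 100),
}

# Implied runner-up score = DrugClaw score minus the absolute gain.
implied_next_best = {
    metric: round(score - gain, 3)
    for metric, (score, gain) in reported.items()
}

for metric, value in implied_next_best.items():
    print(f"{metric}: implied next-best score of about {value:.3f}")
```

Note that a percentage-point gain is an absolute difference, so a 10.1 point gain on primary-source rate is not the same as a 10.1% relative improvement.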
