arXiv

UniD$^3$: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

June 2, 2026 · Qing Wang, Tianshi Liu, Minghao Zhou, Jialu Liang, Sen Guo, Guangyu Wang, Jing Su, Qianqian Song

Title: UniD$^3$: Leveraging Knowledge Graphs to Enhance RAG for Drug-Disease Insight and Reasoning

Abstract

Systematically mapping the connections between drugs and diseases is a cornerstone of pharmaceutical discovery and drug repurposing. However, this process is frequently impeded by the fragmented and rapidly expanding biomedical literature. Current datasets are often incomplete because they depend on labor-intensive manual curation, whereas approaches relying solely on Large Language Models (LLMs) are prone to hallucinations and lack robust evidence grounding. To address these challenges, we present UniD$^3$, a comprehensive framework that integrates LLMs with Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG). The system extracts, structures, and verifies drug-disease information across three critical domains: Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA).
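
To make the KG-RAG idea concrete, the sketch below shows one minimal way retrieval can ground an LLM prompt: triples about a query entity are pulled from a small in-memory graph and placed into the prompt together with the PubMed ID that supports each one, so the model can cite its evidence. It is an illustration under stated assumptions, not UniD$^3$'s implementation; the `Triple` type, the exact-match `retrieve` function, the `build_prompt` template, and the placeholder PMIDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a drug-disease knowledge graph (hypothetical schema)."""
    head: str
    relation: str
    tail: str
    pmid: str  # PubMed ID of the supporting article (placeholder values below)


def retrieve(kg: list[Triple], entity: str, k: int = 5) -> list[Triple]:
    """Return up to k triples whose head or tail equals the entity (case-insensitive).

    Assumption: exact string matching; a real system would likely use
    entity linking or embedding similarity instead.
    """
    e = entity.lower()
    hits = [t for t in kg if t.head.lower() == e or t.tail.lower() == e]
    return hits[:k]


def build_prompt(question: str, evidence: list[Triple]) -> str:
    """Ground the question in retrieved triples, each tagged with its PMID."""
    lines = [f"- ({t.head}, {t.relation}, {t.tail}) [PMID:{t.pmid}]" for t in evidence]
    context = "\n".join(lines) if lines else "- (no matching evidence found)"
    return (
        "Answer using only the evidence below and cite PMIDs.\n"
        f"Evidence:\n{context}\n"
        f"Question: {question}\n"
    )


# Toy graph with placeholder PMIDs, for illustration only.
kg = [
    Triple("metformin", "treats", "type 2 diabetes", "0000001"),
    Triple("aspirin", "treats", "headache", "0000002"),
    Triple("metformin", "inhibits", "complex I", "0000003"),
]
print(build_prompt("What does metformin treat?", retrieve(kg, "Metformin")))
```

The resulting prompt string would then be passed to the LLM; keeping the PMID beside each triple is what makes a citation-backed answer possible.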

UniD$^3$ uses the Llama 3.3-70B model to process a corpus of 157,849 PubMed articles. It constructs knowledge graphs in two stages, paper-level extraction followed by KG-level consolidation, focusing specifically on drug and disease entities. The resulting graphs then drive KG-RAG-based generation of structured datasets. The outputs are rigorously evaluated against external benchmarks, by fuzzy matching against curated resources, and through assessment by medical clinicians.
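
The two-stage construction can be illustrated with a small sketch: stage one yields raw (drug, relation, disease) triples per paper, and stage two merges them into one graph by normalizing entity names and pooling the supporting article IDs for each distinct edge. This is a hypothetical simplification, not the authors' procedure; the `SYNONYMS` table, the `normalize` and `consolidate` functions, and the paper IDs are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table; a real pipeline would map names to a curated vocabulary.
SYNONYMS = {"acetylsalicylic acid": "aspirin", "t2d": "type 2 diabetes"}


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to a canonical name."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


def consolidate(
    paper_triples: dict[str, list[tuple[str, str, str]]],
) -> dict[tuple[str, str, str], set[str]]:
    """KG-level consolidation (sketch): merge paper-level triples into unique edges.

    Input maps a paper ID to the triples extracted from that paper.
    Output maps each normalized (drug, relation, disease) edge to the set of
    paper IDs that support it.
    """
    kg: defaultdict[tuple[str, str, str], set[str]] = defaultdict(set)
    for paper_id, triples in paper_triples.items():
        for drug, relation, disease in triples:
            edge = (normalize(drug), relation.strip().lower(), normalize(disease))
            kg[edge].add(paper_id)
    return dict(kg)


# Stage-one output for three hypothetical papers.
papers = {
    "paper_a": [("Aspirin", "treats", "Pain")],
    "paper_b": [("Acetylsalicylic  Acid", "TREATS", "pain")],
    "paper_c": [("Metformin", "treats", "T2D")],
}
kg = consolidate(papers)
print(kg)
```

Here the aspirin mentions from two papers collapse into a single edge backed by both sources, which is the kind of redundancy a KG-level pass is meant to remove.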

The framework produces six distinct knowledge graphs together with large-scale datasets comprising 28,915 DDM instances, 15,042 DEA instances, and more than 4,000 question-answer (QA) pairs for DTA. External validation yields F1 scores of 0.85–0.87 for DDM and DEA, and 0.82 for DTA. Clinician review further supports the reliability of the data, with an AUROC of 0.90. KG-RAG-augmented models significantly outperform standalone LLMs. UniD$^3$ also provides a chatbot interface for interpretable, citation-backed exploration of drug-disease dynamics. Overall, UniD$^3$ offers a scalable and adaptable approach to converting unstructured biomedical text into high-quality structured knowledge, supporting AI-driven discovery, drug repurposing, and precision medicine.
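
As a rough illustration of how fuzzy matching against a curated resource can yield an F1 score, the sketch below counts a predicted (drug, disease) pair as a true positive when both names are sufficiently similar to a not-yet-matched reference pair, then applies the standard $F_1 = 2PR/(P+R)$. The matcher (Python's `difflib.SequenceMatcher`), the greedy one-to-one matching, and the 0.85 threshold are assumptions; the paper's actual matching procedure and threshold are not specified here.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Case-insensitive string similarity test (assumed matcher and threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def fuzzy_f1(
    predicted: list[tuple[str, str]],
    reference: list[tuple[str, str]],
    threshold: float = 0.85,
) -> float:
    """F1 of predicted (drug, disease) pairs against a curated reference list.

    Each reference pair can be matched at most once (greedy, first match wins).
    """
    unmatched = list(reference)
    tp = 0
    for drug, disease in predicted:
        for i, (ref_drug, ref_disease) in enumerate(unmatched):
            if similar(drug, ref_drug, threshold) and similar(disease, ref_disease, threshold):
                tp += 1
                del unmatched[i]  # safe: we break out of the loop immediately
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two near-matches ("type-2", "headaches") and one miss on each side.
predicted = [("metformin", "type 2 diabetes"), ("aspirin", "headache"), ("ibuprofen", "gout")]
reference = [("Metformin", "type-2 diabetes"), ("aspirin", "headaches"), ("warfarin", "atrial fibrillation")]
print(round(fuzzy_f1(predicted, reference), 3))
```

Fuzzy rather than exact matching keeps spelling and formatting variants (hyphens, plurals, capitalization) from being scored as errors, at the cost of having to choose a similarity threshold.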

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC