arXiv

UniD$^3$: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

Title: UniD$^3$: Leveraging Knowledge Graphs to Enhance RAG for Drug-Disease Insight and Reasoning

Abstract

Systematically mapping connections between drugs and diseases is a cornerstone of pharmaceutical discovery and drug repurposing. This process is often hampered, however, because the biomedical literature is fragmented and growing rapidly. Existing datasets are often incomplete because they depend on labor-intensive manual curation, while approaches that rely solely on Large Language Models (LLMs) are prone to hallucination and lack robust evidence grounding. To address these challenges, we present UniD$^3$, a comprehensive framework that integrates LLMs with Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG). The system extracts, structures, and verifies drug-disease information across three domains: Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA).
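To make the KG-RAG idea concrete, here is a minimal sketch of the retrieval step. It assumes the graph is stored as a list of drug-relation-disease edges, each annotated with the PubMed IDs that support it. The names `Edge`, `retrieve`, and `build_prompt`, the "most supporting papers first" ranking, and the toy edges with placeholder IDs (`pmid-1`, ...) are all hypothetical illustrations, not UniD$^3$'s actual implementation. The LLM call that would consume the prompt is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One drug-disease relation in a (hypothetical) KG, with supporting PubMed IDs."""
    drug: str
    relation: str
    disease: str
    pmids: tuple[str, ...]


def retrieve(kg: list[Edge], entity: str, k: int = 5) -> list[Edge]:
    """Return up to k edges whose drug or disease matches `entity` (case-insensitive)."""
    key = entity.lower()
    hits = [e for e in kg if key in (e.drug.lower(), e.disease.lower())]
    # Assumed heuristic: rank edges by how many papers support them.
    hits.sort(key=lambda e: len(e.pmids), reverse=True)
    return hits[:k]


def build_prompt(question: str, edges: list[Edge]) -> str:
    """Serialize retrieved edges as cited facts and prepend them to the question."""
    facts = [
        f"- {e.drug} {e.relation} {e.disease} [PMID: {', '.join(e.pmids)}]"
        for e in edges
    ]
    return (
        "Answer using only the facts below and cite the PMIDs.\n"
        "Facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"
    )


# Toy graph with placeholder IDs (illustrative only).
kg = [
    Edge("metformin", "treats", "type 2 diabetes", ("pmid-1", "pmid-2")),
    Edge("metformin", "investigated_for", "polycystic ovary syndrome", ("pmid-3",)),
    Edge("aspirin", "treats", "fever", ("pmid-4",)),
]
prompt = build_prompt("What does metformin treat?", retrieve(kg, "Metformin"))
print(prompt)
```

Because every fact in the prompt carries its PMIDs, the generator can cite evidence rather than rely on parametric memory, which is the grounding property the abstract contrasts with standalone LLMs.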

UniD$^3$ uses the Llama 3.3-70B model to process a corpus of 157,849 PubMed articles. It constructs knowledge graphs in two stages, combining paper-level extraction with KG-level consolidation and focusing specifically on drug and disease entities. The resulting graphs drive KG-RAG generation of structured datasets, which are evaluated against external benchmarks, by fuzzy matching against curated resources, and by medical clinicians.
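The two-stage construction can be pictured as follows. This is a minimal sketch under stated assumptions: stage 1 (LLM extraction per paper, done with Llama 3.3-70B in UniD$^3$) is represented by hard-coded toy triples, and stage 2 consolidates them by normalizing entity names through a hypothetical synonym table and merging duplicate edges while keeping the set of supporting PMIDs. The functions `normalize` and `consolidate`, the synonym table, and the placeholder IDs are illustrative, not the authors' code.

```python
from collections import defaultdict

# Stage 1 output (assumed shape): triples extracted from each paper, keyed by PMID.
# In UniD^3 an LLM produces these; here they are hard-coded toy data.
PaperTriples = dict[str, list[tuple[str, str, str]]]
EdgeKey = tuple[str, str, str]


def normalize(name: str, synonyms: dict[str, str]) -> str:
    """Lower-case, collapse whitespace, then map known synonyms to a canonical name."""
    key = " ".join(name.lower().split())
    return synonyms.get(key, key)


def consolidate(papers: PaperTriples, synonyms: dict[str, str]) -> dict[EdgeKey, set[str]]:
    """Stage 2: merge paper-level triples into one graph, recording supporting PMIDs."""
    graph: dict[EdgeKey, set[str]] = defaultdict(set)
    for pmid, triples in papers.items():
        for drug, relation, disease in triples:
            edge = (normalize(drug, synonyms), relation.lower(), normalize(disease, synonyms))
            graph[edge].add(pmid)
    return dict(graph)


papers: PaperTriples = {
    "pmid-1": [("Metformin", "treats", "Type 2 Diabetes")],
    "pmid-2": [("metformin hydrochloride", "treats", "T2DM")],
    "pmid-3": [("Aspirin", "treats", "fever")],
}
# Hypothetical synonym table; a real system would use a curated vocabulary.
synonyms = {"metformin hydrochloride": "metformin", "t2dm": "type 2 diabetes"}

kg = consolidate(papers, synonyms)
for edge, pmids in sorted(kg.items()):
    print(edge, sorted(pmids))
```

Keeping provenance as a set of PMIDs per merged edge means that consolidation collapses duplicates without losing the citations that later KG-RAG answers can point to.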

The framework produces six knowledge graphs and large-scale datasets comprising 28,915 DDM instances, 15,042 DEA instances, and more than 4,000 question-answer (QA) pairs for DTA. External validation yields F1 scores of 0.85–0.87 for DDM and DEA and 0.82 for DTA, and clinician review confirms high data reliability, with an AUROC of 0.90. Models augmented with KG-RAG significantly outperform standalone LLMs. UniD$^3$ also provides a chatbot interface for interpretable, citation-backed exploration of drug-disease relationships. Overall, UniD$^3$ offers a scalable, adaptable way to convert unstructured biomedical text into high-quality structured knowledge, supporting AI-driven discovery, drug repurposing, and precision medicine.
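The abstract reports F1 scores obtained partly by fuzzy matching generated pairs against curated resources but does not specify the matching procedure. The sketch below shows one plausible way to compute such a score. It assumes greedy one-to-one matching of (drug, disease) pairs, with Python's `difflib.SequenceMatcher` ratio and a threshold of 0.85. The function names, the threshold, and the toy pairs (including the deliberate misspelling "asprin") are hypothetical.

```python
from difflib import SequenceMatcher

Pair = tuple[str, str]  # (drug, disease)


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy match on lower-cased strings; the 0.85 threshold is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def pair_match(pred: Pair, ref: Pair, threshold: float) -> bool:
    """A predicted pair matches a reference pair if both drug and disease match."""
    return similar(pred[0], ref[0], threshold) and similar(pred[1], ref[1], threshold)


def fuzzy_f1(predicted: list[Pair], reference: list[Pair],
             threshold: float = 0.85) -> tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    unmatched = list(reference)
    tp = 0
    for pred in predicted:
        for i, ref in enumerate(unmatched):
            if pair_match(pred, ref, threshold):
                tp += 1
                del unmatched[i]  # each reference pair can be matched only once
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (illustrative only).
predicted = [("metformin", "type 2 diabetes"), ("asprin", "fever"), ("ibuprofen", "gout")]
reference = [("Metformin", "Type 2 diabetes"), ("aspirin", "fever"),
             ("atorvastatin", "hypercholesterolemia")]

precision, recall, f1 = fuzzy_f1(predicted, reference)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Fuzzy matching tolerates spelling and casing variants that exact matching would count as errors. The trade-off is that the threshold must be tuned so that distinct but similarly named drugs are not merged.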

Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC