arXiv

LLMs for Cardiovascular Risk Prediction from Structured Clinical Data

June 2, 2026 · Jeba Maliha, Md Rafiul Kabir

Title: Leveraging Large Language Models for Cardiovascular Risk Assessment via Structured Clinical Data

Coronary artery disease (CAD) remains one of the leading causes of death worldwide, which makes dependable predictive tools for early detection and risk assessment an urgent need. Conventional machine learning algorithms perform well on structured clinical datasets, while large language models (LLMs) offer a new way to analyze medical information expressed in natural language. This study introduces a hybrid framework that combines structured clinical data with natural-language representations to improve CAD prediction.

The study uses a public dataset of 1,190 patient records, each with 11 clinical attributes. LLMs transform the structured variables into interpretable feature representations and synthetic clinical narratives. To check that the narratives preserve the original data, a validation pipeline extracts the clinical variables back out of each narrative and computes a consistency score against the original record. The average fidelity was 94.61%.
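The round-trip check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the four field names are hypothetical (the summary does not list the 11 attributes), a fixed template stands in for the LLM that writes the narrative, regular expressions stand in for the reverse-extraction step, and the consistency score is assumed to be the fraction of fields recovered exactly.

```python
import re

# Hypothetical field names; the source does not list the 11 attributes.
FIELDS = ("age", "sex", "resting_bp", "cholesterol")


def to_narrative(record):
    """Template stand-in for the LLM narrative-generation step."""
    return (
        f"A {record['age']}-year-old {record['sex']} patient with a resting "
        f"blood pressure of {record['resting_bp']} mmHg and a serum "
        f"cholesterol of {record['cholesterol']} mg/dL."
    )


# One regex and one type cast per field (assumed extraction rules).
PATTERNS = {
    "age": (r"(\d+)-year-old", int),
    "sex": (r"year-old (male|female)", str),
    "resting_bp": (r"blood pressure of (\d+) mmHg", int),
    "cholesterol": (r"cholesterol of (\d+) mg/dL", int),
}


def extract(narrative):
    """Recover structured values from a narrative; None if a field is missing."""
    recovered = {}
    for field, (pattern, cast) in PATTERNS.items():
        match = re.search(pattern, narrative)
        recovered[field] = cast(match.group(1)) if match else None
    return recovered


def consistency(record, narrative):
    """Fraction of fields whose recovered value equals the original value."""
    recovered = extract(narrative)
    matches = sum(recovered[f] == record[f] for f in FIELDS)
    return matches / len(FIELDS)


def mean_fidelity(pairs):
    """Average consistency over (record, narrative) pairs."""
    scores = [consistency(record, narrative) for record, narrative in pairs]
    return sum(scores) / len(scores)
```

With a real LLM in the loop, `to_narrative` and `extract` would each be a model call, and the same `consistency` comparison would flag narratives that altered or dropped a value.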

The study also evaluates four standard machine learning models and compares them with LLM-based classification under zero-shot and few-shot prompting, using two LLMs: GPT and Gemini. Random Forest achieves the highest accuracy. Even so, the authors argue that LLM-based classification remains valuable in clinical practice: because LLMs work directly from natural-language patient descriptions, they can help preserve the privacy of sensitive numerical data, including precise laboratory values, blood pressure measurements, and diagnostic codes. These results suggest that combining structured clinical data with LLM-generated narratives opens new avenues for hybrid clinical prediction systems.
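The difference between the zero-shot and few-shot settings comes down to whether labeled example narratives are placed in the prompt ahead of the patient being classified. The sketch below shows one way to build both prompt variants and parse a reply. The prompt wording, the `CAD` / `NO_CAD` label strings, and the function names are all assumptions made for illustration; the summary does not give the prompts used in the study, and no model call is made here.

```python
def build_prompt(narrative, examples=()):
    """Build a classification prompt.

    With no examples this is a zero-shot prompt; passing (text, label)
    pairs turns it into a few-shot prompt. The wording is hypothetical.
    """
    lines = [
        "You are assisting with coronary artery disease (CAD) risk screening.",
        "Answer with exactly one word: CAD or NO_CAD.",
        "",
    ]
    for text, label in examples:
        lines += [f"Patient: {text}", f"Answer: {label}", ""]
    lines += [f"Patient: {narrative}", "Answer:"]
    return "\n".join(lines)


def parse_label(reply):
    """Map a model reply to 1 (CAD), 0 (NO_CAD), or None if unrecognized."""
    token = reply.strip().upper()
    if token.startswith("NO_CAD"):
        return 0
    if token.startswith("CAD"):
        return 1
    return None
```

The string returned by `build_prompt` would be sent to whichever LLM client is in use, and `parse_label` converts the reply into the same 0/1 target that a Random Forest baseline predicts, so both approaches can be scored with the same accuracy metric.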

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC