arXiv

Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset

June 4, 2026 · Afshan Hashmi

Title: Leveraging Explainable Machine Learning and Clinical Biomarkers for Early Alzheimer’s Detection: A Multi-Class Analysis of the ADNI Dataset

Abstract

Background: Dementia currently affects more than 55 million people worldwide, and Alzheimer’s disease (AD) is its most common cause. Detection methods that are both accurate and interpretable are needed to distinguish normal cognition (NC), mild cognitive impairment (MCI), and AD from standard clinical assessments.

Methods: This study developed an XGBoost classifier for three-class classification using eight clinical variables from the Alzheimer's Disease Neuroimaging Initiative (ADNI): Mini-Mental State Examination (MMSE), Clinical Dementia Rating (CDR) Global, CDR Sum of Boxes (CDR-SB), Montreal Cognitive Assessment (MoCA), Functional Activities Questionnaire (FAQ), age, sex, and years of education. Class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and hyperparameters were tuned with Optuna over 50 trials. Performance was assessed with macro AUC-ROC (95% confidence intervals from 1,000 bootstrap iterations), macro F1 score, balanced accuracy, and Cohen’s kappa. SHAP values were used to provide feature-level explanations.
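The macro AUC-ROC and its bootstrap confidence interval described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one-vs-rest macro averaging and a percentile bootstrap over test subjects, neither of which the abstract specifies, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def binary_auc(scores, is_positive):
    """One-vs-rest AUC as the Mann-Whitney probability that a randomly
    chosen positive case scores higher than a randomly chosen negative
    case (ties count as half a win)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        return None  # AUC is undefined if either group is empty
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, proba, n_classes):
    """Unweighted mean of the one-vs-rest AUCs of all classes.
    proba[i][k] is the predicted probability that subject i is in class k."""
    aucs = []
    for k in range(n_classes):
        auc = binary_auc([row[k] for row in proba], [y == k for y in y_true])
        if auc is None:
            return None
        aucs.append(auc)
    return mean(aucs)


def bootstrap_macro_auc_ci(y_true, proba, n_classes, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI (an assumed method):
    resample test subjects with replacement, recompute macro AUC, and
    read off the alpha/2 and 1 - alpha/2 quantiles of the resampled values."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        auc = macro_auc([y_true[i] for i in idx], [proba[i] for i in idx], n_classes)
        if auc is not None:  # skip resamples in which some class is missing
            stats.append(auc)
    if not stats:
        raise ValueError("no bootstrap resample contained every class")
    stats.sort()
    lower = stats[int(alpha / 2 * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return macro_auc(y_true, proba, n_classes), lower, upper


# Toy example (hypothetical labels 0 = NC, 1 = MCI, 2 = AD; not ADNI data).
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7], [0.1, 0.1, 0.8],
]
point, lower, upper = bootstrap_macro_auc_ci(y_true, proba, n_classes=3)
print(point, lower, upper)  # prints 1.0 1.0 1.0 (toy classes are perfectly separated)
```

In practice, scikit-learn's `roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")` computes the same one-vs-rest macro AUC much faster than this quadratic pairwise loop, which is written out only to make the definition explicit.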

Results: The analysis included 1,641 baseline subjects: 608 NC, 767 MCI, and 266 AD. In five-fold cross-validation, the model achieved a mean macro AUC of 0.983 (SD 0.007), accuracy of 0.944 (SD 0.006), and macro F1 of 0.929 (SD 0.008). On the independent test set (n = 247), it achieved a macro AUC of 0.982 (95% CI: 0.965–0.995), accuracy of 0.943, balanced accuracy of 0.932, macro F1 of 0.927, and Cohen’s kappa of 0.909. SHAP analysis showed that CDR Global was the strongest predictor for the NC and MCI classes, whereas CDR-SB and MMSE were the main drivers of AD classification.
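The test-set accuracy, balanced accuracy, macro F1, and Cohen's kappa reported above can all be derived from a single confusion matrix. The sketch below is illustrative only: it assumes unweighted Cohen's kappa and the common definition of balanced accuracy as the mean of per-class recall (the abstract states neither), and the helper names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts subjects with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def summary_metrics(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    n = len(y_true)
    true_counts = [sum(cm[k]) for k in range(n_classes)]  # row sums
    pred_counts = [sum(cm[t][k] for t in range(n_classes)) for k in range(n_classes)]  # column sums
    correct = [cm[k][k] for k in range(n_classes)]

    accuracy = sum(correct) / n
    recall = [c / t if t else 0.0 for c, t in zip(correct, true_counts)]
    precision = [c / p if p else 0.0 for c, p in zip(correct, pred_counts)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    # Balanced accuracy: unweighted mean of per-class recall.
    balanced_accuracy = sum(recall) / n_classes
    # Macro F1: unweighted mean of per-class F1.
    macro_f1 = sum(f1) / n_classes
    # Unweighted Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column marginals. Returning 0.0 in the
    # degenerate case (expected agreement of 1) is a convention of this sketch.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "macro_f1": macro_f1,
        "kappa": kappa,
    }


# Toy example (hypothetical labels 0 = NC, 1 = MCI, 2 = AD; not ADNI data).
metrics = summary_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], n_classes=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.833
# balanced_accuracy: 0.889
# macro_f1: 0.867
# kappa: 0.739
```

Because accuracy weights each class's recall by its size, while balanced accuracy weights all classes equally, a balanced accuracy below accuracy (0.932 vs 0.943 on the test set) implies that recall tends to be higher for the larger classes. scikit-learn provides equivalent functions: `balanced_accuracy_score`, `f1_score(..., average="macro")`, and `cohen_kappa_score`.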

Conclusion: An explainable machine learning model trained on routine clinical measures achieves high discriminative performance in three-class Alzheimer’s detection (test-set macro AUC 0.982). The SHAP analysis indicates clinically plausible, class-specific feature importance, which supports the model's validity. Future work will integrate speech biomarkers into this framework to enable multimodal detection.
