Structured Visual Evidence Decomposition for Evidence-Grounded Multimodal Screening of Obstructive Sleep Apnea-Hypopnea Syndrome
Abstract
Effective pre-polysomnography screening for obstructive sleep apnea-hypopnea syndrome (OSAHS) requires integrating visible craniofacial and neck indicators with established clinical risk factors. Asking general-purpose multimodal foundation models for direct medical yes/no decisions often yields unstable, poorly calibrated outputs. To address this, we introduce EviOSAHS, an evidence-grounded multimodal reasoning framework that decouples image-only anatomical evidence acquisition from the final clinical decision.
EviOSAHS decomposes each frontal facial image into seven targeted anatomical questions covering the neck, chin, mouth, facial and neck fat, lower jaw, midface, and nose. Each answer is converted into a structured evidence card that records the target anatomy, visibility status, risk direction, evidence strength, confidence, and a brief summary. Only at the final stage are these cards combined with a sanitized clinical profile, which a large language model then uses to perform a balanced binary screening adjudication.
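The evidence-card stage can be pictured as a small validated record per anatomical question, with the clinical profile joined in only when the adjudication prompt is built. The following is a minimal sketch, not the authors' implementation: only the seven anatomical targets and the six card attributes come from the abstract. The class and function names, the value vocabularies for visibility, risk direction, and strength, the 0 to 1 confidence scale, the clinical-profile keys, and the prompt wording are all hypothetical assumptions. Sanitization of the clinical profile is assumed to happen upstream and is not shown.

```python
import json
from dataclasses import asdict, dataclass

# The seven anatomical targets named in the abstract.
ANATOMICAL_TARGETS = (
    "neck",
    "chin",
    "mouth",
    "facial and neck fat",
    "lower jaw",
    "midface",
    "nose",
)

# Hypothetical value vocabularies; the abstract does not specify them.
VISIBILITY_LEVELS = {"high", "partial", "not_visible"}
RISK_DIRECTIONS = {"increases_risk", "decreases_risk", "neutral"}
STRENGTH_LEVELS = {"weak", "moderate", "strong"}


@dataclass(frozen=True)
class EvidenceCard:
    """One image-only finding for a single anatomical target (hypothetical schema)."""
    target: str
    visibility: str
    risk_direction: str
    strength: str
    confidence: float  # assumed to lie in [0, 1]
    summary: str

    def __post_init__(self):
        # Reject cards that fall outside the assumed vocabularies.
        if self.target not in ANATOMICAL_TARGETS:
            raise ValueError(f"unknown target: {self.target}")
        if self.visibility not in VISIBILITY_LEVELS:
            raise ValueError(f"unknown visibility: {self.visibility}")
        if self.risk_direction not in RISK_DIRECTIONS:
            raise ValueError(f"unknown risk direction: {self.risk_direction}")
        if self.strength not in STRENGTH_LEVELS:
            raise ValueError(f"unknown strength: {self.strength}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def build_adjudication_prompt(cards, clinical_profile):
    """Join all seven cards with the (already sanitized) clinical profile at the final stage."""
    covered = {card.target for card in cards}
    missing = [t for t in ANATOMICAL_TARGETS if t not in covered]
    if missing:
        raise ValueError(f"missing evidence cards for: {missing}")
    payload = {
        "visual_evidence": [asdict(card) for card in cards],
        "clinical_profile": clinical_profile,
    }
    # Hypothetical instruction text for the adjudicating language model.
    return (
        "Using the visual evidence cards and clinical profile below, answer "
        "'screening-positive' or 'screening-negative'.\n"
        + json.dumps(payload, indent=2, sort_keys=True)
    )


# Example with placeholder content; "age" and "bmi" are hypothetical profile keys.
example_cards = [
    EvidenceCard(t, "high", "neutral", "weak", 0.8, f"No notable {t} finding.")
    for t in ANATOMICAL_TARGETS
]
print(build_adjudication_prompt(example_cards, {"age": 52, "bmi": 31.4}))
```

Keeping the clinical profile out of the card objects mirrors the decoupling described above: the visual stage never sees clinical data, and the merge happens in a single, inspectable step.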
We evaluated EviOSAHS on a cohort of 642 subjects, labeling normal cases as screening-negative and mild, moderate, or severe OSAHS cases as screening-positive. Under a unified protocol, EviOSAHS outperformed clinical-only prompting, direct multimodal prompting, and naive two-stage pipelines, achieving 88.47% accuracy, 94.86% sensitivity, a 93.74% F1-score, and a 5.14% false-negative rate. Ablation studies showed that this high-sensitivity operating point depends critically on both the seven-question visual decomposition and the balanced final adjudication.
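The reported metrics follow the standard binary confusion-matrix definitions, with screening-positive (mild, moderate, or severe OSAHS) as the positive class. Because the false-negative rate is FN / (TP + FN), it is the complement of sensitivity, which is consistent with the reported 94.86% and 5.14%. A minimal sketch; the function name and the toy counts are illustrative only and are not the study's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    accuracy: float
    sensitivity: float
    f1: float
    false_negative_rate: float


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    """Binary screening metrics; positive class = mild, moderate, or severe OSAHS."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # recall on screening-positive subjects
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return ScreeningMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=sensitivity,
        f1=f1,
        false_negative_rate=fn / (tp + fn),  # equals 1 - sensitivity
    )


# Toy counts for illustration only (not the paper's results).
m = screening_metrics(tp=90, fp=10, tn=40, fn=10)
print(f"acc={m.accuracy:.4f} sens={m.sensitivity:.4f} "
      f"f1={m.f1:.4f} fnr={m.false_negative_rate:.4f}")
```

In a screening setting the false-negative rate is the quantity a triage tool most needs to keep low, which is why it is reported alongside sensitivity even though one determines the other.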
An audit of all 4,494 question-level visual outputs (642 subjects × 7 questions) found a 100% structured parse rate and a 93.88% high-visibility rate. EviOSAHS offers an auditable, high-sensitivity workflow for binary pre-polysomnography OSAHS screening, but it is intended as a triage assistant rather than a definitive diagnostic tool. Prospective validation, external testing, and calibrated operating-point controls are required before clinical deployment.
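The question-level audit can be sketched as parsing each raw visual output and counting how many yield a well-formed card and how many of those report high visibility. This is a minimal sketch under stated assumptions: the raw outputs are taken to be JSON objects, the required field names and the `"high"` visibility label are hypothetical, and both rates are computed over all outputs. The abstract does not specify the output format, how high visibility is determined, or the denominator of the high-visibility rate.

```python
import json

# Hypothetical card keys; the abstract lists these attributes but not their names.
REQUIRED_FIELDS = {"target", "visibility", "risk_direction",
                   "strength", "confidence", "summary"}


def parse_card(raw: str):
    """Return the card dict if `raw` is a JSON object with every required field, else None."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(card, dict) or not REQUIRED_FIELDS.issubset(card):
        return None
    return card


def audit(raw_outputs):
    """Structured parse rate and high-visibility rate over all question-level outputs."""
    if not raw_outputs:
        raise ValueError("no outputs to audit")
    parsed = [parse_card(raw) for raw in raw_outputs]
    n_total = len(raw_outputs)
    n_parsed = sum(card is not None for card in parsed)
    n_high = sum(card is not None and card["visibility"] == "high" for card in parsed)
    return {"parse_rate": n_parsed / n_total,
            "high_visibility_rate": n_high / n_total}
```

With one output per subject and question, the audit input for this cohort would contain 642 × 7 = 4,494 entries.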
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




