Perspective on Bias in Biomedical AI: Preventing Downstream Healthcare Disparities
Title: Addressing Bias in Biomedical AI to Mitigate Future Healthcare Inequities
Healthcare disparities remain a persistent issue across socioeconomic lines, typically attributed to unequal access to screening, diagnostics, and therapeutic interventions. However, this perspective argues that critical biases originate much earlier in the pipeline, during the phases of data collection and research prioritization, long before these tools reach clinical practice. This issue is particularly acute in studies utilizing molecular and omics data.
While a significant volume of research focuses on gathering omics data, the associated demographic details are frequently omitted. When such information is provided, it often exposes profound biases. An automated review of 4,514 PubMed-indexed omics publications spanning 2015 to 2024, which assessed reporting across various demographic dimensions, found that overall disclosure is scant. Specifically, only 2.7% of the reviewed studies included ancestry or ethnicity data, while just 2.5% reported geographic origins.
Furthermore, an examination of major datasets routinely used for model training, including CellxGene and GEO, uncovered significant population bias, with data from individuals of European ancestry being predominant. As biomedical foundation models become integral to scientific discovery (operating on a paradigm where base models are pretrained on extensive datasets and subsequently reused for numerous downstream applications), there is a risk that they will perpetuate or even exacerbate these early-stage biases. This can lead to cascading inequities that regulatory measures may struggle to fully correct.
To address these challenges, we advocate for a community-wide commitment to three foundational principles: Provenance, Openness, and Reliability through Evaluation Transparency. By adopting these standards, the limitations and biases inherent in biomedical AI can be made more visible to both developers and users, thereby fostering more informed decisions regarding model development, assessment, and deployment.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




