arXiv

Annotation-Informed Block-Sparse Bayesian Modeling for cis-Expression Prediction


Accurate modeling of local regulatory architecture is the cornerstone of genotype-based cis-expression prediction. To capture this architecture more faithfully, we introduce the block-sparse Bayesian sparse linear mixed model (bsBSLMM). It extends the Bayesian sparse linear mixed model (BSLMM) with two innovations: spike-and-slab sparsity defined at the level of linkage disequilibrium (LD) blocks, and a SNP-inclusion prior informed by transcription start site (TSS) location.
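As a rough illustration of how block-level sparsity and a TSS-informed inclusion prior can fit together, the sketch below draws SNP effects from a block-wise spike-and-slab prior. The summary above does not give the paper's functional forms, so everything here is an assumption made for illustration: the exponential decay with distance to the TSS, the parameters `base` and `scale_bp`, the rule that a block inherits the largest SNP-level prior within it, and the function names themselves.

```python
import math
import random


def tss_inclusion_prob(distance_bp, base=0.05, scale_bp=50_000.0):
    """Hypothetical prior inclusion probability for one SNP.

    Decays exponentially with absolute distance to the TSS; the decay
    form and both parameters are illustrative assumptions.
    """
    return base * math.exp(-abs(distance_bp) / scale_bp)


def sample_block_sparse_effects(blocks, tss_pos, slab_sd=0.1, seed=0):
    """Draw SNP effects from a block-level spike-and-slab prior.

    blocks: list of lists of SNP positions (bp), one inner list per LD block.
    A block is either excluded as a whole (spike: all effects are 0.0)
    or included as a whole (slab: independent Gaussian effects).
    """
    rng = random.Random(seed)
    effects = []
    for block in blocks:
        # Assumption: the block inherits the largest SNP-level prior it contains.
        p_block = max(tss_inclusion_prob(pos - tss_pos) for pos in block)
        included = rng.random() < p_block
        effects.append(
            [rng.gauss(0.0, slab_sd) if included else 0.0 for _ in block]
        )
    return effects


# Toy example: three LD blocks at increasing distance from a TSS at position 0.
toy_blocks = [[100, 200, 300], [50_000, 51_000], [900_000]]
toy_effects = sample_block_sparse_effects(toy_blocks, tss_pos=0)
```

The point of the sketch is the structure rather than the numbers: sparsity is decided once per LD block instead of once per SNP, and blocks near the TSS are more likely to be included a priori.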

In an evaluation of 23,098 genes from GEUVADIS lymphoblastoid cell lines of European ancestry, bsBSLMM outperformed several established methods: BSLMM, LASSO, BLUP, TIGAR elastic net, and TIGAR Dirichlet-process regression. Under consistent evaluation criteria, bsBSLMM retained more predictable genes. Compared directly with BSLMM, it predicted held-out expression more accurately for the majority of shared genes. These improvements were attributed primarily to LD-block sparsity, with additional gains from the TSS-informed prior.
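The comparison above rests on two quantities: per-gene prediction accuracy on held-out samples and the number of genes that pass a predictability filter. The summary does not state which metric or cutoff the authors used, so the sketch below assumes the squared Pearson correlation between observed and predicted expression, with a cutoff of 0.01. Both choices, and the function names, are assumptions for illustration only.

```python
def heldout_r2(observed, predicted):
    """Squared Pearson correlation between observed and predicted expression."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0.0 or var_p == 0.0:
        return 0.0  # a constant vector carries no predictive signal
    return (cov * cov) / (var_o * var_p)


def count_predictable(r2_by_gene, threshold=0.01):
    """Count genes whose held-out R^2 exceeds the (assumed) cutoff."""
    return sum(1 for r2 in r2_by_gene.values() if r2 > threshold)


def fraction_improved(r2_method_a, r2_method_b):
    """Fraction of shared genes on which method A has higher held-out R^2 than B."""
    shared = set(r2_method_a) & set(r2_method_b)
    if not shared:
        return 0.0
    wins = sum(1 for g in shared if r2_method_a[g] > r2_method_b[g])
    return wins / len(shared)
```

Applying `count_predictable` to each method's per-gene results gives the "predictable gene" yield, and `fraction_improved` gives the head-to-head comparison on shared genes.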

Variants selected by bsBSLMM were more strongly enriched in regulatory regions, specifically GM12878 DNase and H3K27ac sites, than variants selected by standard BSLMM, which supports their biological relevance. Furthermore, in transcriptome-wide association study (TWAS) analyses, bsBSLMM not only recovered known inflammatory bowel disease signals, such as those linked to IL23R, but also identified additional genome-wide significant genes that BSLMM failed to detect.
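One common way to quantify enrichment of selected variants in an annotation is an odds ratio from a 2x2 table of selected versus background variants, inside versus outside the annotated regions. The summary does not say which enrichment statistic the authors used, so the sketch below is an assumed illustration. The function name is hypothetical and the default pseudocount is the standard Haldane-Anscombe correction of 0.5 per cell.

```python
def enrichment_odds_ratio(sel_in, sel_out, bg_in, bg_out, pseudocount=0.5):
    """Odds ratio for selected variants overlapping an annotation.

    sel_in / sel_out: selected variants inside / outside annotated regions.
    bg_in / bg_out:   background variants inside / outside annotated regions.
    The pseudocount is added to every cell to avoid division by zero.
    """
    a = sel_in + pseudocount
    b = sel_out + pseudocount
    c = bg_in + pseudocount
    d = bg_out + pseudocount
    return (a * d) / (b * c)
```

Values above 1 indicate that selected variants fall in the annotation more often than background variants do. Comparing this ratio between two methods' selected sets is one way to express "stronger enrichment".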

The robustness of these findings was confirmed through independent validation in the Louisiana Osteoporosis Study. This analysis replicated the increased prediction yield across diverse ancestries and uncovered biologically significant bone mineral density pathways in subsequent TWAS and gene set enrichment analyses. Collectively, these results indicate that integrating LD-block structures and biologically grounded SNP priors significantly enhances both cis-expression prediction accuracy and the discovery power of downstream TWAS.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
