BBOmix: A Tabular Benchmark for Hyperparameter Optimization of Unsupervised Biological Representation Learning
Abstract:
The proliferation of high-throughput sequencing technologies has produced massive, high-dimensional omics datasets. In this setting, deep unsupervised models, in particular autoencoders (AEs), are gaining traction for dimensionality reduction and representation learning. However, AE performance depends heavily on architecture and hyperparameter settings. Moreover, unsupervised optimization typically relies on reconstruction loss, which does not always reflect utility in downstream applications. Because exhaustive hyperparameter optimization (HPO) is computationally expensive, practitioners often fall back on suboptimal configurations. To make large-scale unsupervised HPO research more accessible, we present BBOmix, the first open-source tabular benchmark dedicated to unsupervised representation learning on real-world biological data. The benchmark comprises 105,000 evaluations spanning seven multi-omics modalities and four AE architectures, derived from the TCGA and SCHC datasets. We analyze the relationship between reconstruction loss and downstream-task performance and comprehensively evaluate state-of-the-art HPO methods, including single-fidelity, multi-fidelity, and transfer-learning approaches. This work establishes a rigorous baseline to guide future research on unsupervised biological representation learning.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
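
To make the idea of a tabular benchmark in the abstract above concrete, the following is a minimal sketch, not BBOmix's actual interface or data. It assumes a hypothetical three-parameter search space (`latent_dim`, `learning_rate`, `dropout`) and fills a lookup table with synthetic metrics (`recon_loss`, `downstream_score`); every name and value here is an illustrative assumption. It shows the two things the abstract describes: an HPO method (here plain random search) whose "evaluations" are table lookups instead of training runs, and a rank correlation between reconstruction loss and a downstream metric.

```python
import itertools
import random

# Hypothetical search space for illustration only; NOT the BBOmix space.
SPACE = {
    "latent_dim": [16, 32, 64],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.2],
}


def build_toy_table(seed=0):
    """Stand-in for precomputed evaluations: config tuple -> synthetic metrics."""
    rng = random.Random(seed)
    table = {}
    for values in itertools.product(*SPACE.values()):
        table[values] = {
            "recon_loss": rng.uniform(0.1, 1.0),        # synthetic value
            "downstream_score": rng.uniform(0.5, 0.9),  # synthetic value
        }
    return table


def random_search(table, budget, seed=0):
    """Random search where each evaluation is a table lookup, not a training run."""
    rng = random.Random(seed)
    configs = rng.sample(sorted(table), budget)
    best = min(configs, key=lambda c: table[c]["recon_loss"])
    return dict(zip(SPACE, best)), table[best]


def spearman(xs, ys):
    """Spearman rank correlation; this simple formula assumes no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


table = build_toy_table()
best_config, best_metrics = random_search(table, budget=5)
rho = spearman(
    [m["recon_loss"] for m in table.values()],
    [m["downstream_score"] for m in table.values()],
)
```

Because the toy metrics are drawn independently at random, `rho` carries no meaning here; with real benchmark entries, the same computation would indicate how well reconstruction loss tracks downstream performance.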



