You Only Train Once: Differentiable Subset Selection for Omics Data
Abstract: Identifying concise yet informative gene subsets from single-cell transcriptomic data is a critical step for enhancing interpretability, facilitating biomarker discovery, and enabling cost-efficient profiling. Yet current feature selection methods often decouple selection from prediction, relying on multi-stage pipelines or post hoc attribution techniques. To address this, we introduce YOTO (You Only Train Once), an end-to-end framework that performs discrete gene subset selection and prediction jointly within a single differentiable architecture. The prediction objective steers which genes are selected, while the selected subset defines the predictive representation, creating a closed feedback loop that lets the model refine both selection and prediction throughout training. Unlike prior methods, YOTO enforces sparsity so that only the selected genes participate in inference, removing the need for a separate downstream classifier. Furthermore, a multi-task learning strategy lets the model learn shared representations across related objectives, so that partially labeled datasets inform one another. This facilitates the discovery of gene subsets that generalize across tasks without additional training phases. On two prominent single-cell RNA-seq datasets, YOTO consistently outperforms state-of-the-art baselines. These findings indicate that sparse, end-to-end, multi-task gene subset selection not only improves predictive accuracy but also yields compact, biologically meaningful gene subsets, advancing single-cell analysis and biomarker discovery.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
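
The abstract describes two mechanisms that a small example can make concrete: only the selected genes reach the predictor, and several tasks share one selection even when each sample is labeled for only some of them. The following is a minimal forward-pass sketch of that idea, not the authors' implementation. All names (`top_k_mask`, `masked_cross_entropy`, `forward`), the top-k selection rule, the linear task heads, and the `-1` convention for a missing label are assumptions made for illustration. The abstract does not say how the discrete selection is made differentiable; in an autodiff framework a hard mask like this one would typically be paired with a relaxation or a straight-through estimator, which this NumPy sketch omits.

```python
import numpy as np

def top_k_mask(scores, k):
    """Return a hard 0/1 mask that keeps the k highest-scoring genes."""
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over labeled samples; label -1 means 'unlabeled'."""
    has_label = labels >= 0
    if not has_label.any():
        return 0.0
    probs = softmax(logits[has_label])
    picked = probs[np.arange(int(has_label.sum())), labels[has_label]]
    return float(-np.log(picked + 1e-12).mean())

def forward(x, gene_scores, heads, labels_per_task, k):
    """Select k genes, zero out the rest, and sum the per-task losses."""
    mask = top_k_mask(gene_scores, k)
    x_selected = x * mask  # unselected genes contribute nothing downstream
    losses = [
        masked_cross_entropy(x_selected @ w, y)
        for w, y in zip(heads, labels_per_task)
    ]
    return mask, sum(losses)

# Toy data (hypothetical): 6 cells, 10 genes, two tasks with 3 and 2 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 10))
gene_scores = rng.normal(size=10)  # learnable per-gene scores in a real model
heads = [rng.normal(size=(10, 3)), rng.normal(size=(10, 2))]
labels_per_task = [
    np.array([0, 1, 2, -1, -1, 0]),   # task A: cells 3 and 4 are unlabeled
    np.array([-1, -1, 1, 0, 1, -1]),  # task B: cells 0, 1 and 5 are unlabeled
]

mask, loss = forward(x, gene_scores, heads, labels_per_task, k=3)
print("selected genes:", np.flatnonzero(mask))
print("combined loss:", loss)
```

Because the mask multiplies the input before any task head sees it, changing the expression of an unselected gene cannot change the loss, which is the sense in which only the selected genes participate in inference. Each cell contributes only to the tasks for which it carries a label, so partially labeled data can still shape the shared gene scores.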




