arXiv

LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

Abstract:

Genomic foundation models increasingly mirror the architectures of large language models, yet they still depend predominantly on static tokenization: $k$-mers, Byte Pair Encoding (BPE), or individual nucleotides. These fixed schemes impose arbitrary sequence divisions that can mask biologically significant structural features. To address this, we introduce LDARNet, a hierarchical genomic foundation model with 120 million parameters. It adapts the dynamic chunking mechanism of H-Net, originally designed for autoregressive generation, to masked language modeling. LDARNet combines BiMamba-2 state-space layers with local attention, bidirectional routing, and a ratio-based regularizer so that token boundaries form adaptively and without supervision.
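To make the routing and regularization idea concrete, here is a minimal sketch of similarity-based boundary scoring and a ratio loss, following the formulation published for H-Net. It is an assumption that LDARNet's ratio-based regularizer takes this form; the abstract does not give the equations. The sketch omits the learned query/key projections and looks only at the left neighbor, so it does not attempt the bidirectional routing the abstract mentions. The function names `boundary_probs` and `ratio_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def boundary_probs(vectors):
    """Boundary probability per position: p_t = (1 - cos(x_t, x_{t-1})) / 2.

    Assumption: raw vectors are compared directly; H-Net applies learned
    query/key projections first. The first position is always a boundary.
    """
    probs = [1.0]
    for prev, cur in zip(vectors, vectors[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs


def ratio_loss(probs, target_ratio):
    """H-Net-style ratio loss pushing the chunk rate toward 1 / target_ratio.

    F is the fraction of hard boundaries (p_t >= 0.5), G is the mean
    boundary probability. The loss equals 1 when F = G = 1 / target_ratio.
    """
    n = float(target_ratio)
    hard = [1.0 if p >= 0.5 else 0.0 for p in probs]
    f = sum(hard) / len(hard)
    g = sum(probs) / len(probs)
    return n / (n - 1.0) * ((n - 1.0) * f * g + (1.0 - f) * (1.0 - g))


if __name__ == "__main__":
    # Toy per-nucleotide embeddings (illustrative values only).
    embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    p = boundary_probs(embeddings)
    print(p, ratio_loss(p, target_ratio=4))
```

Under this formulation, marking every position as a boundary is penalized (the loss rises to the target ratio), while a chunk rate that matches the target yields the minimum value of 1, which is what lets boundaries form without supervision while keeping compression in check.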

Evaluated on 27 tasks from the Genomic Benchmarks and Nucleotide Transformer suites, LDARNet recorded 11 wins out of 18 among compact models (those with fewer than 300 million parameters). It also achieved state-of-the-art performance on five histone modification tasks, surpassing models up to 20 times larger. A FLOPs-matched controlled experiment identified learned routing as the primary driver of these gains: at equal computational cost, learned boundaries outperformed fixed-grid boundaries by as much as 14 percentage points on histone tasks. Nucleotide-resolution analysis further showed that the unsupervised boundaries align with canonical promoter motifs and splice junctions, which offers a biological interpretation of why adaptive tokenization helps genomic foundation models.


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
