arXiv

MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers

June 4, 2026 · Yanshun Zhao, Xiaoyu Peng, Jiamin Jiang, Congcong Zhu, Jingrun Chen

Abstract:

Standard patch-based Transformers typically rely on uniform spatial partitions, which allocate computation evenly across the domain regardless of local feature complexity. This rigid tokenization limits how efficiently they can represent and solve complex partial differential equations (PDEs). To overcome this limitation, we introduce MeshTok, a tokenization and sequence-modeling framework inspired by adaptive mesh refinement (AMR). MeshTok selectively refines spatial regions characterized by sharp gradients, transient dynamics, or multiscale structures, producing a heterogeneous set of multiscale tokens anchored to a fixed simulation grid. These tokens are combined into a single Transformer sequence, allowing the model to capture coarse-grained global context and fine-grained local detail simultaneously, without specialized architectural modules. Although adaptive refinement moderately increases the number of tokens, it directs computation more precisely toward regions carrying significant physical information; we interpret this strategy as a practical inductive bias rather than a formal guarantee of optimality. Comprehensive experiments across various PDE families and benchmark datasets show that MeshTok consistently improves the efficiency-accuracy trade-off relative to uniform-grid baselines. These findings highlight adaptive multiscale tokenization as a scalable and generalizable principle for neural PDE modeling. The code is publicly available at https://github.com/SCAILab-USTC/MeshTok.
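
To make the idea of AMR-inspired tokenization concrete, the following is a minimal sketch of a quadtree-style refinement rule on a 2D grid. It is not taken from the MeshTok code: the function names (`max_gradient`, `refine`, `tokenize`), the gradient-threshold criterion, and the parameters `base_size`, `threshold`, and `min_size` are all assumptions chosen for illustration, and the step that embeds each patch into a token vector is omitted.

```python
# Illustrative sketch only: a quadtree-style refinement rule, NOT the MeshTok implementation.
# All names and parameters below are hypothetical.
# Assumes a square field whose side is a multiple of base_size, and
# base_size = min_size * 2**k for some integer k >= 0.


def max_gradient(field, r0, c0, size):
    """Largest absolute difference between adjacent cells inside a square patch."""
    best = 0.0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            if c + 1 < c0 + size:
                best = max(best, abs(field[r][c + 1] - field[r][c]))
            if r + 1 < r0 + size:
                best = max(best, abs(field[r + 1][c] - field[r][c]))
    return best


def refine(field, r0, c0, size, threshold, min_size):
    """Return (row, col, size) patches that exactly cover the square at (r0, c0)."""
    # Keep the patch whole if it is already at the finest level or is smooth enough.
    if size <= min_size or max_gradient(field, r0, c0, size) <= threshold:
        return [(r0, c0, size)]
    # Otherwise split into four children and recurse.
    half = size // 2
    patches = []
    for dr in (0, half):
        for dc in (0, half):
            patches += refine(field, r0 + dr, c0 + dc, half, threshold, min_size)
    return patches


def tokenize(field, base_size, threshold, min_size):
    """Start from a uniform coarse partition and refine each patch independently."""
    n = len(field)
    tokens = []
    for r0 in range(0, n, base_size):
        for c0 in range(0, n, base_size):
            tokens += refine(field, r0, c0, base_size, threshold, min_size)
    return tokens  # one flat list of mixed-scale patches, usable as a single sequence


# 16x16 field that is flat except for a sharp step between columns 2 and 3.
field = [[1.0 if c >= 3 else 0.0 for c in range(16)] for _ in range(16)]
tokens = tokenize(field, base_size=8, threshold=0.5, min_size=2)
```

On this toy field, the two coarse patches on the smooth right half stay at size 8, while the patches containing the step are split down to size 2 only near the discontinuity. The resulting list mixes patch sizes 8, 4, and 2 and still covers every grid cell exactly once, which mirrors the abstract's description of multiscale tokens anchored to a fixed grid and placed in one sequence.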
