BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps
Abstract
Fitting music tokenization to the standard architecture of language models is challenging, particularly because symbolic music can be represented in several formats, such as sequences, grids, and graphs. The dominant approach represents symbolic music as a sequence of discrete events (such as pitch, onset, duration, or combined note events). This approach is intuitive and has proven successful in Transformer-based frameworks, but it handles the regularity of musical time only implicitly: because individual tokens can cover different durations, time often advances non-uniformly from one token to the next.
In contrast, this study explores an alternative tokenization that uses uniform-length musical steps, such as beats, as the fundamental unit. Our method encodes all events occurring at the same pitch within one time step as a single token and explicitly groups these tokens by time step, which is analogous to a sparse encoding of a piano roll. We evaluate the proposed tokenization on music continuation and accompaniment generation, benchmarking it against prevailing event-based techniques. The results indicate that our method yields higher musical quality and better structural coherence. Supplementary analyses further show that the tokenization is more efficient and better at capturing long-range dependencies.
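
The grouping idea can be illustrated with a minimal Python sketch. It is not the paper's actual vocabulary or implementation. The `Note` type, the `beat_tokenize` function, the four-subdivisions-per-beat grid, and the per-subdivision state codes (0 = silent, 1 = onset, 2 = sustained) are all assumptions made here for illustration. The sketch emits one token per pitch per beat and stores only the pitches that sound in a beat, which is the sense in which the encoding resembles a sparse piano roll.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int     # MIDI pitch number
    onset: int     # onset time in grid ticks
    duration: int  # length in grid ticks (assumed >= 1)


def beat_tokenize(notes, ticks_per_beat=4):
    """Return one list of (pitch, pattern) tokens per beat-length step.

    Hypothetical scheme: pattern holds one state per subdivision of the
    beat, with 0 = silent, 1 = onset, 2 = sustained.
    """
    last_ticks = (n.onset + n.duration - 1 for n in notes)
    n_steps = max((t // ticks_per_beat for t in last_ticks), default=-1) + 1
    steps = [dict() for _ in range(n_steps)]
    for n in notes:
        for tick in range(n.onset, n.onset + n.duration):
            step, pos = divmod(tick, ticks_per_beat)
            # All events at this pitch within this step share one pattern.
            pattern = steps[step].setdefault(n.pitch, [0] * ticks_per_beat)
            pattern[pos] = 1 if tick == n.onset else 2
    # Only sounding pitches are kept, so each step is a sparse piano-roll column.
    return [[(p, tuple(pat)) for p, pat in sorted(s.items())] for s in steps]


notes = [Note(60, 0, 4), Note(64, 0, 2), Note(64, 2, 2), Note(67, 4, 6)]
for index, tokens in enumerate(beat_tokenize(notes)):
    print(index, tokens)
```

In this example the two short notes at pitch 64 fall in the same beat and collapse into one token, while the long note at pitch 67 crosses a beat boundary and contributes one token to each of the two beats it touches. Every step covers the same duration, so the position of a step in the sequence directly encodes musical time.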
Source: arXiv





