Targeted Remasking: Replacing Token Editing with Token-to-Mask Refinement in Discrete Diffusion Language Models
Abstract:
Discrete masked diffusion language models, including LLaDA, produce text via an iterative denoising process that gradually replaces mask tokens with predicted tokens. LLaDA2.1 attempted to speed up generation through a Token-to-Token (T2T) editing mechanism that directly swaps out committed tokens identified as likely errors, but this approach has several critical flaws. Specifically, T2T editing conflates error identification with replacement, can contaminate the generation context with incorrect tokens, and creates a discrepancy between training and inference noise: the systematic errors the model produces during inference do not match the random perturbations it encounters during training.
To address these issues, we introduce Token-to-Mask (T2M) remasking, a training-free modification that serves as a direct substitute for T2T editing. Instead of replacing tokens, T2M reverts suspected erroneous tokens to the mask state, enabling the diffusion process to re-predict them within a cleaner context. We formulate three distinct error detection strategies—based on probability, trigger mirroring, and temporal differences—and provide a unified theoretical framework demonstrating that T2M remasking enhances context purity, transforms systematic inference errors into the native mask noise format, and facilitates joint multi-position optimization through delayed commitment.
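The contrast between overwriting a suspect token and reverting it to the mask state can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the names `t2m_remask` and `t2t_edit`, the sentinel `MASK_ID`, the per-position `confidences` (taken to be the model's probability for each committed token), and the simple fixed-threshold rule standing in for the probability-based detector.

```python
from typing import List, Sequence

MASK_ID = -1  # hypothetical mask id; a real model uses a vocabulary-specific id


def t2m_remask(
    tokens: Sequence[int],
    confidences: Sequence[float],
    threshold: float = 0.5,
    mask_id: int = MASK_ID,
) -> List[int]:
    """Revert committed tokens with confidence below `threshold` to the mask state.

    Assumed detection rule: a plain probability threshold. Positions that are
    already masked are left untouched.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    out = []
    for tok, conf in zip(tokens, confidences):
        if tok != mask_id and conf < threshold:
            out.append(mask_id)  # suspected error: remask, to be re-predicted later
        else:
            out.append(tok)
    return out


def t2t_edit(
    tokens: Sequence[int],
    confidences: Sequence[float],
    proposals: Sequence[int],
    threshold: float = 0.5,
    mask_id: int = MASK_ID,
) -> List[int]:
    """For contrast: overwrite suspected tokens with a proposed token in one step."""
    if not (len(tokens) == len(confidences) == len(proposals)):
        raise ValueError("all inputs must have the same length")
    return [
        prop if (tok != mask_id and conf < threshold) else tok
        for tok, conf, prop in zip(tokens, confidences, proposals)
    ]


if __name__ == "__main__":
    tokens = [5, 7, MASK_ID, 9]
    confidences = [0.9, 0.2, 0.0, 0.6]
    print(t2m_remask(tokens, confidences))
    print(t2t_edit(tokens, confidences, proposals=[5, 8, 3, 9]))
```

In this sketch the only difference between the two functions is what is written at a flagged position: `t2t_edit` commits a replacement immediately, whereas `t2m_remask` defers the decision, so a subsequent denoising step can re-predict that position together with any other masked positions.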
Extensive evaluations across 12 benchmarks covering instruction following, coding, mathematics, reasoning, and knowledge tasks indicate that T2M generally improves performance, particularly on tasks that demand precise token-level accuracy. The largest improvement is in mathematics, where CMATH scores increase by 5.92%. An error analysis of the CMATH results shows that the primary failure mode is "last-mile token corruption," in which correct reasoning leads to a corrupted final answer. T2M rectifies 59.4% of these cases.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





