Prediction Under Imperfect Compression: A Theory of Approximate MDL
Title: Prediction Under Imperfect Compression: A Theory of Approximate MDL
Abstract: Minimum Description Length (MDL) operationalizes Occam’s razor by minimizing the sum of the model description length and the conditional data description length, expressed as $L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$. In the context of sequential prediction, the MDL approach iteratively chooses a model that minimizes the objective score based on the observed history to forecast the subsequent step. While classical theory demonstrates that exact optimization of the MDL objective yields robust compression guarantees conducive to accurate prediction, real-world machine learning applications typically rely on approximate optimization. To address this discrepancy, this study investigates a fundamental question: Which types of approximation and regularization allow approximate MDL to maintain reliable sequential prediction? We provide a rigorous characterization for this scenario. Our analysis proves that for any approximation introducing an additive slack $C$ to the generalized balanced MDL objective, $\lambda\cdot L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$, the cumulative expected squared prediction error remains finite for all $\lambda\ge1$. The proof for $\lambda>1$ utilizes an affinity-telescoping argument, whereas the boundary case of $\lambda=1$ is established via a likelihood-ratio stopping argument rooted in exact static MDL bounds. These findings confirm that classical MDL regularization is resilient to any constant additive optimization error. Moreover, we demonstrate that our characterization of the approximate MDL framework is tight: when $0<\lambda<1$, reliable prediction fails under additive approximation, but succeeds under multiplicative approximation, thereby proving that additive approximation is both necessary and sufficient.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





