arXiv

Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control

June 2, 2026 · David Mullett

Title: Evaluating Claims of Recursive-Collapse Warnings with Controlled False-Positive Rates

Abstract: Before overt failure manifests, recursive systems may enter collapse-like states characterized by self-reinforcing amplification, persistent recursion, and shrinking diversity; these states can mask accelerating internal degradation. To test for them, we present Loopzero, a claim-bounded benchmark framework that asks whether recursive failures follow a directional telemetry pattern: rising gain ($G$) and recursive persistence ($p$) together with falling diversity ($\delta$). The claim boundary is formally specified in Lean, but the Lean artifact itself does not validate real-world telemetry, benchmark integrity, or detector efficacy.
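The abstract states the warning pattern only in directional terms. One way to make "directional" concrete is sketched below. It assumes that $G$, $p$, and $\delta$ are each available as a per-window telemetry series and that direction is read from an ordinary least-squares slope against the window index. That slope rule is an illustrative choice, not the paper's stated witness definition, and `ols_slope` and `directional_witness` are hypothetical names.

```python
from statistics import fmean


def ols_slope(series):
    """Least-squares slope of a series regressed on its integer time index."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to estimate a slope")
    t_mean = (n - 1) / 2
    y_mean = fmean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def directional_witness(gain, persistence, diversity):
    """Hypothetical witness: True only when gain (G) and persistence (p)
    trend upward while diversity (delta) trends downward."""
    return (
        ols_slope(gain) > 0
        and ols_slope(persistence) > 0
        and ols_slope(diversity) < 0
    )


if __name__ == "__main__":
    # Toy series: G and p rise while delta falls, so the witness fires.
    print(directional_witness([1.0, 1.1, 1.3, 1.6],
                              [0.2, 0.3, 0.35, 0.5],
                              [0.9, 0.8, 0.7, 0.5]))
```

A flat diversity series yields a slope of zero, so under this sketch the witness does not fire. A witness of this kind says nothing about false-positive control by itself; that is what the alert-budget contract is for.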

We assessed the framework's utility on two frozen public-artifact benchmarks: a segmented public-markets dataset (covering Volmageddon in 2018 and the COVID market-wide circuit breaker (MWCB) events in 2020) and an offline deterministic recommender replay built on MovieLens-25M. Evaluation followed a strict, pre-registered contract that constrains the false-positive rate (FP) to the interval [0.03, 0.07], so that every configuration operates under the same alert budget. Neither the standard comparators nor Loopzero's pre-registered quantile detector achieved an accepted operating point.
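The abstract does not say how the quantile detector is calibrated or how the FP interval is checked. The sketch below shows one plausible reading of a matched-FP contract. It assumes three things: the alert threshold is an empirical upper quantile of detector scores on non-event rows; a false positive is any alert on such a row; and an operating point is accepted only if its realized FP rate falls inside [0.03, 0.07]. Only that interval comes from the source. The function names, the 0.05 calibration target, and the toy scores are hypothetical.

```python
import math

# Alert-budget interval stated in the abstract; everything else here is illustrative.
FP_LOW, FP_HIGH = 0.03, 0.07


def quantile_threshold(null_scores, target_fp=0.05):
    """Hypothetical calibration: choose the threshold so that at most
    floor(target_fp * n) calibration (non-event) scores lie strictly above it."""
    if not null_scores:
        raise ValueError("calibration set must be non-empty")
    ordered = sorted(null_scores)
    n = len(ordered)
    n_alerts = min(math.floor(target_fp * n), n - 1)
    return ordered[n - n_alerts - 1]


def false_positive_rate(null_scores, threshold):
    """Fraction of non-event rows that raise an alert (score strictly above threshold)."""
    return sum(score > threshold for score in null_scores) / len(null_scores)


def within_alert_budget(fp_rate, low=FP_LOW, high=FP_HIGH):
    """Acceptance check: is the realized FP rate inside the pre-registered interval?"""
    return low <= fp_rate <= high


if __name__ == "__main__":
    calibration = list(range(100))   # toy non-event scores used for calibration
    drifted = list(range(10, 110))   # same detector, shifted non-event distribution
    thr = quantile_threshold(calibration)
    for name, rows in [("calibration", calibration), ("drifted", drifted)]:
        fp = false_positive_rate(rows, thr)
        print(name, fp, within_alert_budget(fp))
```

On these toy scores the calibration set realizes an FP rate of 0.05 and passes. The shifted set realizes 0.15 and fails. This illustrates how a detector frozen before evaluation can miss the alert budget when the non-event distribution moves.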

Despite this non-acceptance, directional witness alignment was observed on both canonical benchmarks, subject to disclosed limitations concerning adjacent horizons and row-level constraints. Digitized LLM training-loop trajectories reported by Shumailov et al. (2024) were also directionally consistent with the predicted pattern, although matched-FP evaluation in that domain is deferred. This work contributes a reproducible, falsifiable benchmark framework for assessing recursive-collapse warning claims under an explicit alert-budget contract, and it treats the non-acceptance of hypotheses as a primary scientific result.

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC