arXiv

Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment

June 2, 2026 · Ashwin Singh, Carlos Castillo

Abstract:

Decisions that rely on predictions about an individual's future are subject to inherent noise, so multiple models can achieve comparable accuracy. When these models yield conflicting forecasts for the same person, the fairness and consistency of high-stakes decisions come into question. This paper investigates the theoretical and practical magnitude of such arbitrariness and explores methods to mitigate it within risk assessment frameworks.

We examine these issues through an analysis of a machine learning-driven decision support system for recidivism risk assessment that has been operational for more than 15 years. To begin, we developed a dataset comprising thousands of inmate release cases by converting complex legal statutes into an algorithmic framework for labeling post-release outcomes as either recidivist or non-recidivist. Leveraging this data, we trained interpretable models that not only enhanced predictive accuracy but also narrowed error-rate disparities across demographic groups. Furthermore, these models were designed to ensure that evidence of rehabilitative progress resulted in lower risk scores.

Our investigation into predictive multiplicity involved two key steps: first, we established a tight lower bound on the expected level of agreement among any finite collection of models across a dataset; second, we assessed how structural variations—such as differences in model coefficients—within that collection manifested as predictive multiplicity, defined here as divergent predictions for identical individuals.
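
As a minimal illustration of the quantities described above, the following sketch computes the mean pairwise agreement among a small collection of models and the share of individuals who receive conflicting predictions. The function names and the toy predictions are hypothetical and are not taken from the paper; the paper's lower bound itself is not reproduced here.

```python
from itertools import combinations


def pairwise_agreement(preds_a, preds_b):
    """Fraction of individuals on which two models predict the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)


def mean_pairwise_agreement(all_preds):
    """Average agreement over every unordered pair of models."""
    pairs = list(combinations(all_preds, 2))
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


def conflict_rate(all_preds):
    """Fraction of individuals who receive at least two different labels."""
    n_individuals = len(all_preds[0])
    conflicted = sum(
        len({preds[i] for preds in all_preds}) > 1
        for i in range(n_individuals)
    )
    return conflicted / n_individuals


# Hypothetical binary predictions (1 = recidivist) from three models
# of similar accuracy, for the same five individuals.
toy_predictions = [
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
]

print(mean_pairwise_agreement(toy_predictions))
print(conflict_rate(toy_predictions))
```

In this toy case, two of the five individuals receive conflicting labels even though every pair of models agrees on most cases, which is the kind of gap between structural and predictive differences that the analysis examines.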

Our experimental results demonstrate that the presence of numerous models with similar accuracy and comparable error-rate disparities does not necessarily lead to severe predictive multiplicity. In practice, models with similar performance often agree far more than worst-case theoretical bounds suggest. Consequently, we identify a straightforward policy (assigning each inmate the lowest risk score generated by the set of equally accurate models) as an effective way to reduce predictive arbitrariness.
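
The lowest-score policy can be sketched in a few lines. This is an illustrative sketch only: it assumes that all models output scores on a common scale where a lower value means lower risk, and the function name and toy scores are hypothetical. How the set of equally accurate models is selected is not covered here.

```python
def lowest_risk_scores(scores_by_model):
    """For each individual, return the minimum score across all models.

    scores_by_model: one list of scores per model, aligned so that
    position i refers to the same individual in every list.
    """
    if len({len(scores) for scores in scores_by_model}) != 1:
        raise ValueError("every model must score the same individuals")
    # zip(*...) regroups the scores by individual instead of by model.
    return [min(per_person) for per_person in zip(*scores_by_model)]


# Hypothetical risk scores from three equally accurate models
# for the same three individuals.
toy_scores = [
    [0.20, 0.70, 0.55],
    [0.30, 0.60, 0.40],
    [0.25, 0.65, 0.50],
]

assigned = lowest_risk_scores(toy_scores)
print(assigned)  # [0.2, 0.6, 0.4]
```

Because the assigned score depends on the whole set rather than on whichever single model happened to be deployed, two individuals with the same scores across the set always receive the same result.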
