Ternary Decision Trees with Locally-Adaptive Uncertainty Zones
Title: Ternary Decision Trees Incorporating Locally-Adaptive Uncertainty Zones
Abstract: Standard decision trees tend to assign the same confidence to every data point, regardless of how close it lies to a split threshold. To address this, we present ternary decision trees, a model that augments every split node with a local uncertainty zone of half-width delta around the threshold. Instances falling within the uncertainty zone are predicted by a weighted blend of the two child subtrees and are flagged as boundary-uncertain. Within our decision-theoretic framework, the optimal width delta* is obtained by solving a node-specific cost-minimization problem. We establish four formal properties: an accuracy decomposition formula, a sufficient condition for improving decided accuracy, an exact efficiency metric (eta), and the asymptotic consistency of the margin method. Efficiency eta is defined as the difference between decided accuracy and uncertain accuracy (Acc_u), that is, the accuracy gap between predictions made with certainty and those flagged as boundary-uncertain.
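As a rough illustration of the routing rule and the efficiency metric described above, the following Python sketch handles a single split on one feature. The function names are hypothetical, and the linear blend weight is an assumption made for illustration only; the abstract says the blend is weighted but does not give the weighting.

```python
import numpy as np

def ternary_node_predict(x, threshold, delta, left_proba, right_proba):
    """Route one feature value through a single ternary split.

    Returns (class_probabilities, flagged). The linear blend weight is an
    assumption for illustration, not the weighting stated by the authors.
    """
    left_proba = np.asarray(left_proba, dtype=float)
    right_proba = np.asarray(right_proba, dtype=float)
    if delta <= 0 or abs(x - threshold) > delta:
        # Outside the zone: ordinary binary routing, the prediction is decided.
        return (left_proba if x <= threshold else right_proba), False
    # Inside the zone: the right child's weight grows from 0 to 1 as x moves
    # from threshold - delta to threshold + delta.
    w_right = (x - (threshold - delta)) / (2.0 * delta)
    return (1.0 - w_right) * left_proba + w_right * right_proba, True

def efficiency_summary(correct, flagged):
    """Decided accuracy, uncertain accuracy, flagging rate, and eta.

    Assumes at least one decided and at least one flagged instance.
    """
    correct = np.asarray(correct, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)
    acc_d = correct[~flagged].mean()   # accuracy on decided predictions
    acc_u = correct[flagged].mean()    # accuracy on boundary-uncertain ones
    return {"acc_d": acc_d, "acc_u": acc_u,
            "flag_rate": flagged.mean(), "eta": acc_d - acc_u}
```

With flagging rate f, overall accuracy always splits as (1 - f) * Acc_d + f * Acc_u by the law of total probability, so decided accuracy exceeds overall accuracy by f * eta. This may be the sense in which efficiency is an accuracy gain per unit of flagging rate, though the paper's exact statement of the decomposition is not given in the abstract.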
We propose and evaluate five methods for estimating delta: quality-plateau (the plateau width of the split criterion curve), class-overlap (the empirical overlap of class distributions), gain-ratio (split quality relative to split entropy), node-bootstrap (threshold variance under node-level resampling), and margin (an SVM-inspired distance to the nearest cross-class training example). All five rely only on statistics already computed during standard CART split search, so no external noise specification is needed. On 71 of the 72 OpenML-CC18 datasets, using 5-fold cross-validation, all five methods with probabilistic routing significantly outperformed standard CART in decided accuracy (Wilcoxon signed-rank test, p < 0.001). The margin method was the most efficient, yielding an accuracy gain of 0.104 per unit of flagging rate, winning on 42 of 72 datasets, and requiring no hyperparameter tuning. Further analysis on Breiman synthetic benchmarks indicates that the margin method is self-calibrating on clean data. On the mammography dataset, the node-bootstrap method improved decided accuracy by 0.71% by flagging 10.8% of cases as boundary-uncertain.
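To make the margin idea concrete, here is a minimal Python sketch of one possible reading: delta is taken as half the smallest gap between a training example on the left of the threshold and a differently labelled example on the right. The function name `margin_delta` and this specific definition are assumptions for illustration; the abstract only says the method uses an SVM-inspired distance to the nearest cross-class training example.

```python
import numpy as np

def margin_delta(x, y, threshold):
    """Hypothetical margin-style estimate of the zone half-width at one node.

    x: feature values of the node's training examples
    y: their class labels
    Returns half the smallest gap between a left-side and a right-side
    example with different labels, or 0.0 if no such pair exists.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    left = x <= threshold
    right = ~left
    if not left.any() or not right.any():
        return 0.0  # degenerate split: one side is empty
    # Pairwise gaps between every left-side and every right-side example.
    gaps = x[right][None, :] - x[left][:, None]
    # Mask of pairs whose labels differ (the "cross-class" pairs).
    cross = y[right][None, :] != y[left][:, None]
    if not cross.any():
        return 0.0  # both sides carry the same single class
    return float(gaps[cross].min() / 2.0)
```

Under this reading the estimate needs no tuning parameter, since it depends only on the examples at the node and the chosen threshold. The pairwise comparison is quadratic in node size and is written for clarity rather than speed.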
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC




