arXiv

Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

June 2, 2026 · Youngjoon Park

Title: Leveraging Decision-Path Patterns as Tree Reliability Indicators: Path-Based Adaptive Weighting for Random Forest Classification

Abstract:

Random forests achieve diversity by giving each tree its own randomized representation of the feature space. However, standard uniform voting cannot correct errors in regions where trees whose representation is incorrect statistically dominate trees whose representation is correct. This happens even when the ensemble as a whole contains enough accurate information, which makes it a reducible error, and it is the error this study targets. To address it, we introduce a method that uses the structural pattern of each tree’s decision path as an instance-adaptive reliability signal, so that more dependable trees can be identified and weighted more heavily. Because a random forest prediction is determined by the root-to-leaf path a sample follows within each tree, assessing reliability at the path level gives finer-grained control than traditional tree-level weighting.
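
The failure mode described above can be illustrated with a minimal sketch that contrasts uniform voting with per-sample weighted voting. The abstract does not specify how the path-based reliability signal is computed, so the `path_reliability` scores below are hypothetical placeholders, and the function names are illustrative rather than taken from the paper.

```python
from collections import defaultdict


def weighted_vote(votes, weights):
    """Sum the per-tree weights for each class label and return the heaviest class."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def uniform_vote(votes):
    """Standard majority vote: every tree carries the same weight."""
    return weighted_vote(votes, [1.0] * len(votes))


# Hypothetical example: five trees vote on one sample whose true label is 1.
# Three trees are wrong, so they dominate a uniform vote.
tree_votes = [0, 0, 0, 1, 1]

# Placeholder per-sample reliability scores, one per tree (an assumption;
# the paper derives its signal from decision-path patterns).
path_reliability = [0.2, 0.3, 0.2, 0.9, 0.8]

uniform_prediction = uniform_vote(tree_votes)
adaptive_prediction = weighted_vote(tree_votes, path_reliability)
```

In this toy case the uniform vote returns the majority label, while the weighted vote can recover the minority label when the minority trees carry higher reliability. This is the sense in which the ensemble already holds enough correct information to fix the error.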

Our analysis shows that this path-based signal accurately reflects the true reliability of each tree’s decision. Applying it yields a statistically significant accuracy gain over standard Random Forests across 36 binary classification benchmarks (Wilcoxon p < 0.0001). We also evaluated class-recall regression, a common failure mode of Random Forest correction techniques. The method showed minimal bias: zero regressions in minority-class recall and only one in majority-class recall at the 0.2 percentage point threshold, which suggests a reduction in bias rather than a trade-off between classes. Furthermore, we quantified the reducible error available to the method using only the fitted Random Forest. This estimate correlates strongly with the per-dataset accuracy gains (Pearson r = +0.840, p < 0.0001). On the seven datasets identified as qualifying, the method achieved a mean accuracy improvement of +0.99 percentage points, with strict wins on all seven (7/0/0). An optional amplification mechanism raised this gain further, to +1.48 percentage points.
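
The recall-regression count and the 7/0/0 tally reported above are simple per-dataset comparisons against the baseline. The sketch below shows one way such counts might be computed. It assumes that a regression means a recall drop strictly greater than the threshold and that 7/0/0 reads as wins/ties/losses; the abstract states neither explicitly. All numbers in the example are made up for illustration and are not results from the paper.

```python
def count_recall_regressions(baseline_recall, method_recall, threshold_pp=0.2):
    """Count datasets whose recall drops by more than threshold_pp percentage points.

    Both inputs are per-dataset recalls expressed in percent.
    The strict inequality is an assumption, not taken from the paper.
    """
    return sum(
        1
        for base, new in zip(baseline_recall, method_recall)
        if (base - new) > threshold_pp
    )


def win_tie_loss(baseline_acc, method_acc):
    """Return (wins, ties, losses) of the method against the baseline."""
    wins = sum(new > base for base, new in zip(baseline_acc, method_acc))
    ties = sum(new == base for base, new in zip(baseline_acc, method_acc))
    losses = sum(new < base for base, new in zip(baseline_acc, method_acc))
    return wins, ties, losses


# Made-up illustrative values in percent (not from the paper).
example_baseline_recall = [80.0, 75.5, 90.0]
example_method_recall = [80.5, 75.4, 89.5]
example_baseline_acc = [81.0, 77.5, 90.0]
example_method_acc = [82.0, 77.5, 89.0]

regressions = count_recall_regressions(example_baseline_recall, example_method_recall)
tally = win_tie_loss(example_baseline_acc, example_method_acc)
```

Running the count separately on minority-class and majority-class recall gives the two regression figures of the kind the abstract reports.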

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC