arXiv

FFR: Forward-Forward Learning for Regression

June 3, 2026 · Xinyang Liu, Xuanyu Liang, Shiqi Ding, Boyang Li, Zhiqiang Que, Jiayang Li, Guosheng Hu


Abstract:

The Forward-Forward (FF) algorithm presents a computationally efficient and biologically plausible alternative to backpropagation (BP), relying on purely local, layer-wise optimization to train neural networks. However, while FF is inherently suited for classification tasks through the use of contrastive positive-negative sample pairs, applying it to regression introduces significant hurdles. Specifically, the continuous nature of target spaces eliminates the natural "opposites" required for contrastive learning, and the conventional goodness function fails to convey information regarding target magnitude or ordering.
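
For readers unfamiliar with the conventional FF objective, the following is a minimal sketch of the standard layer-local formulation (goodness as the sum of squared activations, compared against a threshold). The function names and the threshold value `theta=2.0` are illustrative assumptions, not taken from this paper. Note that the goodness is a single scalar per sample, which is why it carries no information about target magnitude or ordering.

```python
import math

def goodness(activations):
    """Conventional FF goodness: sum of squared activations of one layer."""
    return sum(a * a for a in activations)

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ff_layer_loss(pos_acts, neg_acts, theta=2.0):
    """Layer-local loss: push goodness of positive samples above theta
    and goodness of negative samples below it. theta is an assumed value."""
    g_pos = goodness(pos_acts)
    g_neg = goodness(neg_acts)
    # -log sigmoid(g_pos - theta) - log(1 - sigmoid(g_neg - theta))
    return softplus(theta - g_pos) + softplus(g_neg - theta)
```

The loss needs a "negative" sample for every positive one. In classification, pairing an input with a wrong label provides this; for a continuous target there is no comparably clear wrong answer, which is the difficulty the abstract describes.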

To address these limitations, we introduce FFR (Forward-Forward for Regression), to our knowledge the first framework to extend FF to practical regression problems with competitive performance across a range of real-world datasets. FFR incorporates three primary innovations:

Ordinal Competitive Goodness Function: This mechanism substitutes traditional contrastive pairs with competitive learning among partitioned neuron groups, guided by distance-aware ordinal supervision.
Stratified Ladder Architecture: This design enables shallow layers to perform coarse ordinal discrimination while deeper layers refine these outputs into fine-grained regression predictions. It also facilitates multi-scale feature aggregation to enhance collaboration between layers.
Hierarchical Prediction with Uncertainty Estimation: By employing multi-scale predictors, the system delivers robust predictions along with confidence estimates, effectively providing prediction uncertainty as a byproduct.
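
The abstract does not give formulas for these components, so the following is only one possible reading, sketched under stated assumptions: a layer's neurons are split into contiguous groups, each group stands for one bin of the target range, groups compete through a softmax over their goodness, and the supervision is a soft target that decays with ordinal distance from the true bin. The binning scheme, the temperature `tau`, and all function names are hypothetical and are not the authors' definitions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def group_goodness(acts, num_groups):
    """Mean squared activation of each contiguous neuron group (assumed)."""
    size = len(acts) // num_groups
    return [sum(a * a for a in acts[k * size:(k + 1) * size]) / size
            for k in range(num_groups)]

def ordinal_soft_target(y, lo, hi, num_groups, tau=1.0):
    """Distance-aware target: bins closer to y's bin receive more mass."""
    width = (hi - lo) / num_groups
    true_bin = max(0, min(int((y - lo) / width), num_groups - 1))
    return softmax([-abs(k - true_bin) / tau for k in range(num_groups)])

def ordinal_competitive_loss(acts, y, lo, hi, num_groups, tau=1.0):
    """Cross-entropy between the soft ordinal target and group competition."""
    probs = softmax(group_goodness(acts, num_groups))
    target = ordinal_soft_target(y, lo, hi, num_groups, tau)
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))

def predict_with_uncertainty(acts, lo, hi, num_groups):
    """Expected bin center as the prediction, spread as a confidence proxy."""
    probs = softmax(group_goodness(acts, num_groups))
    width = (hi - lo) / num_groups
    centers = [lo + (k + 0.5) * width for k in range(num_groups)]
    mean = sum(p * c for p, c in zip(probs, centers))
    var = sum(p * (c - mean) ** 2 for p, c in zip(probs, centers))
    return mean, math.sqrt(var)
```

In this sketch no negative samples are needed: the groups within a layer compete against each other, and the loss depends only on that layer's activations. A coarse-to-fine arrangement could be imitated by giving shallow layers a small `num_groups` and deeper layers a larger one, though how FFR actually structures its ladder is not specified in the abstract.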

Extensive experiments show that FFR recovers, on average, 98.6% of the accuracy achieved by BP across five real-world regression benchmarks. It also reduces resource consumption: peak training memory falls to 27% of BP’s at a depth of 8 and to 8% at a depth of 32, and per-iteration time is approximately 72% of BP’s. FFR also substantially outperforms all existing methods that do not rely on backpropagation.
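
As a quick reading aid, the reported ratios can be converted into reduction factors. This small sketch uses only the figures quoted in the abstract and assumes each percentage is the FFR cost divided by the BP cost; the helper name is illustrative.

```python
def reduction_factor(ratio_to_bp):
    """How many times smaller the FFR cost is, given FFR cost / BP cost."""
    return 1.0 / ratio_to_bp

memory_depth_8 = reduction_factor(0.27)   # peak training memory, depth 8
memory_depth_32 = reduction_factor(0.08)  # peak training memory, depth 32
time_per_iter = reduction_factor(0.72)    # per-iteration time

print(f"memory, depth 8:  {memory_depth_8:.1f}x lower")
print(f"memory, depth 32: {memory_depth_32:.1f}x lower")
print(f"time per iter:    {time_per_iter:.2f}x faster")
```

Under that assumption the memory advantage grows with depth, from roughly 3.7x at depth 8 to 12.5x at depth 32, which is consistent with layer-wise training not having to hold activations for a full backward pass.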
