Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
Abstract:
It has become standard practice in machine learning to use pre-trained models to annotate data and then to train more powerful architectures on the resulting labeled datasets. Weak-to-strong generalization describes the benefit of this two-step approach: a high-capacity "student" model is trained on imperfect labels generated by a less capable "teacher," yet the student outperforms the teacher. This study demonstrates that this performance gap can significantly alter the scaling laws governing test error. We focus on the setting where both the student and the teacher are trained using Random Feature Ridge Regression (RFRR). Our main technical contribution is a deterministic equivalent for the student’s excess test error when it is trained on teacher-generated labels. Using this equivalent, we identify conditions under which the student’s scaling law improves on the teacher’s, and we show that such improvements occur in both bias-dominated and variance-dominated regimes. Notably, the student can reach the minimax optimal error rate irrespective of the teacher’s scaling behavior, even in cases where the teacher’s test error fails to decrease as the sample size increases.
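The two-step pipeline described in the abstract can be sketched in a few lines of NumPy. This is only a toy illustration under stated assumptions, not the paper's construction: the ReLU feature map, the Gaussian inputs, the linear target, and every name and default value below (`random_features`, `fit_ridge`, `weak_to_strong_demo`, the sizes, `lam`) are hypothetical choices made for the example. The sketch does not reproduce the deterministic equivalent or any scaling-law result, and nothing guarantees that the student beats the teacher in this particular configuration.

```python
import numpy as np


def random_features(X, W):
    """ReLU random features: phi(x) = max(W x, 0) / sqrt(p), with p = W.shape[0]."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def fit_ridge(Phi, y, lam):
    """Ridge coefficients a = (Phi^T Phi + lam * I)^{-1} Phi^T y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


def weak_to_strong_demo(seed=0, d=20, n_teacher=200, n_student=2000,
                        p_teacher=30, p_student=500, lam=1e-3,
                        noise=0.1, n_test=2000):
    """Train a small RFRR teacher on noisy labels, then a larger RFRR student
    on the teacher's predictions only. Returns (teacher_mse, student_mse)
    measured against the noiseless target on fresh test inputs."""
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: a linear target (an assumption for this demo).
    beta = rng.standard_normal(d) / np.sqrt(d)

    def target(X):
        return X @ beta

    # Step 1: the "weak" teacher, few random features, noisy ground-truth labels.
    W_t = rng.standard_normal((p_teacher, d)) / np.sqrt(d)
    X_t = rng.standard_normal((n_teacher, d))
    y_t = target(X_t) + noise * rng.standard_normal(n_teacher)
    a_t = fit_ridge(random_features(X_t, W_t), y_t, lam)

    def teacher(X):
        return random_features(X, W_t) @ a_t

    # Step 2: the "strong" student, more features, sees teacher labels only.
    W_s = rng.standard_normal((p_student, d)) / np.sqrt(d)
    X_s = rng.standard_normal((n_student, d))
    a_s = fit_ridge(random_features(X_s, W_s), teacher(X_s), lam)

    def student(X):
        return random_features(X, W_s) @ a_s

    # Compare both models with the noiseless target on held-out inputs.
    X_test = rng.standard_normal((n_test, d))
    y_test = target(X_test)
    teacher_mse = float(np.mean((teacher(X_test) - y_test) ** 2))
    student_mse = float(np.mean((student(X_test) - y_test) ** 2))
    return teacher_mse, student_mse


if __name__ == "__main__":
    teacher_mse, student_mse = weak_to_strong_demo()
    print(f"teacher test MSE: {teacher_mse:.4f}")
    print(f"student test MSE: {student_mse:.4f}")
```

Sweeping `n_student` or `p_student` in a sketch like this and plotting the two errors on log-log axes is one way to eyeball empirical scaling behavior; the rates themselves depend on the spectral assumptions made in the paper, which this toy does not encode.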
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



