Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
Title: Trust Functions: Achieving Near-Lossless Weak-to-Strong Generalization Through Strategic Reliance on Weak Teachers
Abstract: Weak-to-strong generalization studies how to improve a strong student model using supervision from a weaker teacher, particularly when accurate labels are scarce. We frame this problem primarily as one of data selection: the key step is identifying which weak labels are reliable enough to serve as training signal. To this end, we propose trust functions, which assign a scalar trust score to each weak label so that unreliable supervision can be filtered out. Across domains including world knowledge, quantitative reasoning, and strategy games, trust-based filtering yields student models that match, and occasionally exceed, those trained on ground-truth labels, achieving near-lossless weak-to-strong generalization. Trust functions also enable iterative weak-to-strong training, in which a trained student is reused as the teacher for the next round, compounding the gains. We identify several mechanisms that explain why trust functions work.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
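
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a trust score is computed or how it is thresholded, so everything here is an assumption made for illustration: the names `filter_by_trust` and `trust_fn`, the fixed `threshold`, and the toy scores are all hypothetical and are not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # input type (e.g. a question)
Y = TypeVar("Y")  # weak-label type (e.g. the teacher's answer)


def filter_by_trust(
    inputs: Sequence[X],
    weak_labels: Sequence[Y],
    trust_fn: Callable[[X, Y], float],
    threshold: float,
) -> List[Tuple[X, Y]]:
    """Keep only (input, weak label) pairs whose trust score meets the threshold.

    `trust_fn` is a hypothetical stand-in for a trust function: it maps one
    weak-labeled example to a scalar score. How that score is produced is
    not specified in the abstract.
    """
    if len(inputs) != len(weak_labels):
        raise ValueError("inputs and weak_labels must have the same length")
    kept = []
    for x, y in zip(inputs, weak_labels):
        if trust_fn(x, y) >= threshold:
            kept.append((x, y))
    return kept


# Toy usage with made-up scores (illustrative only, not real results).
toy_inputs = ["q1", "q2", "q3"]
toy_weak_labels = ["a", "b", "c"]
toy_scores = {"q1": 0.9, "q2": 0.2, "q3": 0.7}

kept = filter_by_trust(
    toy_inputs,
    toy_weak_labels,
    trust_fn=lambda x, y: toy_scores[x],
    threshold=0.5,
)
print(kept)
```

In this sketch the student would be trained only on `kept`. Under the iterative scheme mentioned in the abstract, the trained student would then produce the weak labels for the next round.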





