From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
Title: Moving Beyond Scale: Redefining Transformers for CTR Prediction Through Structured Expressivity
Abstract:
Although significant resources are dedicated to increasing the size of deep models for click-through rate (CTR) prediction, these efforts frequently yield rapidly diminishing performance gains. This trend stands in sharp opposition to the predictable scaling laws observed in large language models (LLMs). We pinpoint the underlying issue as a fundamental structural misalignment: while conventional Transformers are built upon assumptions of sequential compositionality, CTR datasets require combinatorial reasoning across heterogeneous fields.
To bridge this gap, we propose the Field-Aware Transformer (FAT). FAT redesigns the standard Transformer block around field-centric parameters to deliver structured expressivity. As a result, model complexity depends on the number of fields $F$ rather than on the total vocabulary size $n$ (where $n \gg F$). To further decouple model capacity from field cardinality, FAT uses a Basis-Composed Hypernetwork, which synthesizes field-specific parameters from shared bases and thereby significantly lowers parameter complexity.
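The idea of synthesizing field-specific parameters from shared bases can be illustrated with a minimal sketch. The abstract does not specify how the hypernetwork combines its bases, so the code below assumes the simplest case: each field's projection matrix is a linear combination of K shared d×d basis matrices, weighted by per-field coefficients. The function names, the single-projection setup, and the example sizes are all hypothetical and are not taken from the paper.

```python
def compose_field_weight(coeffs, bases):
    """Build one field's d x d matrix as sum_k coeffs[k] * bases[k].

    Assumption: a plain linear combination of shared bases; the
    paper's actual hypernetwork may compose them differently.
    """
    d = len(bases[0])
    weight = [[0.0] * d for _ in range(d)]
    for c, basis in zip(coeffs, bases):
        for i in range(d):
            for j in range(d):
                weight[i][j] += c * basis[i][j]
    return weight


def param_counts(num_fields, d, num_bases):
    """Compare parameter counts for one d x d projection per field.

    Returns (independent, basis_composed):
      independent    = F * d^2       (a full matrix for every field)
      basis_composed = K * d^2 + F*K (shared bases plus coefficients)
    """
    independent = num_fields * d * d
    basis_composed = num_bases * d * d + num_fields * num_bases
    return independent, basis_composed


# Two toy 2 x 2 bases and one field's mixing coefficients.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
field_weight = compose_field_weight([2.0, 3.0], bases)
print(field_weight)  # [[2.0, 3.0], [3.0, 2.0]]

# Hypothetical sizes: 100 fields, width 64, 8 shared bases.
print(param_counts(100, 64, 8))  # (409600, 33568)
```

Under these assumed sizes, the per-field term grows with the number of bases K rather than with d squared, which is one way to read the claim that shared bases lower parameter complexity.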
We support these claims theoretically by establishing a formal scaling law rooted in Rademacher complexity. Empirically, FAT surpasses current state-of-the-art techniques, improving AUC by up to +4.38%. In live production environments, it yields a +2.33% increase in CTR and a +0.66% increase in RPM. Our findings indicate that scalable recommendation systems depend not merely on model size but on structured expressivity, that is, on architectural coherence with data semantics.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





