DiScoFormer: Plug-In Density and Score Estimation with Transformers
Title: DiScoFormer: A Plug-In Approach to Density and Score Estimation Using Transformers
Abstract:
Estimating probability density functions and their scores (gradients of the log-density) from samples is a fundamental problem in kinetic theory, Bayesian inference, and generative modeling. Existing methods fall into two camps: classical kernel density estimators (KDE), which generalize across distributions but suffer from the curse of dimensionality, and modern neural score models, which are more accurate but must be retrained for each target distribution. To bridge this gap, we introduce DiScoFormer (Density and Score Transformer), an equivariant Transformer built for a "train-once, infer-anywhere" setting: it maps independent and identically distributed (i.i.d.) samples to both density values and score vectors, and it generalizes robustly across sample sizes and distributions.
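The KDE baseline referred to above can itself be read as a map from an i.i.d. sample set to density values and score vectors. The following NumPy sketch assumes an isotropic Gaussian kernel with a fixed bandwidth h; the abstract does not specify the kernel, the bandwidth rule, or how KDE was configured in the experiments, and the function name gaussian_kde_density_and_score is hypothetical.

```python
import numpy as np

def gaussian_kde_density_and_score(samples, queries, bandwidth):
    """Isotropic Gaussian KDE density and its score (gradient of the log-density).

    Hypothetical helper for illustration only.
    samples:   (n, d) array of i.i.d. draws x_1..x_n
    queries:   (m, d) array of evaluation points x
    bandwidth: scalar h > 0 (fixed here; bandwidth selection is not shown)
    Returns (density of shape (m,), score of shape (m, d)).
    """
    n, d = samples.shape
    # Pairwise differences x_i - x, shape (m, n, d)
    diffs = samples[None, :, :] - queries[:, None, :]
    sq_dist = np.sum(diffs**2, axis=-1)                      # (m, n)
    log_k = -sq_dist / (2 * bandwidth**2)                    # unnormalized log-kernel values
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2)   # Gaussian normalizing constant
    # log p_hat(x) = logsumexp_i(log_k) - log n + log_norm, computed stably
    max_log_k = log_k.max(axis=1, keepdims=True)
    weights = np.exp(log_k - max_log_k)                      # kernel values rescaled per query
    log_density = (np.log(weights.sum(axis=1)) + max_log_k[:, 0]
                   - np.log(n) + log_norm)
    # Score: grad log p_hat(x) = sum_i w_i (x_i - x) / h^2 with normalized weights w_i
    w = weights / weights.sum(axis=1, keepdims=True)
    score = np.einsum("mn,mnd->md", w, diffs) / bandwidth**2
    return np.exp(log_density), score
```

Because the estimate is a sum over samples, reordering the rows of samples leaves the output unchanged, which is the kind of permutation symmetry an equivariant architecture over sample sets is designed to respect.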
Theoretically, we show that self-attention can recover normalized KDE, so the Transformer can be viewed as a functional generalization of kernel-based methods. Empirically, individual attention heads learn multi-scale, kernel-like behavior on their own. In density estimation, DiScoFormer converges faster and is more accurate than KDE. It also serves as a high-fidelity plug-in score oracle, enabling score-debiased KDE, the computation of Fisher information, and the solution of Fokker-Planck-type partial differential equations.
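One way to see how self-attention can recover a normalized KDE is to write Gaussian kernel weights as a softmax. Expanding -||x - x_j||^2/(2h^2) = x·x_j/h^2 - ||x_j||^2/(2h^2) - ||x||^2/(2h^2) shows that a dot-product attention head with query x/h^2, key x_j, a per-key bias -||x_j||^2/(2h^2), and value x_j returns the kernel-weighted mean of the samples, from which the KDE score follows as (attention output - x)/h^2. This is an illustrative construction under those assumptions, not necessarily the one used in the paper, and all function names are hypothetical. The last function sketches the plug-in use for Fisher information, estimating I(p) = E||∇ log p(X)||^2 by averaging squared score norms at samples drawn from p.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_kde_score(samples, queries, bandwidth):
    """Gaussian-KDE score computed as a single softmax-attention head (hypothetical).

    Query q(x) = x / h^2, key k_j = x_j, per-key bias b_j = -||x_j||^2 / (2 h^2),
    value v_j = x_j. Since -||x - x_j||^2 / (2 h^2) = q(x).k_j + b_j - ||x||^2 / (2 h^2)
    and the last term is constant across j, softmax(q.k + b) equals the normalized
    Gaussian kernel weights K(x, x_j) / sum_l K(x, x_l).
    """
    h2 = bandwidth**2
    q = queries / h2                                                   # (m, d)
    logits = q @ samples.T - 0.5 * np.sum(samples**2, axis=1) / h2     # (m, n)
    weights = softmax(logits, axis=1)                                  # normalized kernel weights
    attended = weights @ samples                                       # sum_j w_j x_j, shape (m, d)
    # grad log p_hat(x) = (sum_j w_j x_j - x) / h^2
    return (attended - queries) / h2, weights

def direct_kernel_weights(samples, queries, bandwidth):
    """Normalized Gaussian kernel weights computed directly from squared distances."""
    sq = np.sum((queries[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    return softmax(-sq / (2 * bandwidth**2), axis=1)

def plug_in_fisher_information(scores):
    """Monte Carlo estimate of E||grad log p(X)||^2 from scores evaluated at X ~ p."""
    return float(np.mean(np.sum(scores**2, axis=1)))
```

A single head with one bandwidth reproduces a fixed-scale KDE; several heads with different h would give a multi-scale mixture, which is one reading of the multi-scale, kernel-like heads described above.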
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC