Scalable Derivative Gaussian Processes via Exact Gradient Reduction
Title: Achieving Scalable Derivative Gaussian Processes Through Exact Gradient Reduction
Abstract: Incorporating gradient observations can significantly improve Gaussian process (GP) surrogates, especially in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and their full gradients in $d$ dimensions requires factorizing a joint covariance matrix of size $n(d+1) \times n(d+1)$, and the cubic cost of doing so creates an intractable $\mathcal{O}(n^3 d^3)$ bottleneck.
To address this, we propose TERA, a novel, scalable derivative GP approach built on target-specific exact gradient reduction. We show that, for stationary kernels, gradient components orthogonal to the vectors linking the target and conditioning points are conditionally independent of the target function value. Consequently, for a conditioning set of size $m$, the exact conditional density is fully determined by at most $m^2$ directional derivatives. By using these reduced, dimension-independent conditionals as local factors within a Vecchia approximation, TERA decouples $n$ and $d$ from the computationally intensive dense matrix inversion.
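The reduction claim can be checked numerically in a simple special case. The sketch below is not the TERA implementation. It assumes an isotropic squared-exponential kernel with lengthscale $\sqrt{d}$, randomly drawn points, and a small jitter term; the helper names (`rbf_joint_cov`, `posterior_var`, `compare`) are hypothetical. It compares the posterior variance of the target value given all $md$ gradient components against the variance given only $m^2$ directional derivatives taken along an orthonormal basis of the vectors linking the points.

```python
import numpy as np

def rbf_joint_cov(X, ell):
    """Covariance of [f(x_0..x_{n-1}), grad f(x_0), ..., grad f(x_{n-1})]
    under an isotropic squared-exponential kernel (an assumption of this sketch)."""
    n, d = X.shape
    R = X[:, None, :] - X[None, :, :]                    # R[a, b] = x_a - x_b
    K = np.exp(-0.5 * np.sum(R**2, axis=-1) / ell**2)    # cov(f_a, f_b)
    K_fg = K[:, :, None] * R / ell**2                    # cov(f_a, grad f_b)
    outer = R[:, :, :, None] * R[:, :, None, :]
    # cov(grad f_a, grad f_b) = k * (I / ell^2 - r r^T / ell^4)
    K_gg = K[:, :, None, None] * (np.eye(d) / ell**2 - outer / ell**4)
    C = np.empty((n + n * d, n + n * d))
    C[:n, :n] = K
    C[:n, n:] = K_fg.reshape(n, n * d)
    C[n:, :n] = C[:n, n:].T
    C[n:, n:] = K_gg.transpose(0, 2, 1, 3).reshape(n * d, n * d)
    return C

def posterior_var(C, A, jitter=1e-8):
    """Variance of the target value f(x_0) given the linear observations A @ z."""
    S = A @ C @ A.T + jitter * np.eye(A.shape[0])
    c = A @ C[:, 0]
    return C[0, 0] - c @ np.linalg.solve(S, c)

def compare(d=30, m=4, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m + 1, d))   # row 0: target, rows 1..m: conditioning set
    n = m + 1
    C = rbf_joint_cov(X, ell=np.sqrt(d))
    N = C.shape[0]
    # Orthonormal basis (d x m) for the span of the vectors linking the points.
    Q, _ = np.linalg.qr((X[1:] - X[0]).T)
    # Select the m conditioning function values.
    A_val = np.zeros((m, N))
    A_val[np.arange(m), 1 + np.arange(m)] = 1.0
    # Full gradients of the conditioning points: m * d observations.
    A_full = np.zeros((m * d, N))
    A_full[:, n + d:] = np.eye(m * d)
    # Reduced set: m directional derivatives per point, m * m in total.
    A_red = np.zeros((m * m, N))
    A_red[:, n + d:] = np.kron(np.eye(m), Q.T)
    var_full = posterior_var(C, np.vstack([A_val, A_full]))
    var_red = posterior_var(C, np.vstack([A_val, A_red]))
    return var_full, var_red, m * d, m * m

var_full, var_red, n_full, n_red = compare()
print(f"gradient observations: full={n_full}, reduced={n_red}")
print(f"posterior variance gap: {abs(var_full - var_red):.2e}")
```

In this setting the two variances should agree up to floating-point error, because the discarded gradient components have zero covariance with the target value, the conditioning values, and the retained directional derivatives.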
This method lowers the cost of evaluating each target to $\mathcal{O}(dm^2 + m^6)$ time and $\mathcal{O}(dm^2 + m^4)$ memory, without altering the mathematical foundation of the underlying derivative GP model. Experimental results show that TERA delivers state-of-the-art predictive accuracy while running orders of magnitude faster than conventional derivative GPs. Notably, both peak GPU memory usage and computation time remain largely constant with respect to $d$, thereby facilitating highly scalable inference even in high-dimensional spaces.
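To make the stated scalings concrete, the following sketch evaluates the leading-order terms from the complexity expressions above. Constants and lower-order terms are ignored, so the numbers are rough proxies rather than runtimes, and the function names are hypothetical.

```python
def dense_cost(n: int, d: int) -> int:
    """Cubic cost proxy for factorizing the n(d+1) x n(d+1) joint covariance."""
    return (n * (d + 1)) ** 3

def reduced_time_per_target(d: int, m: int) -> int:
    """Leading terms of the O(d m^2 + m^6) per-target time."""
    return d * m**2 + m**6

def reduced_memory_per_target(d: int, m: int) -> int:
    """Leading terms of the O(d m^2 + m^4) per-target memory."""
    return d * m**2 + m**4

# Illustrative sizes only; these are not taken from the paper's experiments.
for d in (10, 100, 1000):
    print(d, dense_cost(n=1000, d=d), reduced_time_per_target(d=d, m=5))
```

The dense proxy grows cubically in $d$, while the reduced per-target proxy grows only linearly in $d$ and is dominated by the $m^6$ term when $m$ is fixed and $d$ is moderate.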
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



