Riemannian Gradient Descent for Low-Rank Architectures
Title: Applying Riemannian Gradient Descent to Low-Rank Neural Architectures
Abstract: This study investigates Riemannian optimization for rank-factorized matrix parameters in modern deep learning. We identify ten configurations in the algorithm design space: two geometries for rank-$r$ matrices, three geometries for rank-$r$ partial isometries, and block-matrix adaptations of each of these five geometries, in which the factors are distributed across block-rows and block-columns. We demonstrate these approaches on the multihead attention components of small-scale language models. After tuning learning rates, our experiments provide no definitive evidence that these methods outperform an AdamW baseline. Our implementations are publicly available online.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
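
For readers unfamiliar with the setting, the following is a minimal sketch of one generic form of Riemannian gradient descent on the set of rank-$r$ matrices. The abstract does not say which geometries the authors use, so this sketch assumes the standard embedded-submanifold geometry with a truncated-SVD retraction; it is not taken from the authors' code. The function names (`project_tangent`, `retract`, `rgd_step`) and the toy objective $\tfrac{1}{2}\|X - A\|_F^2$ are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_tangent(U, Vt, G):
    """Project an ambient gradient G onto the tangent space at X = U diag(s) Vt.

    Assumed geometry (not from the paper): embedded fixed-rank manifold, where
    P(G) = U U^T G + G V V^T - U U^T G V V^T.
    """
    UtG = U.T @ G            # shape (r, n)
    GV = G @ Vt.T            # shape (m, r)
    return U @ UtG + GV @ Vt - U @ (UtG @ Vt.T) @ Vt

def retract(Y, r):
    """Truncated-SVD retraction: map Y to its best rank-r approximation."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def rgd_step(U, s, Vt, egrad, lr):
    """One step: Euclidean gradient -> tangent projection -> retraction."""
    X = (U * s) @ Vt                         # rebuild X from its factors
    xi = project_tangent(U, Vt, egrad(X))    # Riemannian gradient
    return retract(X - lr * xi, U.shape[1])

# Toy problem (hypothetical): recover a rank-2 target A.
rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
egrad = lambda X: X - A                      # gradient of 0.5 * ||X - A||_F^2

U, s, Vt = retract(rng.standard_normal((m, n)), r)   # random rank-r start
initial_loss = 0.5 * np.linalg.norm((U * s) @ Vt - A) ** 2
for _ in range(200):
    U, s, Vt = rgd_step(U, s, Vt, egrad, lr=0.5)
final_loss = 0.5 * np.linalg.norm((U * s) @ Vt - A) ** 2
print(initial_loss, final_loss)
```

The iterate is stored as the factors `(U, s, Vt)` rather than as a dense matrix, which is what makes low-rank parameterizations attractive. A practical implementation would avoid forming the dense `X` and the full SVD at each step; both are kept here only for brevity.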





