arXiv

DiffCrossGait: Trajectory-Level Alignment for 2D-3D Cross-Modal Gait Recognition via Latent Diffusion

June 2, 2026 · Zhiyang Lu, Ming Cheng

Title: DiffCrossGait: Leveraging Trajectory-Level Alignment in Latent Diffusion for 2D-3D Cross-Modal Gait Recognition

Abstract

Cross-modal 2D-3D gait recognition is hindered by the fundamental domain gaps between 2D silhouette data and 3D LiDAR range-view representations. Whereas previous approaches align only the final embeddings, we introduce DiffCrossGait, which does not assume that 2D and 3D observations are fully equivalent and instead recasts cross-modal matching as trajectory-level alignment within an identity-relevant latent diffusion space.

By perturbing both modalities with shared Gaussian noise in a latent space, our approach aligns them continuously throughout the generative evolution. We present a Tri-Phase Alignment Strategy that uses different noise intensities to enforce three constraints: identity anchoring, dynamics consistency, and structural recoverability of cross-modal features. These constraints ensure that both modalities follow similar denoising dynamics and bottleneck structures, which promotes the extraction of modality-invariant gait features.
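To make the shared-noise idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard DDPM-style forward process with a linear beta schedule; the function names, latent shapes, and the chosen timesteps are hypothetical. The point it illustrates is that when the same noise sample and the same timestep are applied to both modality latents, the noise cancels in their difference, so the cross-modal gap at step t is just the clean gap scaled by sqrt(alpha_bar[t]).

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention schedule (assumed linear betas, as in DDPM)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def shared_noise_forward(z_2d, z_3d, t, alpha_bar, rng):
    """Noise both latents with ONE Gaussian sample at the same timestep t."""
    eps = rng.standard_normal(z_2d.shape)   # single draw, reused for both modalities
    signal = np.sqrt(alpha_bar[t])
    noise = np.sqrt(1.0 - alpha_bar[t])
    return signal * z_2d + noise * eps, signal * z_3d + noise * eps, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical latents: 4 sequences, 16-dim each. The 3D latent is a
# perturbed copy of the 2D latent, standing in for a cross-modal gap.
z_2d = rng.standard_normal((4, 16))
z_3d = z_2d + 0.1 * rng.standard_normal((4, 16))
clean_gap = np.linalg.norm(z_2d - z_3d)

# Low, medium, and high noise levels (illustrative values only).
gaps = {}
for t in (10, 500, 990):
    zt_2d, zt_3d, _ = shared_noise_forward(z_2d, z_3d, t, alpha_bar, rng)
    gaps[t] = np.linalg.norm(zt_2d - zt_3d)
    # With shared noise, gaps[t] equals sqrt(alpha_bar[t]) * clean_gap.
    print(t, gaps[t], np.sqrt(alpha_bar[t]) * clean_gap)
```

Which noise range drives which of the three constraints is not specified in the abstract, so the sketch only shows the shared perturbation itself.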

A critical advantage of our framework is the decoupling of generative alignment from the discriminative backbone. The diffusion mechanism functions exclusively as a training objective, which preserves high inference efficiency by removing the computational burden associated with iterative denoising during deployment. Comprehensive evaluations on the SUSTech1K and FreeGait benchmarks confirm that DiffCrossGait delivers state-of-the-art performance.
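As a rough illustration of what diffusion-free inference could look like, the sketch below matches probe embeddings against a gallery by cosine similarity in a single pass, with no denoising loop. It is an assumption-laden example: the embeddings are random stand-ins for backbone outputs, and `l2_normalize` and `rank1_match` are hypothetical helper names, not part of DiffCrossGait.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def rank1_match(probe_emb, gallery_emb):
    """Return, for each probe row, the index of the most similar gallery row."""
    sims = l2_normalize(probe_emb) @ l2_normalize(gallery_emb).T
    return sims.argmax(axis=1)

rng = np.random.default_rng(0)

# Stand-in embeddings: 5 gallery identities (e.g. from one modality) and
# 2 probes (e.g. from the other) that are lightly perturbed copies of
# gallery rows 3 and 0.
gallery = rng.standard_normal((5, 32))
probe = gallery[[3, 0]] + 0.01 * rng.standard_normal((2, 32))

matches = rank1_match(probe, gallery)
print(matches)
```

Under this reading, deployment cost is one backbone forward pass per sequence plus a matrix product, independent of the number of diffusion steps used during training.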
