SRENet: Spectral Re-Entry Network for Point Cloud Action Recognition
Abstract:
Recognizing human actions from point cloud sequences is essential for 3D perception systems in domains such as human-computer interaction and autonomous driving. However, the irregular structure of point clouds and their temporal inconsistencies make spatio-temporal representation learning difficult, particularly when a model must capture both broad motion context and subtle temporal detail. To address this, we introduce SRENet, a framework that takes a spectral perspective to explicitly model both the global context and the fine-grained temporal dynamics of motion.
SRENet incorporates a Spectral Decomposition Block (SDeBlock) that applies wavelet-based analysis across the spatial and temporal dimensions. This separates features into low- and high-frequency components, each of which is processed by a frequency-specific attention mechanism. In addition, a Spectral Re-entry Block (SReBlock) performs a second temporal decomposition to restore residual dynamics and to correct temporal frequency structures that may be distorted during semantic fusion.
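As a rough illustration of what a temporal low/high-frequency split can look like, the sketch below applies a single-level Haar wavelet transform along the time axis of a per-frame feature sequence. The abstract does not state which wavelet SRENet uses or how features are laid out, so the Haar basis, the function names `haar_temporal_split` and `haar_temporal_merge`, and the list-of-vectors input are all assumptions made for illustration, not the authors' implementation.

```python
import math


def haar_temporal_split(frames):
    """Single-level Haar transform along time (illustrative assumption).

    frames: list of T per-frame feature vectors (T even), each a list of floats.
    Returns (low, high): two lists of T/2 vectors holding the scaled pairwise
    sums (slow, coarse motion) and differences (fast frame-to-frame change).
    """
    if len(frames) % 2 != 0:
        raise ValueError("expected an even number of frames")
    scale = 1.0 / math.sqrt(2.0)
    low, high = [], []
    for t in range(0, len(frames), 2):
        a, b = frames[t], frames[t + 1]
        low.append([(x + y) * scale for x, y in zip(a, b)])
        high.append([(x - y) * scale for x, y in zip(a, b)])
    return low, high


def haar_temporal_merge(low, high):
    """Inverse of haar_temporal_split: rebuilds the original frame sequence."""
    scale = 1.0 / math.sqrt(2.0)
    frames = []
    for l, h in zip(low, high):
        frames.append([(x + y) * scale for x, y in zip(l, h)])
        frames.append([(x - y) * scale for x, y in zip(l, h)])
    return frames


# Four frames with two feature channels each; the first pair is static.
sequence = [[1.0, 2.0], [1.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
low_band, high_band = haar_temporal_split(sequence)
reconstructed = haar_temporal_merge(low_band, high_band)
```

In this toy sequence the first two frames are identical, so their high-frequency coefficients are zero, while the abrupt change between the last two frames appears only in the high band. Because the transform is invertible, each band could in principle be processed separately and then merged back.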
We also develop a spectral-aware learning strategy designed to improve discriminability within both frequency subspaces. It combines a contrastive loss with a curriculum schedule that progressively shifts attention from the low-frequency to the high-frequency space, in line with coarse-to-detailed motion patterns. Comprehensive evaluations on the MSR-Action3D, NTU-RGBD, and NTU-RGBD120 datasets show that SRENet achieves state-of-the-art results, confirming the efficacy of frequency-based modeling for understanding actions in point clouds.
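One simple way to picture a low-to-high curriculum is a weight that moves from the low-frequency loss term to the high-frequency term as training progresses. The sketch below uses a linear ramp. The abstract does not give the actual schedule or loss formulation, so the linear shape, the function names `high_freq_weight` and `curriculum_loss`, and the convex combination of the two loss terms are hypothetical choices for illustration only.

```python
def high_freq_weight(epoch, total_epochs):
    """Linear ramp from 0.0 (first epoch) to 1.0 (last epoch).

    Assumed schedule for illustration; the real schedule may differ.
    """
    if total_epochs <= 1:
        return 1.0
    progress = epoch / (total_epochs - 1)
    return min(1.0, max(0.0, progress))


def curriculum_loss(loss_low, loss_high, epoch, total_epochs):
    """Blend per-band loss values, shifting emphasis toward the high band.

    loss_low and loss_high stand in for contrastive losses computed in the
    low- and high-frequency subspaces (computed elsewhere; hypothetical here).
    """
    w = high_freq_weight(epoch, total_epochs)
    return (1.0 - w) * loss_low + w * loss_high


# Emphasis at the start, middle, and end of an 11-epoch run.
weights = [high_freq_weight(e, 11) for e in (0, 5, 10)]
blended_mid = curriculum_loss(2.0, 4.0, epoch=5, total_epochs=11)
```

With this ramp, early epochs are driven almost entirely by the low-frequency term (coarse motion) and late epochs by the high-frequency term (fine detail).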
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC





