Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition
Abstract: Human action recognition is a cornerstone of computer vision, underpinning applications such as surveillance and human-robot interaction. Supervised skeleton-based approaches are effective, but their dependence on large annotated datasets limits their ability to generalize to unseen actions. Zero-Shot Skeleton Action Recognition (ZSAR) addresses this limitation; however, it is often constrained by the spectral bias inherent in diffusion models, which tends to oversmooth high-frequency motion dynamics. To address this, we introduce Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), a framework that combines a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and Curriculum-based Semantic Abstraction. By restoring fine-grained motion details, our method achieves state-of-the-art results on the NTU RGB+D, PKU-MMD, and Kinetics-skeleton benchmarks. The source code is available at https://github.com/yuzhi535/FDSM, and further details are on the project page: https://yuzhi535.github.io/FDSM.github.io/
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





