arXiv

Recursive Vision Transformer with Dynamic Depth and Width Adjustment for Resource-Efficient Image Semantic Communication

June 2, 2026 · Zhilong Zhang, Xinhui Zhang, Gongyu Jin, Sihua Wang, Danpu Liu, Changchuan Yin

Title: Resource-Efficient Image Semantic Communication via Recursive Vision Transformer with Adaptive Depth and Width Scaling

Abstract:

Image semantic communication serves as a foundational element for next-generation wireless networks. However, the substantial memory requirements and high computational demands of current systems often hinder their deployment on devices with limited resources. To overcome these obstacles, this study introduces a vision transformer (ViT)-based framework for image semantic communication. The proposed architecture employs a recursive structure that iteratively enhances semantic features while simultaneously lowering the total parameter count. Furthermore, we develop three dynamic adjustment mechanisms to curtail computational complexity in an adaptive manner: dynamic width adjustment, dynamic depth adjustment, and joint width-depth optimization. Specifically, dynamic depth adjustment tailors the number of recursive modules based on both channel conditions and image content, whereas dynamic width adjustment focuses on retaining only the most critical neurons and attention heads. The joint optimization strategy offers additional flexibility in configuring computational resources. Our simulations demonstrate that the integration of the recursive ViT with these three dynamic strategies yields a 48.7% reduction in parameters. Moreover, the system delivers superior reconstruction quality compared to existing baseline methods, even when operating under similar computational constraints.
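The three ideas named in the abstract (weight sharing through recursion, depth chosen from channel conditions and image content, and width reduced by keeping the most important attention heads) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `BlockConfig`, `block_params`, `choose_depth`, `select_heads`, and `run_recursive`, the 20 dB SNR ceiling, the equal weighting of channel and content, and the rule that a worse channel or a more complex image receives more recursions. The parameter count is that of a generic ViT encoder block, so it is not expected to reproduce the 48.7% figure reported above.

```python
from dataclasses import dataclass


@dataclass
class BlockConfig:
    """Hypothetical sizes for one generic ViT encoder block."""
    dim: int = 256       # embedding dimension (assumed)
    mlp_ratio: int = 4   # hidden size of the MLP as a multiple of dim (assumed)


def block_params(cfg: BlockConfig) -> int:
    """Count weights and biases of one standard encoder block."""
    attn = 4 * (cfg.dim * cfg.dim + cfg.dim)   # Q, K, V and output projections
    hidden = cfg.dim * cfg.mlp_ratio
    mlp = (cfg.dim * hidden + hidden) + (hidden * cfg.dim + cfg.dim)
    norms = 2 * (2 * cfg.dim)                  # two LayerNorms: scale and shift
    return attn + mlp + norms


def stacked_params(cfg: BlockConfig, depth: int) -> int:
    """Conventional design: `depth` distinct blocks, each with its own weights."""
    return depth * block_params(cfg)


def recursive_params(cfg: BlockConfig) -> int:
    """Recursive design: one block reused at every step, so depth adds no weights."""
    return block_params(cfg)


def choose_depth(snr_db: float, complexity: float,
                 min_depth: int = 1, max_depth: int = 6) -> int:
    """Assumed policy: lower SNR or higher image complexity gives more recursions.

    `complexity` is a hypothetical content score in [0, 1].
    """
    channel_need = min(1.0, max(0.0, (20.0 - snr_db) / 20.0))
    content_need = min(1.0, max(0.0, complexity))
    need = 0.5 * channel_need + 0.5 * content_need
    # int(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
    return min_depth + int(need * (max_depth - min_depth) + 0.5)


def select_heads(importance: list[float], keep_ratio: float) -> list[int]:
    """Return indices of the highest-scoring heads, keeping at least one."""
    keep = max(1, int(len(importance) * keep_ratio + 0.5))
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])


def run_recursive(block, x, depth: int):
    """Apply the same block `depth` times (the recursion itself)."""
    for _ in range(depth):
        x = block(x)
    return x


if __name__ == "__main__":
    cfg = BlockConfig()
    print("stacked, depth 6:", stacked_params(cfg, 6))
    print("recursive       :", recursive_params(cfg))
    print("depth at 5 dB, complexity 0.8:", choose_depth(5.0, 0.8))
    print("heads kept:", select_heads([0.1, 0.9, 0.5, 0.3], 0.5))
```

In this sketch the recursion depth changes only the amount of computation, not the stored parameter count, which is the sense in which a recursive structure can lower the parameter total while depth remains adjustable per image and per channel state.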

Source: arXiv