ST-ColoNet: Spatio-Temporal Colon Segment Recognition via Hybrid Attention and Edge-Guided Feature Learning
Abstract:
Accurate recognition of colon segments in colonoscopy videos is essential for many downstream applications. However, current automated approaches rely primarily on static images and neglect the temporal information in video streams, which leads to suboptimal performance. The problem is compounded by a shortage of publicly available video datasets for this task. To address these challenges, we curate and publicly release a new labeled dataset for colon-segment recognition. We also introduce ST-ColoNet, a novel two-stage deep learning framework for identifying colon segments in video. The architecture comprises two modules: a Colorlaus module, which uses metric learning to enhance spatial feature extraction through edge mediation, and a Full-Temp module, which integrates three distinct self-attention patterns to better approximate full self-attention over long colonoscopy sequences and thereby improve temporal feature aggregation. Extensive ablation studies show that ST-ColoNet achieves state-of-the-art results in colon-segment recognition, with an accuracy of 81.0% and an F1-score of 70.7%, a significant improvement over existing leading methods.
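To illustrate how several sparse self-attention patterns can be combined to approximate full self-attention over a long frame sequence, the following is a minimal sketch. The abstract does not name the three patterns used by the Full-Temp module, so the choice here (sliding window, strided, and global anchor frames) is an assumption borrowed from common sparse-attention designs, and the function names and parameters are hypothetical.

```python
def hybrid_attention_mask(seq_len, window=2, stride=4, num_global=1):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when frame i may attend to frame j.
    The three patterns below are illustrative assumptions, not the
    patterns confirmed by the paper.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            # Pattern 1 (assumed): sliding window over neighbouring frames.
            local = abs(i - j) <= window
            # Pattern 2 (assumed): strided links to every stride-th frame.
            strided = (i - j) % stride == 0
            # Pattern 3 (assumed): a few global frames see and are seen by all.
            is_global = i < num_global or j < num_global
            mask[i][j] = local or strided or is_global
    return mask


def density(mask):
    """Fraction of allowed query-key pairs; full self-attention gives 1.0."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


if __name__ == "__main__":
    m = hybrid_attention_mask(16)
    print(f"allowed pairs: {density(m):.2%} of full attention")
```

In such a scheme, the union of the patterns keeps every frame within a few hops of every other frame while evaluating only a fraction of the query-key pairs that full self-attention would require.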
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




