arXiv

IMPose: Interactive Multi-person Pose Estimation with Dynamic Correction Propagation

June 4, 2026 · Haoyang Ge, Jian Ma, Ziwen Wang, Qihe Wang, Jianqi Fan, Hongzhi Yu, Xingyu Chen, Kun Li

Original: arXiv:2606.04480v1. Abstract: High-quality dynamic human pose annotation equips AI with precise motion kinematics to enable human behavior mastery, yet remains labor-intensive and time-consuming. Current annotation tools either lack temporal correction propagation or fail in multi-person scenarios, necessitating excessive manual intervention. In this paper, we introduce IMPose, an interactive tool for multi-person dynamic pose annotation. It features a dual-level tracking mechanism that propagates annotators' single-frame multi-person pose corrections across entire videos. The keypoint level ensures temporal propagation of corrections via sequential modeling, while the instance level employs keypoint-aware embeddings with relative positional encoding to maintain multi-person cross-frame consistency. To further improve robustness, IMPose maintains historical pose and instance cues in a trajectory bank, which enhances long-range temporal association and stabilizes annotation in challenging cases such as occlusion and motion blur. By converting sparse human corrections into dense and coherent pose trajectories, our framework significantly reduces repeated manual refinement across frames. Extensive experiments show that IMPose consistently achieves a strong accuracy-efficiency trade-off under different interaction budgets, demonstrating particular advantages in low-click annotation settings. IMPose achieves high-precision annotation with high efficiency, requiring only 27 clicks per 1,050-frame video on 3DPW and 3 clicks per tracklet per 84-frame sequence on PoseTrack21. We further expand PoseTrack21 with 188K pose instances (3.55M keypoints) at a minimal cost of 10 annotators in 10 hours. The annotation tool, code, and extended dataset will be open-sourced.

Rewrite: High-quality dynamic human pose annotation gives AI systems the precise motion kinematics they need to model human behavior, but producing it is time-consuming and labor-intensive. Existing annotation tools either lack temporal correction propagation or break down in multi-person scenes, so annotators must intervene manually far too often. To address these limitations, we present IMPose, an interactive tool for dynamic multi-person pose annotation. IMPose uses a dual-level tracking mechanism that propagates the corrections an annotator makes in a single frame across the entire video. At the keypoint level, sequential modeling carries the corrections forward in time; at the instance level, keypoint-aware embeddings with relative positional encoding keep each person's identity consistent across frames. To improve robustness, the system stores historical pose and instance cues in a trajectory bank, which strengthens long-range temporal association and stabilizes annotation under difficult conditions such as occlusion and motion blur. By converting sparse human corrections into dense, coherent pose trajectories, the framework significantly reduces repeated manual refinement across frames. Extensive experiments show that IMPose consistently achieves a strong accuracy-efficiency trade-off under different interaction budgets, with particular advantages in low-click annotation settings. Specifically, it needs only 27 clicks for a 1,050-frame video on 3DPW and 3 clicks per tracklet for an 84-frame sequence on PoseTrack21. We also extend PoseTrack21 with 188K pose instances (3.55M keypoints) at a cost of 10 annotators working for 10 hours. The annotation tool, code, and extended dataset will be open-sourced.
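The efficiency figures quoted in the abstract are easier to compare as rates. The short Python sketch below only restates the abstract's numbers; it is not part of the IMPose code. It assumes the rounded values "188K" and "10 hours" are exact and that each of the 10 annotators worked the full 10 hours, which the abstract does not confirm. The variable names are illustrative.

```python
# Figures quoted in the abstract (rounded values are treated as exact here).
CLICKS_3DPW = 27             # clicks per video on 3DPW
FRAMES_3DPW = 1_050          # frames per 3DPW video
CLICKS_PER_TRACKLET = 3      # clicks per tracklet on PoseTrack21
FRAMES_PER_TRACKLET = 84     # frames per PoseTrack21 sequence
POSE_INSTANCES = 188_000     # "188K" new pose instances
ANNOTATORS = 10
HOURS = 10                   # assumption: every annotator worked all 10 hours

# Frames covered by each click: higher means less manual effort.
frames_per_click_3dpw = FRAMES_3DPW / CLICKS_3DPW
frames_per_click_pt21 = FRAMES_PER_TRACKLET / CLICKS_PER_TRACKLET

# Throughput of the dataset extension under the assumption above.
annotator_hours = ANNOTATORS * HOURS
instances_per_annotator_hour = POSE_INSTANCES / annotator_hours

print(f"3DPW: {frames_per_click_3dpw:.1f} frames per click")
print(f"PoseTrack21: {frames_per_click_pt21:.1f} frames per click per tracklet")
print(f"Extension: {instances_per_annotator_hour:.0f} instances per annotator-hour")
```

Under these assumptions, one click covers roughly 39 frames on 3DPW and 28 frames per tracklet on PoseTrack21, and the extension works out to about 1,880 pose instances per annotator-hour.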

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC