arXiv

IMPose: Interactive Multi-person Pose Estimation with Dynamic Correction Propagation

Original: arXiv:2606.04480v1 (Announce Type: new)

Abstract: High-quality dynamic human pose annotation equips AI with precise motion kinematics to enable human behavior mastery, yet remains labor-intensive and time-consuming. Current annotation tools either lack temporal correction propagation or fail in multi-person scenarios, necessitating excessive manual intervention. In this paper, we introduce IMPose, an interactive tool for multi-person dynamic pose annotation. It features a dual-level tracking mechanism that propagates one-frame multi-person pose corrections from annotators across entire videos. The keypoint level ensures temporal propagation of corrections via sequential modeling, while the instance level employs keypoint-aware embedding with relative positional encoding to maintain multi-person cross-frame consistency. To further improve robustness, IMPose maintains historical pose and instance cues in a trajectory bank, which enhances long-range temporal association and stabilizes annotation in challenging cases such as occlusion and motion blur. By converting sparse human corrections into dense and coherent pose trajectories, our framework significantly reduces repeated manual refinement across frames. Extensive experiments show that IMPose consistently achieves a strong accuracy-efficiency trade-off under different interaction budgets, demonstrating particular advantages in low-click annotation settings. IMPose achieves high-precision annotation with high efficiency, requiring only 27 clicks per 1,050-frame video on 3DPW and 3 clicks per tracklet per 84-frame sequence on PoseTrack21. We further expand PoseTrack21 with 188K pose instances (3.55M keypoints) at a minimal cost of 10 annotators in 10 hours. The annotation tool, code, and extended dataset will be open-sourced.

Rewrite: High-fidelity dynamic human pose annotation provides AI systems with the precise motion kinematics needed to master human behavior; however, this process remains time-consuming and labor-intensive. Existing annotation solutions often fall short, either by omitting temporal correction propagation or by struggling in multi-person environments, which leads to heavy reliance on manual effort. To address these limitations, we present IMPose, an interactive system designed for dynamic multi-person pose annotation. IMPose uses a dual-level tracking mechanism that propagates corrections made by annotators in a single frame across the entire video. At the keypoint level, sequential modeling carries these corrections forward in time, whereas the instance level leverages keypoint-aware embeddings combined with relative positional encoding to keep the identities of multiple people consistent across frames. To bolster robustness, the system stores historical pose and instance cues in a trajectory bank, thereby improving long-range temporal association and stabilizing annotations under difficult conditions such as motion blur and occlusion. By transforming sparse user corrections into dense, coherent pose trajectories, our approach significantly reduces the need for repetitive manual adjustments across frames. Extensive experiments indicate that IMPose consistently achieves a strong trade-off between accuracy and efficiency across different interaction budgets, with particular advantages in low-click annotation settings. Specifically, the tool needs just 27 clicks for a 1,050-frame video on the 3DPW dataset and 3 clicks per tracklet for an 84-frame sequence on PoseTrack21. Additionally, we augmented the PoseTrack21 dataset with 188,000 pose instances (totaling 3.55 million keypoints) using only 10 annotators over a span of 10 hours. The code, annotation tool, and the expanded dataset will be made publicly available.
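The abstract describes the trajectory bank only at a high level. The following is a minimal Python sketch, not the authors' implementation, of how a bank of per-track pose histories can support cross-frame association and re-linking after occlusion. It assumes poses are lists of (x, y) keypoints in a fixed order; `TrajectoryBank`, `max_history`, and `max_dist` are hypothetical names; and the learned keypoint-aware embedding with relative positional encoding is swapped out for a plain mean keypoint distance with greedy matching.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical pose format: one (x, y) pair per keypoint, in a fixed keypoint order.
Pose = List[Tuple[float, float]]


class TrajectoryBank:
    """Toy trajectory bank: keeps recent poses per track ID for cross-frame association."""

    def __init__(self, max_history: int = 30, max_dist: float = 50.0) -> None:
        self.max_history = max_history  # poses retained per track (hypothetical parameter)
        self.max_dist = max_dist        # association gate in pixels (hypothetical parameter)
        self.tracks: Dict[int, Deque[Pose]] = {}
        self._next_id = 0

    @staticmethod
    def _pose_distance(a: Pose, b: Pose) -> float:
        # Mean Euclidean distance over corresponding keypoints; a simple stand-in
        # for the learned keypoint-aware embedding described in the abstract.
        return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

    def update(self, poses: List[Pose]) -> List[int]:
        """Assign a track ID to each pose in the current frame and store it."""
        # Score every (pose, track) pair against the track's most recent pose.
        candidates = []
        for i, pose in enumerate(poses):
            for tid, history in self.tracks.items():
                d = self._pose_distance(pose, history[-1])
                if d <= self.max_dist:
                    candidates.append((d, i, tid))
        candidates.sort()

        # Greedy one-to-one matching, cheapest pairs first.
        assigned: Dict[int, int] = {}
        used: set = set()
        for _, i, tid in candidates:
            if i not in assigned and tid not in used:
                assigned[i] = tid
                used.add(tid)

        # Unmatched poses start new tracks; unmatched tracks are kept, not deleted,
        # so an occluded person can be re-linked when they reappear.
        ids = []
        for i, pose in enumerate(poses):
            tid = assigned.get(i)
            if tid is None:
                tid = self._next_id
                self._next_id += 1
                self.tracks[tid] = deque(maxlen=self.max_history)
            self.tracks[tid].append(pose)
            ids.append(tid)
        return ids


if __name__ == "__main__":
    bank = TrajectoryBank(max_dist=20.0)
    print(bank.update([[(0, 0), (10, 0)], [(100, 100), (110, 100)]]))  # [0, 1]
    print(bank.update([[(102, 101), (112, 101)], [(1, 1), (11, 1)]]))  # [1, 0]
    print(bank.update([[(2, 2), (12, 2)]]))                             # [0] (track 1 occluded)
    print(bank.update([[(104, 102), (114, 102)]]))                      # [1] (re-linked)
```

Because unmatched tracks stay in the bank instead of being dropped, a person who disappears for a few frames can still receive the same ID on return, which is the kind of long-range association the abstract attributes to the trajectory bank. A real system would more plausibly use learned embeddings and optimal assignment (for example, the Hungarian algorithm) rather than this greedy distance gate.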


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC