arXiv

TrAction: Action Recognition with Sparse Trajectories

Original (arXiv:2606.03490v1): Modern action recognition models operate on memory- and compute-intensive dense RGB video volumes and frequently exploit appearance and background shortcuts, for example, predicting actions from objects or scenes instead of characteristic motion. We investigate an efficient alternative input modality that is largely free of such biases by construction: sparse point trajectories. To this end, we develop a simple transformer architecture for 2.5D trajectory-based recognition together with a masked-trajectory pretraining, which we show to substantially improve downstream action recognition accuracy. Despite using only a fraction of the dense RGB input, our method reaches 45% top-1 on Something-Something V2 and 54% on EPIC-Kitchens-100, and surpasses V-JEPA on time-reversal sensitivity. More importantly, we find trajectory features to be complementary to state-of-the-art appearance-based features. Fusing our pretrained model with DINOv2 and V-JEPA 2 improves top-1 accuracy on Something-Something V2 by 8.7 and 1.6 points, respectively. Code: https://github.com/ecker-lab/TrAction

Rewrite: Current action recognition systems typically operate on memory- and compute-heavy dense RGB video, and they often fall back on appearance shortcuts such as specific objects or scenes rather than the characteristic motion of an action. As an alternative, we investigate sparse point trajectories, an efficient input modality that largely avoids these biases by construction. We develop a simple transformer architecture for 2.5D trajectory-based recognition and pair it with masked-trajectory pretraining, which substantially improves downstream action recognition accuracy. Although the method uses only a fraction of the dense RGB input, it reaches 45% top-1 accuracy on Something-Something V2 and 54% on EPIC-Kitchens-100, and it surpasses V-JEPA on time-reversal sensitivity. More importantly, trajectory features turn out to be complementary to state-of-the-art appearance-based features: fusing our pretrained model with DINOv2 and V-JEPA 2 improves top-1 accuracy on Something-Something V2 by 8.7 and 1.6 points, respectively. Code: https://github.com/ecker-lab/TrAction
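
To make the idea of masked-trajectory pretraining more concrete, here is a minimal sketch of the masking step only. It is not the authors' implementation: the abstract does not say how trajectories are masked, so this example assumes that whole tracks are hidden at random, that each track is a list of (x, y, depth) points (one reading of "2.5D"), and that the objective is a mean squared error on the hidden tracks. The names `mask_trajectories`, `reconstruction_error`, and `mask_ratio` are hypothetical.

```python
import random


def mask_trajectories(tracks, mask_ratio=0.5, seed=0):
    """Split track indices into a visible set and a masked set.

    tracks: list of N tracks, each a list of T (x, y, depth) tuples.
    Assumption: whole tracks are masked, chosen uniformly at random.
    Returns (visible_idx, masked_idx) as two sorted lists of indices.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    n = len(tracks)
    n_masked = int(round(n * mask_ratio))
    order = list(range(n))
    rng.shuffle(order)
    return sorted(order[n_masked:]), sorted(order[:n_masked])


def reconstruction_error(pred, target):
    """Mean squared error over every coordinate of the given tracks."""
    total, count = 0.0, 0
    for p_track, t_track in zip(pred, target):
        for p, t in zip(p_track, t_track):
            for a, b in zip(p, t):
                total += (a - b) ** 2
                count += 1
    return total / count if count else 0.0


# Toy input: 8 tracks, 4 time steps each, drifting to the right.
tracks = [[(float(i + t), float(i), 1.0) for t in range(4)] for i in range(8)]
visible_idx, masked_idx = mask_trajectories(tracks, mask_ratio=0.5)

# A model would see only the visible tracks and predict the masked ones;
# a trivial all-zeros prediction stands in for the model here.
target = [tracks[i] for i in masked_idx]
pred = [[(0.0, 0.0, 0.0)] * len(track) for track in target]
loss = reconstruction_error(pred, target)
```

In an actual pretraining loop, the all-zeros placeholder would be replaced by the transformer's output for the masked positions, and the loss would be backpropagated through the model.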


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
