arXiv

PointAction: 3D Points as Universal Action Representations for Robot Control

Abstract:

Video-Action Models (VAMs) offer a promising trajectory toward generalizable robot manipulation by capitalizing on the extensive visual dynamics learned by pre-trained video diffusion models. However, relying solely on RGB video rollouts presents significant challenges, as these outputs fail to explicitly define metric 3D motion, contact geometry, and fine-grained spatial constraints, thereby creating ambiguity in action grounding. Furthermore, the process of scaling action supervision across various tasks and robot embodiments is prohibitively expensive.

To address these issues, we introduce PointAction, a framework that connects video predictions to robot actions via explicit point-based 4D modeling. By fine-tuning a foundation video generation model, PointAction jointly forecasts future RGB frames and dynamic 3D pointmaps, producing temporally consistent 3D motion for the task-relevant geometry. These dynamic points serve as a structured, embodiment-agnostic action interface, which a diffusion-based action decoder then maps to executable robot commands.
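As a rough illustration of what a dynamic 3D pointmap carries, the sketch below stores one metric XYZ point per pixel per frame and derives per-point 3D displacements between consecutive frames. This is not the authors' implementation: the names `Pointmap` and `point_displacements` are hypothetical, and the sketch assumes that index (i, j) refers to the same tracked point in every frame, which the abstract does not specify.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]   # metric (x, y, z), assumed to be in metres
Pointmap = List[List[Point]]         # H x W grid of points for a single frame


def point_displacements(pointmaps: List[Pointmap]) -> List[List[List[Point]]]:
    """Return per-point 3D displacement between consecutive frames.

    Assumption (not from the paper): entry (i, j) tracks the same
    physical point across all frames. Output has T-1 frames of H x W.
    """
    flows = []
    for prev, nxt in zip(pointmaps, pointmaps[1:]):
        flows.append([
            [tuple(b - a for a, b in zip(p, q)) for p, q in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev, nxt)
        ])
    return flows


# Toy sequence: two frames of a 1 x 2 pointmap; only the second point moves.
frames = [
    [[(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]],
    [[(0.0, 0.0, 1.0), (0.5, 0.25, 1.0)]],
]
flows = point_displacements(frames)
```

A downstream decoder would consume quantities like `flows` rather than raw pixels, which is the sense in which metric point motion can act as an embodiment-agnostic interface.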

By employing metric 3D point dynamics as the bridge between video prediction and control, PointAction mitigates the ambiguity inherent in RGB-only action grounding. This method facilitates transfer across tasks and embodiments while requiring minimal action supervision. Our experiments demonstrate that PointAction sets a new state of the art in 4D generation quality for robot scenes, surpasses current baselines in simulation environments, and generalizes to two real-world robot arms that were not part of the pretraining data.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
