Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text
Abstract:
We present T2Mo, a feed-forward system for controllable generation of dynamic 3D shapes, driven by both 3D trajectories and text prompts. Text alone is often too ambiguous to specify precise movements, so our approach adds 3D trajectories as a spatial control signal: each trajectory defines the path that a designated point must follow. By combining the two inputs, T2Mo generates object motions that strictly adhere to the given spatial constraints while remaining consistent with the broader semantics of the text description.
To handle trajectory inputs of varying density and distribution, from sparse to dense and unevenly spread, we introduce a shape-grounded trajectory embedding. This component maps an arbitrary set of input trajectories to a comprehensive set of tokens that are aware of the object’s geometry. We compare T2Mo against text-only baselines and against cascaded methods that chain trajectory-guided video generation with video-to-dynamic-mesh conversion. Across quantitative metrics, qualitative assessments, and user studies, T2Mo generates motions that are more expressive and more faithful to the input prompts while maintaining robust motion quality.
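The idea of turning an arbitrary set of trajectories into a fixed set of geometry-aware tokens can be illustrated with a toy sketch. The abstract does not describe how the shape-grounded trajectory embedding is built, so everything below is an assumption made for illustration: the function name `embed_trajectories`, the use of surface anchor points, the Gaussian distance weighting, and the `sigma` parameter are all hypothetical and are not T2Mo's actual design. The sketch produces one token per anchor point, regardless of how many trajectories are given.

```python
import math

def embed_trajectories(anchors, trajectories, sigma=0.5):
    """Toy illustration only (not the paper's method).

    anchors:      list of (x, y, z) points sampled on the shape surface.
    trajectories: list of trajectories, each a list of (x, y, z) positions
                  over time; position 0 is the start point on the rest shape.
    Returns one token per anchor: the distance-weighted average displacement
    at every frame (3 values per frame), followed by a confidence value equal
    to the largest weight any trajectory received at that anchor.
    """
    num_frames = len(trajectories[0]) if trajectories else 0
    tokens = []
    for anchor in anchors:
        # Gaussian weight: trajectories starting near the anchor count more.
        weights = [
            math.exp(-math.dist(anchor, traj[0]) ** 2 / (2 * sigma ** 2))
            for traj in trajectories
        ]
        total = sum(weights)
        token = []
        for t in range(num_frames):
            for axis in range(3):
                if total > 0:
                    value = sum(
                        w * (traj[t][axis] - traj[0][axis])
                        for w, traj in zip(weights, trajectories)
                    ) / total
                else:
                    value = 0.0  # no usable trajectory near this anchor
                token.append(value)
        # Confidence is 0.0 when no trajectories are provided at all.
        token.append(max(weights, default=0.0))
        tokens.append(token)
    return tokens

# Two anchors, one sparse trajectory that moves one unit along y.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectories = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
tokens = embed_trajectories(anchors, trajectories)
```

In this sketch the number of tokens depends only on the number of anchors, so sparse and dense trajectory sets yield outputs of the same shape; the confidence entry lets a downstream model tell well-supported anchors from poorly supported ones.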
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




