arXiv

Modeling Robotics Dataset Construction as an Artifact-Based Build Process

Abstract:

While robotic platforms produce vast amounts of multimodal sensor information, the transformation of ROS bag files into machine learning-ready datasets typically relies on fragmented, sequential scripts. This traditional method introduces significant engineering burdens and results in sluggish iteration cycles. To address these inefficiencies, we propose modeling dataset construction as an artifact-based build process governed by a dependency graph. We have realized this concept through Bagzel, an open-source extension for Bazel that enables reproducible and incremental dataset generation, with support for exporting data in the nuScenes format.

We benchmarked Bagzel and its variant, Bagzel-xattr (which uses server-side digest management), against a sequential rosbag2nuscenes baseline. Bagzel lowers runtime in every execution mode tested, with the largest gains in iterative development scenarios: on a 20.4 GB dataset, it achieved speedups of up to 386.26x for warm builds and 7.21x for incremental builds. Across dataset sizes from 5.1 to 20.4 GB, both Bagzel variants also scaled better than the baseline, particularly in warm and incremental builds. Bagzel-xattr reduced runtime by a further 5.9% on average relative to standard Bagzel in our input granularity analysis. Overall, applying an artifact-based build framework to robotics dataset construction significantly reduces the latency of dataset updates while preserving a deterministic design that ensures reproducibility. Bagzel is available at https://github.com/UniBwTAS/bagzel.
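To make the cold, warm, and incremental build modes concrete, the following is a minimal Python sketch of an artifact-based build with a content-addressed action cache. It is not Bagzel's or Bazel's implementation: the `Build` class, the target names (`a.bag`, `a.frames`, `dataset`), and the toy transformation functions are all hypothetical and chosen only for illustration. The sketch assumes each build step is a pure function of its inputs, so an output can be reused whenever the digests of its inputs are unchanged.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def digest(data: bytes) -> str:
    """Content digest used to identify an artifact."""
    return hashlib.sha256(data).hexdigest()


class Build:
    """Toy artifact-based build (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.sources: Dict[str, bytes] = {}
        self.rules: Dict[str, Tuple[List[str], Callable[[List[bytes]], bytes]]] = {}
        self.cache: Dict[str, bytes] = {}  # action key -> output artifact
        self.executed: List[str] = []      # targets actually re-run

    def add_source(self, name: str, data: bytes) -> None:
        self.sources[name] = data

    def add_rule(self, name: str, deps: List[str],
                 fn: Callable[[List[bytes]], bytes]) -> None:
        self.rules[name] = (deps, fn)

    def build(self, target: str) -> bytes:
        if target in self.sources:
            return self.sources[target]
        deps, fn = self.rules[target]
        inputs = [self.build(d) for d in deps]
        # The action key depends on the target name and its input digests.
        key = digest("|".join([target] + [digest(i) for i in inputs]).encode())
        if key not in self.cache:
            # Cache miss: run the step and record that it executed.
            self.cache[key] = fn(inputs)
            self.executed.append(target)
        return self.cache[key]


# Two hypothetical bag files, each converted to frames, then merged.
b = Build()
b.add_source("a.bag", b"scan-a")
b.add_source("b.bag", b"scan-b")
b.add_rule("a.frames", ["a.bag"], lambda xs: xs[0].upper())
b.add_rule("b.frames", ["b.bag"], lambda xs: xs[0].upper())
b.add_rule("dataset", ["a.frames", "b.frames"], lambda xs: b"\n".join(xs))


def run(label: str) -> List[str]:
    """Build the dataset and report which targets were re-executed."""
    b.executed.clear()
    b.build("dataset")
    print(label, b.executed)
    return list(b.executed)


cold = run("cold")                 # nothing cached: every step runs
warm = run("warm")                 # inputs unchanged: no step runs
b.add_source("b.bag", b"scan-b2")  # modify one input
incr = run("incremental")          # only steps downstream of b.bag run
```

In this sketch the warm build re-executes nothing and the incremental build re-executes only the steps downstream of the changed bag, which is the property the warm and incremental speedups above rely on. A sequential script, by contrast, would redo all three steps on every run.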


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
