arXiv

Demo2Tutorial: From Human Experience to Multimodal Software Tutorials

Title: Demo2Tutorial: Converting Human Experience into Multimodal Software Tutorials

Abstract:

Digital environments hold a largely untapped reservoir of authentic, unedited interactions that encode procedural knowledge drawn from human experience. We present Demo2Tutorial, a framework that converts this captured experience, gathered through screen recordings and interaction logs, into structured, multimodal software tutorials suitable for instructing both humans and AI agents. The process begins with collecting human experience using a specialized recorder. A multimodal Action Parser then interprets the raw data to reconstruct the user’s perception, actions, and intent. Next, a Step Planner organizes these elements into hierarchical task graphs that specify goals and steps. Finally, a Tutorial Composer synthesizes the parsed experience into reusable, structured instructions that combine images and text.
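The four stages described above (recorder, Action Parser, Step Planner, Tutorial Composer) can be pictured as a simple data pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: every class, field, and function name here (`RawEvent`, `ParsedAction`, `TaskNode`, `parse_actions`, `plan_steps`, `compose_tutorial`, `group_size`) is hypothetical, and the parsing and planning logic is replaced by trivial rule-based stand-ins where the real system would presumably use multimodal models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawEvent:
    """One logged interaction plus its screenshot (hypothetical schema)."""
    timestamp: float
    kind: str        # e.g. "click" or "type"
    target: str      # UI element label, assumed to be present in the log
    screenshot: str  # path of the frame captured with this event


@dataclass
class ParsedAction:
    """An action with reconstructed description and intent."""
    description: str
    intent: str
    screenshot: str


@dataclass
class TaskNode:
    """A node in a hierarchical task graph: a goal, its steps, its sub-goals."""
    goal: str
    steps: List[ParsedAction] = field(default_factory=list)
    children: List["TaskNode"] = field(default_factory=list)


def parse_actions(events: List[RawEvent]) -> List[ParsedAction]:
    """Stand-in for the Action Parser: orders events and describes each one."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [
        ParsedAction(
            description=f"{e.kind.capitalize()} '{e.target}'",
            intent=f"interact with {e.target}",  # placeholder intent
            screenshot=e.screenshot,
        )
        for e in ordered
    ]


def plan_steps(goal: str, actions: List[ParsedAction],
               group_size: int = 2) -> TaskNode:
    """Stand-in for the Step Planner: groups consecutive actions into sub-goals."""
    root = TaskNode(goal=goal)
    for i in range(0, len(actions), group_size):
        part = TaskNode(goal=f"Part {i // group_size + 1}",
                        steps=actions[i:i + group_size])
        root.children.append(part)
    return root


def compose_tutorial(root: TaskNode) -> str:
    """Stand-in for the Tutorial Composer: renders text with image references."""
    lines = [f"Tutorial: {root.goal}"]
    step_no = 0
    for child in root.children:
        lines.append(child.goal)
        for step in child.steps:
            step_no += 1
            lines.append(
                f"  Step {step_no}: {step.description} [image: {step.screenshot}]"
            )
    return "\n".join(lines)


# Illustrative, made-up recording: events arrive out of order on purpose.
events = [
    RawEvent(2.0, "type", "Name field", "f2.png"),
    RawEvent(1.0, "click", "File menu", "f1.png"),
    RawEvent(3.0, "click", "Save button", "f3.png"),
]
actions = parse_actions(events)
graph = plan_steps("Save a document", actions)
tutorial = compose_tutorial(graph)
print(tutorial)
```

Keeping the task graph as an explicit intermediate structure, rather than going straight from events to text, is what would let the same parsed experience feed both a human-readable tutorial and a planning input for a GUI agent.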

We assessed the quality of the generated tutorials using a new benchmark based on official software documentation. Our results indicate that this distilled representation offers dual benefits: it enhances human learning through the automatic creation of multimodal tutorials, and it boosts agent learning by improving downstream GUI-agent planning and generalization. Experiments show that Demo2Tutorial generates high-quality tutorials that exceed the standard of human-authored content and significantly outperform baseline methods. The framework also enables faster task completion for humans and improves planning for GUI agents, demonstrating that structured tutorials extracted from human experience can serve as effective knowledge representations for both human education and AI agents. Code and data will be accessible at https://github.com/showlab/Demo2Tutorial.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
