arXiv

DyaPlex: Full-Duplex Speech-Motion Model for Dyadic Interaction

Title: DyaPlex: A Full-Duplex Speech-Motion Model for Dyadic Interaction

Original: arXiv:2606.03874v1 (Announce Type: new)

Abstract: We present DyaPlex, a streaming, full-duplex speech-and-motion model designed for dyadic interaction. To capture the continuous and reciprocal nature of human communication, this full-duplex capability empowers the agent to simultaneously perceive and generate both speech and physical motion in a streaming fashion. At its core, our method leverages the strong priors of a foundational full-duplex speech model and integrates a novel motion pathway, thereby achieving fully synchronized multi-modal interaction. Specifically, we design a dual-tower Transformer architecture that preserves the zero-shot conversational reasoning of a frozen base speech model while constructing a deeply coupled, streaming motion pathway. By introducing a unified dyadic token interleaving mechanism and guiding cross-attention via a time-aligned speech-motion RoPE, our model effectively aligns autoregressive motions with rich latent speech features. Trained on the 4,000-hour Seamless Interaction dataset, our model effectively captures cross-speaker dependencies and establishes new state-of-the-art performance across both monadic and dyadic human interaction benchmarks.

Rewritten:

We introduce DyaPlex, a streaming, full-duplex speech-and-motion model built for dyadic interaction. To mirror the continuous, reciprocal nature of human dialogue, the full-duplex design lets an agent perceive and generate both speech and physical motion simultaneously, in a streaming fashion. The approach builds on the strong priors of a foundational full-duplex speech model and adds a new motion pathway to achieve fully synchronized multi-modal interaction.

Our solution employs a dual-tower Transformer design. It preserves the zero-shot conversational reasoning of a frozen base speech model while building a deeply coupled, streaming motion pathway. Two mechanisms align the autoregressively generated motion with rich latent speech features: a unified dyadic token interleaving scheme, and a time-aligned speech-motion RoPE (rotary position embedding) that guides cross-attention.
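
The abstract does not spell out how the interleaving or the time-aligned RoPE is implemented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the two speakers' streams are interleaved frame by frame, that positions are measured in seconds instead of token indices, and that the standard rotary embedding formula is used. The function names and the frame rates (12.5 Hz for speech, 25 Hz for motion) are hypothetical and are not taken from the paper.

```python
import math


def interleave_dyadic(agent_tokens, partner_tokens):
    """Interleave two per-frame token streams as [a0, p0, a1, p1, ...].

    Assumption: both streams cover the same number of frames.
    """
    if len(agent_tokens) != len(partner_tokens):
        raise ValueError("streams must cover the same number of frames")
    merged = []
    for agent_tok, partner_tok in zip(agent_tokens, partner_tokens):
        merged.extend([agent_tok, partner_tok])
    return merged


def time_aligned_positions(num_frames, frame_rate_hz):
    """Return each frame's position in seconds, not its token index."""
    return [i / frame_rate_hz for i in range(num_frames)]


def rope_rotate(vec, position, base=10000.0):
    """Apply the standard rotary position embedding to an even-length vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for k in range(dim // 2):
        theta = position * base ** (-2.0 * k / dim)
        x, y = vec[2 * k], vec[2 * k + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical frame rates: speech at 12.5 Hz, motion at 25 Hz.
speech_times = time_aligned_positions(6, 12.5)
motion_times = time_aligned_positions(11, 25.0)

# Speech frame 5 and motion frame 10 both fall at 0.4 s, so the rotary
# phase between them cancels and the attention score is position-neutral.
query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.0]
score = dot(rope_rotate(query, motion_times[10]), rope_rotate(key, speech_times[5]))
```

Because rotary embeddings make the attention score depend only on the difference between positions, indexing both modalities by time means that a motion frame and a speech frame from the same instant have zero relative offset, even though their token indices differ. Whether DyaPlex uses exactly this parameterization is not stated in the abstract.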

Trained on the 4,000-hour Seamless Interaction dataset, the model captures cross-speaker dependencies and establishes new state-of-the-art performance on both monadic and dyadic human interaction benchmarks.


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
