arXiv

MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

June 3, 2026 · Jyotika Singh, Fang Tu, Miguel Ballesteros, Weiyi Sun, Sandip Ghoshal, Michelle Yuan, Yassine Benajiba, Sujith Ravi, Dan Roth

Title: MT-OSC: A Solution for LLMs Losing Their Way in Extended Dialogues

Abstract:

Despite multi-turn interactions being the standard for chat interfaces, large language models (LLMs) frequently experience a notable decline in performance when user instructions and contextual information are spread across several conversational turns. The conventional method of appending complete chat histories to prompts quickly fills context windows, resulting in higher computational expenses, increased latency, and diminishing returns as conversations progress. To address this, we present MT-OSC, a One-off Sequential Condensation framework that automatically and efficiently compresses chat history in the background, ensuring an uninterrupted user experience. Utilizing a Condenser Agent equipped with a few-shot inference-based Condenser and a lightweight Decider, MT-OSC selectively preserves crucial information, achieving token count reductions of up to 72% within 10-turn dialogues. Tested on 13 state-of-the-art LLMs and various multi-turn benchmarks, MT-OSC consistently bridges the multi-turn performance gap. It maintains or enhances accuracy across datasets, demonstrating robustness against irrelevant turns and distractors. These findings position MT-OSC as a scalable approach for multi-turn conversations, allowing for richer contextual understanding within limited input spaces while lowering latency and operational costs without compromising performance.
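The condensation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the Condenser or the Decider work internally, so everything below is an assumption made for illustration. In particular, `toy_condenser` is a hypothetical filler-word filter standing in for the few-shot LLM Condenser, `length_decider` is a hypothetical rule that accepts a condensed context only when it is shorter than the raw one, and `count_tokens` is a crude whitespace count rather than a real tokenizer.

```python
from typing import Callable, List

# Hypothetical filler vocabulary; a real Condenser would be a few-shot LLM call.
FILLER = {"hi", "hello", "thanks", "please", "okay", "well", "so", "also"}


def count_tokens(text: str) -> int:
    """Crude token count: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def toy_condenser(summary: str, turn: str) -> str:
    """Stand-in Condenser: drop filler words from the new turn and append the rest."""
    kept = [w for w in turn.split() if w.lower().strip(",.!?") not in FILLER]
    return " ".join(part for part in (summary, " ".join(kept)) if part)


def length_decider(raw: str, condensed: str) -> bool:
    """Stand-in Decider: accept the condensed context only if it is strictly shorter."""
    return count_tokens(condensed) < count_tokens(raw)


def condense_history(
    turns: List[str],
    condenser: Callable[[str, str], str] = toy_condenser,
    decider: Callable[[str, str], bool] = length_decider,
) -> str:
    """Fold each turn into a running context, condensing one turn at a time."""
    context = ""
    for turn in turns:
        raw = " ".join(part for part in (context, turn) if part)
        candidate = condenser(context, turn)
        # Fall back to the uncondensed context when the Decider rejects the candidate.
        context = candidate if decider(raw, candidate) else raw
    return context


if __name__ == "__main__":
    turns = [
        "Hi, please write a sort function.",
        "Okay, also make it stable.",
        "Thanks!",
    ]
    condensed = condense_history(turns)
    raw_tokens = sum(count_tokens(t) for t in turns)
    saved = 1 - count_tokens(condensed) / raw_tokens
    print(condensed)
    print(f"token reduction: {saved:.0%}")
```

Passing the Condenser and Decider in as callables keeps the loop independent of any particular model, so the toy functions could be swapped for LLM-backed ones without changing `condense_history`. The reduction printed here reflects only this toy example and says nothing about the figures reported in the abstract.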
