arXiv

G^2C-MT: Graph-Guided Context Selection for Document-Level Machine Translation

June 3, 2026 · Baijun Ji, Zixuan Zhou, Xiangyu Duan, Yu Liu, Longbo Sun, Rupu Wei, Bohong Zhao

Abstract

Document-level machine translation (DocMT) requires capturing long-range discourse dependencies. Recent studies have explored retrieval-based methods and discourse-aware context selection, but these techniques often lack an explicit mechanism for modeling the structured discourse links between distant paragraphs in a document. To address this limitation, we introduce G^2C-MT (Graph-Guided Context for Machine Translation), which reframes DocMT context selection as structured path discovery on a lightweight discourse graph, rather than retrieving unstructured context sets or relying on costly LLM-based discourse modeling.

Specifically, our method represents each paragraph as a node and models the relationship between each pair of nodes using semantic similarity, adjacency, and keyword overlap. We then run a depth-biased random walk over the graph to sample a backward context path for each target paragraph, and the sampled path is supplied as context in the prompt when a large language model (LLM) translates that paragraph. The framework naturally supports multi-path context sampling: aggregating the diverse translation candidates produced from different paths improves robustness on discourse-ambiguous inputs. Experiments across diverse domains show that G^2C-MT outperforms strong baselines with a range of LLMs, including DeepSeek-V3, Gemini-2.5-Flash-lite, and the Qwen-2.5/3 series.
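The abstract does not specify the exact edge formula, the form of the depth bias, or how candidates are aggregated, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes: bag-of-words cosine as a stand-in for embedding-based semantic similarity; hypothetical mixing weights `alpha`, `beta`, `gamma` for the three edge features; a hypothetical `depth_bias` exponent that favors longer backward jumps; and a simple majority vote over translation candidates. All function names are illustrative, and the actual LLM call is omitted.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list, used only by the keyword-overlap feature.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was", "on", "for"}


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity (assumed stand-in for embedding similarity)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Keyword overlap: Jaccard index over non-stopword tokens."""
    ka, kb = set(a) - STOPWORDS, set(b) - STOPWORDS
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0


def build_graph(paragraphs, alpha=0.5, beta=0.3, gamma=0.2):
    """Symmetric edge weights mixing similarity, adjacency and keyword overlap.

    alpha, beta, gamma are hypothetical mixing weights, not values from the paper.
    """
    toks = [tokens(p) for p in paragraphs]
    n = len(paragraphs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adjacent = 1.0 if abs(i - j) == 1 else 0.0
                w[i][j] = (alpha * cosine(toks[i], toks[j])
                           + beta * adjacent
                           + gamma * jaccard(toks[i], toks[j]))
    return w


def sample_backward_path(w, target, max_len=3, depth_bias=1.0, rng=None):
    """Random walk from `target` that only moves to earlier paragraphs.

    The weight for stepping from `current` to j < current is
    w[current][j] * (current - j) ** depth_bias, so depth_bias > 0 favors
    longer backward jumps (an assumed reading of "depth-biased").
    """
    rng = rng or random.Random(0)
    path, current = [], target
    while len(path) < max_len and current > 0:
        cands = list(range(current))
        weights = [w[current][j] * (current - j) ** depth_bias for j in cands]
        if sum(weights) <= 0:
            break
        current = rng.choices(cands, weights=weights, k=1)[0]
        path.append(current)
    return sorted(path)  # present the context in document order


def build_prompt(paragraphs, path, target):
    """Hypothetical prompt template: sampled context, then the paragraph to translate."""
    context = "\n".join(paragraphs[j] for j in path)
    return f"Context:\n{context}\n\nTranslate:\n{paragraphs[target]}"


def aggregate(candidates):
    """Majority vote over candidates; ties go to the first one seen (an assumption)."""
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        "Alice founded the lab in Berlin.",
        "The weather was mild that spring.",
        "The lab soon hired two researchers.",
        "Alice later moved the lab to Munich.",
    ]
    graph = build_graph(docs)
    rng = random.Random(42)
    # Multi-path sampling: several backward context paths for the last paragraph.
    paths = [sample_backward_path(graph, len(docs) - 1, rng=rng) for _ in range(3)]
    for p in paths:
        print(p)
    print(build_prompt(docs, paths[0], len(docs) - 1))
    # Placeholder strings standing in for LLM outputs from the three prompts.
    print(aggregate(["cand_A", "cand_B", "cand_A"]))
```

Setting `depth_bias=0` reduces the walk to an ordinary weight-proportional random walk over earlier paragraphs, which makes the effect of the bias easy to compare; the paper's own aggregation step may be more elaborate than this majority vote.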
