Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
Title: Leveraging Grammar: Do Synthetic Linguistic Reasoning Traces Boost Machine Translation in Low-Resource Settings?
Abstract:
Large language models (LLMs) offer a viable approach to machine translation (MT) for extremely low-resource languages by integrating linguistic resources via in-context learning. However, these models often struggle to use grammatical information effectively during translation. Inspired by recent advances in chain-of-thought reasoning, this study investigates whether low-resource MT improves when the model follows structured intermediate steps of linguistic analysis and grammatical reasoning.
We introduce a pipeline that automatically generates step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), using Xibe and Chintang as case studies.
Our findings indicate that linguistic reasoning traces are most effective as guidance at inference time. In the ICL setting, reliable, sentence-specific traces significantly improve translation performance across a wide range of models, languages, and evaluation metrics. Used as training data, by contrast, the traces yield smaller and less consistent improvements: models learn the format of the traces but frequently fill it with inaccurate content. These results suggest that LLMs can exploit grammatical information for low-resource MT when given reliable linguistic analyses, but that generating such analyses autonomously remains a significant challenge.
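As a rough illustration of the kind of trace generation the abstract describes, the sketch below turns one CoNLL-U sentence (the Universal Dependencies file format), a lemma dictionary, and a small rule bank into numbered reasoning steps. It is a minimal sketch under stated assumptions, not the pipeline from the paper: the names `Token`, `parse_conllu`, and `build_trace`, the wording of each step, and the rule-bank layout are all hypothetical, and the two-word sample sentence is invented and belongs to no real language.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int
    form: str
    lemma: str
    upos: str
    feats: dict
    head: int
    deprel: str


def parse_conllu(block: str) -> list:
    """Parse the basic token rows of one CoNLL-U sentence (10 tab-separated columns)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        feats = {} if cols[5] == "_" else dict(
            pair.split("=", 1) for pair in cols[5].split("|")
        )
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], feats, int(cols[6]), cols[7])
        )
    return tokens


def build_trace(tokens, dictionary, rules):
    """Emit one step per token, plus one step per feature that has a rule-bank entry."""
    by_idx = {t.idx: t for t in tokens}
    steps = []
    for t in tokens:
        gloss = dictionary.get(t.lemma, "UNKNOWN")
        head_tok = by_idx.get(t.head)
        head = "ROOT" if t.head == 0 or head_tok is None else head_tok.form
        steps.append(
            f"{t.form}: {t.upos}, lemma '{t.lemma}' = '{gloss}', {t.deprel} of {head}"
        )
        for feat, value in sorted(t.feats.items()):
            note = rules.get((feat, value))
            if note:
                steps.append(f"{feat}={value}: {note}")
    return [f"Step {i}. {s}" for i, s in enumerate(steps, 1)]


# Invented placeholder data (hypothetical; not Xibe, Chintang, or any real language).
SAMPLE_ROWS = [
    ["1", "miku", "mik", "NOUN", "_", "Case=Acc", "2", "obj", "_", "_"],
    ["2", "tarla", "tar", "VERB", "_", "Tense=Past", "0", "root", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)
DICTIONARY = {"mik": "water", "tar": "drink"}
RULES = {
    ("Case", "Acc"): "accusative marks the direct object",
    ("Tense", "Past"): "past tense, so use an English past form",
}

if __name__ == "__main__":
    for step in build_trace(parse_conllu(SAMPLE), DICTIONARY, RULES):
        print(step)
```

In an ICL setting, a trace like this would be placed in the prompt alongside the source sentence. Because every step is read off the gold annotation rather than generated by the model, its content is reliable by construction, which is the property the ICL results above depend on.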
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





