Learning When to Translate for Multilingual Reasoning
Title: Mastering the Timing of Translation for Multilingual Reasoning
Original: arXiv:2606.02465v1 Announce Type: cross Abstract: Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the model can reason reliably from the original query. To address this challenge, we propose Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to selectively invoke translation when direct understanding is unreliable. Luar trains the model to choose between solving the original input directly and reasoning over its English translation, encouraging translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation in cases where direct reasoning is sufficient, while extending its translator-call behavior to unseen low-resource languages. Together, our work suggests a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at https://github.com/deokhk/LUAR
Rewrite: Title: Determining the Optimal Moment to Translate for Multilingual Reasoning
Rewrite: Abstract: Reasoning language models (RLMs) perform strongly on complex reasoning tasks, but they still show substantial multilingual reasoning gaps, largely because they fail to understand non-English inputs. Translating these inputs into English can mitigate such failures, since RLMs interpret English more reliably, but translating every query is unnecessary when the model can reason reliably from the original text. To address this, we introduce Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to invoke translation selectively, only when direct understanding is unreliable. The model learns to choose between solving the original input directly and reasoning over its English translation, and it is encouraged to translate only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation when direct reasoning is sufficient, and that its translator-call behavior extends to unseen low-resource languages. Together, these findings support a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at https://github.com/deokhk/LUAR.
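The abstract does not specify how "substantially outperform" is measured, so the following is only a minimal sketch of one way to read the selection rule, not Luar's actual training objective. It assumes that, for a single query, several answers have been sampled both directly and over an English translation, and that each sample has been marked correct or incorrect. The names `RolloutStats`, `success_rate`, `should_translate`, and the `margin` threshold are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RolloutStats:
    """Correctness flags (True = correct answer) for sampled rollouts of one query."""
    direct: Sequence[bool]      # reasoning over the original-language input
    translated: Sequence[bool]  # reasoning over the English translation


def success_rate(flags: Sequence[bool]) -> float:
    """Fraction of correct rollouts; 0.0 when no rollouts were sampled."""
    return sum(flags) / len(flags) if flags else 0.0


def should_translate(stats: RolloutStats, margin: float = 0.2) -> bool:
    """Prefer translation only if it beats direct reasoning by more than `margin`.

    `margin` is an assumed hyperparameter standing in for "substantially".
    """
    gain = success_rate(stats.translated) - success_rate(stats.direct)
    return gain > margin


# Direct reasoning already succeeds: translation would be unnecessary.
easy = RolloutStats(direct=[True, True, True, True],
                    translated=[True, True, True, True])

# Direct reasoning mostly fails, translated reasoning mostly succeeds.
hard = RolloutStats(direct=[False, False, True, False],
                    translated=[True, True, True, False])

print(should_translate(easy))  # gain is 0.0, so the query is solved directly
print(should_translate(hard))  # gain is 0.5, so translation is preferred
```

Under this reading, a decision like `should_translate` could serve as a per-query training signal for when the model ought to call the translator; how Luar actually shapes its reward is left to the paper.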
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




