Backtranslation Augmented Direct Preference Optimization for Neural Machine Translation
Title: Enhancing Neural Machine Translation via Backtranslation-Augmented Direct Preference Optimization
Abstract:
Current neural machine translation (NMT) architectures rely predominantly on supervised training with parallel corpora. Despite significant advances, these models still make recurring translation errors. This study posits that a reinforcement learning (RL) based post-training phase can correct these deficiencies. We present a new framework that requires only a general text corpus and an expert translator, either human or AI, that supplies iterative feedback. Our experiments focus on English-to-German translation as a representative high-resource language pair. The core of our approach is Direct Preference Optimization (DPO), which we use for this RL-driven post-training. Applied to the gemma3-1b model, DPO post-training substantially improved translation quality, raising the COMET score on the English-to-German task from 0.703 to 0.747. These findings indicate that DPO is a robust and efficient mechanism for improving pre-trained NMT systems through preference-based post-training.
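To make the DPO step concrete, the following is a minimal sketch of the standard DPO objective. It is not the authors' implementation. It assumes that each preference pair consists of a preferred ("chosen") and a dispreferred ("rejected") translation of the same source sentence. It also assumes that the summed token log-probabilities of both translations have already been computed under the model being trained (for example, gemma3-1b) and under a frozen copy of it that serves as the reference model. The function names `log_sigmoid` and `dpo_loss` and the value `beta=0.1` are illustrative choices. Plain floats keep the sketch self-contained; a real training loop would operate on framework tensors.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Mean DPO loss over a batch of preference pairs (illustrative sketch).

    Each argument is a list of summed token log-probabilities log pi(y | x)
    for the chosen or rejected translation, under either the trainable policy
    or the frozen reference model. beta=0.1 is an assumed, illustrative value.
    """
    if not policy_chosen_logp:
        raise ValueError("empty batch")
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logp, policy_rejected_logp,
                              ref_chosen_logp, ref_rejected_logp):
        # Implicit reward margin: how much more the policy, relative to the
        # reference, prefers the chosen translation over the rejected one.
        margin = (pc - rc) - (pr - rr)
        losses.append(-log_sigmoid(beta * margin))
    return sum(losses) / len(losses)


# The policy starts identical to the reference, so the margin is 0 and the
# loss equals ln 2 (about 0.693).
print(dpo_loss([-10.0], [-12.0], [-10.0], [-12.0]))
```

When the policy still matches the reference, every margin is zero and the loss is ln 2. Minimizing the loss pushes the policy to raise the likelihood of the chosen translation relative to the rejected one, while `beta` controls how strongly it can drift from the reference model.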
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





