arXiv

Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data

Abstract: Paraphrasing restates meaning in different words and supports tasks such as machine translation, question-answering, and text simplification. Specific paraphrase types play a critical role because they enable precise semantic analysis and strengthen language models. Nevertheless, current methods for generating these paraphrase types frequently fail to align with human preferences. This misalignment stems from an overreliance on automated metrics and a scarcity of human-annotated training data, both of which often obscure essential nuances of semantic fidelity and linguistic transformation. To bridge this gap, our research uses a human-ranked dataset and applies Direct Preference Optimization (DPO) to align model outputs directly with human judgments.
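
As a rough illustration of the DPO objective mentioned above, the following minimal sketch computes the standard DPO loss for a single preference pair: the human-preferred paraphrase ("chosen") against the less-preferred one ("rejected"). The function name, the argument names, and the value beta=0.1 are illustrative assumptions; the abstract does not describe the authors' implementation or hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full output under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the paper.
    """
    # How much more (in log space) the policy likes each output than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one human-ranked pair.
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)   # policy identical to reference
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # policy now favors the chosen output
print(neutral, improved)
```

The loss falls as the policy assigns relatively more probability to the human-preferred paraphrase than the reference model does, which is the mechanism by which preference rankings steer the model.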

Our results indicate that training with DPO raises paraphrase-type generation accuracy by 3 percentage points over a supervised baseline and increases human preference ratings by 7 percentage points. Furthermore, we introduce a newly created human-annotated dataset to support more rigorous evaluations in future studies. For detection, our paraphrase-type detection model achieves F1 scores of 0.91 for addition/deletion, 0.78 for same-polarity substitution, and 0.70 for punctuation changes.
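
To make the per-type F1 figures concrete, here is a minimal sketch of how such scores can be computed when each example may carry several paraphrase types. The multi-label setup, the function name, the label strings, and the toy data are all assumptions for illustration; they are not the authors' evaluation code or data.

```python
from collections import Counter


def per_type_f1(gold, predicted):
    """Compute F1 separately for each paraphrase type.

    gold and predicted are parallel lists of sets of type labels,
    one set per example (a multi-label setup, assumed here).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        for label in p & g:
            tp[label] += 1  # predicted and correct
        for label in p - g:
            fp[label] += 1  # predicted but not in gold
        for label in g - p:
            fn[label] += 1  # in gold but missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero for any seen label.
        scores[label] = 2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
    return scores


# Toy annotations with hypothetical label names.
gold = [{"addition_deletion"}, {"punctuation"}, {"addition_deletion", "punctuation"}]
pred = [{"addition_deletion"}, {"addition_deletion"}, {"punctuation"}]
print(per_type_f1(gold, pred))
```

Reporting F1 per type, rather than a single aggregate, shows which transformations a detector handles well and which it struggles with, as in the spread between 0.91 and 0.70 reported above.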

These outcomes show that combining preference data with DPO training yields paraphrases that are more reliable and semantically accurate. This approach benefits downstream applications, including more robust question-answering systems and improved summarization. By outperforming automated metrics, the Paraphrase Type Detection (PTD) model offers a more dependable framework for assessing paraphrase quality. Ultimately, this work advances research in paraphrase-type generation toward richer, user-aligned language production and establishes a stronger, human-centric foundation for future evaluations.


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
