arXiv

TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

Title: TextAlign: Aligning Text Rendering Preferences Through Hierarchical Rewards

Abstract: Large text-to-image generative models continue to struggle with accurate text rendering, a task that demands strict adherence to semantic instructions alongside precise, fine-grained control over glyph structures. Previous approaches address this by integrating specialized architectural modules or modifying encoders, which often complicates deployment across foundation models. We instead reframe text rendering as a post-training preference alignment problem. We introduce TextAlign, a non-invasive framework that improves text rendering without altering the underlying generator architecture. Central to our method is a hierarchical reward system built on a vision-language model (VLM). This system decomposes rendering errors into global, word-level, and glyph-level components and converts binary defect assessments into a scalar preference signal that is compatible with both Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO). Evaluations on FLUX.1-dev and Z-Image-Turbo show consistent improvements in OCR-based text accuracy while maintaining high general generation quality. Comparisons against strong foundation models and specialized text-rendering baselines, including SD3.5, Qwen-Image, AnyText, and TextDiffuser, suggest that careful reward design offers a scalable way to improve text rendering and a viable alternative to extensive model redesign.
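
To make the reward idea concrete, the following is a minimal sketch of how binary defect judgments at the three levels might be collapsed into one scalar and then consumed by GRPO-style and DPO-style training. It is an illustration under stated assumptions, not the authors' implementation: the level weights, the `DefectReport` structure, and all function names are hypothetical, and the abstract does not say how the levels are actually combined.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical level weights (assumption): the abstract does not give the
# actual aggregation rule, so a simple weighted sum is used for illustration.
LEVEL_WEIGHTS = {"global": 0.5, "word": 0.3, "glyph": 0.2}


@dataclass
class DefectReport:
    """Binary defect flags per level (True = defect found).

    In practice these would come from yes/no judgments by a VLM;
    here they are plain booleans supplied by the caller.
    """
    global_flags: list
    word_flags: list
    glyph_flags: list


def level_score(flags):
    """Fraction of checks that passed; 1.0 if the level has no checks."""
    if not flags:
        return 1.0
    return 1.0 - sum(flags) / len(flags)


def scalar_reward(report):
    """Collapse the three levels into one scalar in [0, 1]."""
    return (
        LEVEL_WEIGHTS["global"] * level_score(report.global_flags)
        + LEVEL_WEIGHTS["word"] * level_score(report.word_flags)
        + LEVEL_WEIGHTS["glyph"] * level_score(report.glyph_flags)
    )


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style use: standardize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def preference_pair(rewards):
    """DPO-style use: indices of the (chosen, rejected) samples in a group."""
    chosen = max(range(len(rewards)), key=rewards.__getitem__)
    rejected = min(range(len(rewards)), key=rewards.__getitem__)
    return chosen, rejected


if __name__ == "__main__":
    # Three hypothetical samples for the same prompt.
    group = [
        DefectReport([False], [False, False], [False, False, False]),
        DefectReport([False], [True, False], [True, False, False]),
        DefectReport([True], [True, True], [True, True, True]),
    ]
    rewards = [scalar_reward(r) for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
    print(preference_pair(rewards))
```

The same scalar serves both optimizers: GRPO consumes the standardized within-group advantages, while DPO only needs the ordering to form a chosen/rejected pair.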


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
