Global News Digest

arXiv

TECCI: Tricky Edits of Collected and Curated Images

Title: TECCI: Evaluating the Nuances of Collected and Curated Image Edits

Despite significant advancements in recent years, text-guided image editing technologies continue to face substantial hurdles. Current methods often falter in areas such as strict instruction adherence, preserving the original source image with minimal alterations, and maintaining high visual fidelity. These limitations are particularly pronounced when handling complex requests, including adjustments to position, motion, viewpoint, scale, or creative transformations.

To provide a rigorous framework for testing generative image editors, we introduce TECCI (Tricky Edits of Collected and Curated Images), a novel benchmark designed to expose these weaknesses. TECCI features a newly released dataset comprising images across seven distinct categories. These categories were carefully selected and curated to specifically target the known deficiencies of existing editing models. The dataset includes 7,550 pairs of images and corresponding edit instructions. The instructions were automatically generated by Gemini, with five distinct edit types applied to each source image. Additionally, we curated a subset of 530 images accompanied by challenging, manually crafted edit instructions.

We conducted human evaluations of five leading image editing models using the TECCI dataset. Human judges assessed the model outputs on three criteria: instruction following, the minimality of changes made to the source, and overall visual quality. To support larger-scale assessment, we developed an automated rater powered by Gemini, which matched human judgments with 74.7% accuracy.
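As a rough illustration of what an accuracy figure of this kind measures, the sketch below computes the share of items on which an automated rater's verdict matches the human verdict. It assumes binary pass/fail labels per item, which the summary does not confirm. The function name and the toy labels are hypothetical and are not taken from TECCI.

```python
from typing import Sequence


def rater_agreement(human: Sequence[bool], auto: Sequence[bool]) -> float:
    """Return the fraction of items where the automated verdict equals the human one."""
    if len(human) != len(auto):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("need at least one labelled item")
    matches = sum(h == a for h, a in zip(human, auto))
    return matches / len(human)


# Toy labels invented for illustration only (not TECCI data).
human_labels = [True, False, True, True]
auto_labels = [True, False, False, True]

agreement = rater_agreement(human_labels, auto_labels)  # 3 of 4 verdicts match
print(f"agreement: {agreement:.1%}")
```

Under this reading, a reported 74.7% would mean the automated rater and the human judges gave the same verdict on roughly three out of four rated items.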

Our analysis yielded several critical findings:

1. The benchmark proves highly demanding, as no model achieved an overall success rate exceeding 22%.
2. Among the tested models, Nano Banana Pro emerged as the top performer.
3. Models showed considerably stronger performance in following instructions compared to their ability to perform minimal edits or maintain visual quality.
4. Significant difficulties were observed when editing architectural structures and natural scenes, tasks that require a deep understanding of spatial layouts and fine visual details.
5. Reasoning-based and creative edits proved to be the most challenging, while edits involving color and appearance were the easiest to execute.
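One way an overall success rate can sit well below the per-criterion rates is if an edit counts as successful only when it passes all three criteria at once. The sketch below works through that case. The all-three rule is an assumption, not something the summary states, and the `Judgment` class, the `rates` function, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Judgment:
    """One rated edit, with a pass/fail verdict per criterion."""
    follows_instruction: bool
    minimal_change: bool
    visual_quality: bool

    @property
    def success(self) -> bool:
        # Assumption: an edit succeeds only if all three criteria pass.
        return self.follows_instruction and self.minimal_change and self.visual_quality


def rates(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Return per-criterion pass rates and the overall success rate."""
    items = list(judgments)
    if not items:
        raise ValueError("need at least one judgment")
    n = len(items)
    return {
        "instruction_following": sum(j.follows_instruction for j in items) / n,
        "minimal_change": sum(j.minimal_change for j in items) / n,
        "visual_quality": sum(j.visual_quality for j in items) / n,
        "overall": sum(j.success for j in items) / n,
    }


# Toy judgments invented for illustration only (not TECCI data).
toy = [
    Judgment(True, True, True),
    Judgment(True, False, True),
    Judgment(True, True, False),
    Judgment(False, True, True),
]
summary = rates(toy)
print(summary)
```

In this toy set each criterion passes on three of four edits, yet only one edit passes all three, so the overall rate is a quarter.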


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
