arXiv

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

June 3, 2026 · Haonan Zhu, Elad Hirsch, Alexandria Minetti, Allison Nulty, Purvanshi Mehta


Abstract: Although text-to-image models now generate graphic design at production scale, their training supervision still relies largely on photo-style preference datasets that give only a single, holistic verdict per comparison. This is inadequate for design, because professional designers judge a design along several distinct dimensions, such as typography, layout, and color harmony, rather than with one overall rating. To address this gap, we introduce TASTE (Typography, Aesthetics, Spatial, Tone, Etc.), a multi-dimensional preference dataset. It contains rankings from two separate panels of five professional designers each, who ranked outputs from four contemporary text-to-image models on nine criteria, together with per-image hallucination flags.
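To make the shape of such multi-criterion rankings concrete, here is a minimal Python sketch. It assumes a hypothetical record layout that is not the released TASTE schema: one designer ranks the four model outputs for one prompt on one criterion, and the best-first ranking is expanded into the pairwise preferences that pairwise metrics and reward models typically consume. The field names, model labels, and the `ranking_to_pairs` helper are illustrative only.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a best-first ranking into (preferred, dispreferred) pairs.

    itertools.combinations keeps input order, so the earlier (better-ranked)
    item is always the first element of each pair.
    """
    return list(combinations(ranking, 2))


# Hypothetical record: field names and values are illustrative, not the TASTE schema.
record = {
    "prompt_id": "p001",
    "criterion": "typography",
    "designer": "d1",
    "ranking": ["model_a", "model_c", "model_b", "model_d"],  # best first
    "hallucination": {"model_b": True},  # hypothetical per-image flag
}

pairs = ranking_to_pairs(record["ranking"])
print(len(pairs))  # 6 pairs from 4 ranked outputs
print(pairs[0])    # ('model_a', 'model_c')
```

With four outputs per comparison, each ranking expands to C(4, 2) = 6 ordered pairs, one per pair of outputs.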

We accompany the dataset with two primary contributions. First, we introduce a criterion-agnostic signal-validation framework built on Kendall’s τ, majority-vote probability, and Condorcet cycles, each benchmarked against an exact null in which every rater produces a uniformly random ranking, independently of the others (iid-uniform). Our analysis shows that designer agreement is statistically significant but moderate; notably, every criterion in TASTE rejects the random-rater null hypothesis. Second, we use TASTE to benchmark preference models and find that neither standard VLM judges nor dedicated T2I scorers reach majority agreement with the designer panel. Training a small MLP head directly on TASTE data significantly narrows this gap, bringing performance close to the single-rater ceiling and establishing a new baseline for future preference models trained on TASTE.
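As a rough illustration of the statistics named above (a sketch, not the authors' code), the pure-Python snippet below computes Kendall’s τ between two full rankings, an exact one-sided p-value against the null in which a rater outputs a uniformly random permutation, and a check for Condorcet cycles in the pairwise-majority relation, along with the exact null probability of such a cycle by brute-force enumeration. The function names are hypothetical; the tie-free tau-a variant, the one-sided test, and the odd-panel-size restriction are assumptions; majority-vote probability is omitted because the abstract does not define it precisely.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def kendall_tau(r1, r2):
    """Kendall's tau-a between two tie-free rankings (best first) of the same items."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    concordant = discordant = 0
    for a, b in combinations(r1, 2):
        # Same sign of position differences means both rankings order a, b alike.
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) > 0:
            concordant += 1
        else:
            discordant += 1  # full rankings have no ties, so the product is never 0
    return Fraction(concordant - discordant, concordant + discordant)


def exact_tau_pvalue(observed, n_items):
    """Exact P(tau >= observed) when the second ranking is a uniform random permutation."""
    items = list(range(n_items))
    taus = [kendall_tau(items, list(p)) for p in permutations(items)]
    return Fraction(sum(1 for t in taus if t >= observed), len(taus))


def has_condorcet_cycle(profile):
    """True if the pairwise-majority relation over a profile of rankings is cyclic.

    Requires an odd number of raters so every pairwise majority is strict and the
    majority relation forms a complete tournament.
    """
    if len(profile) % 2 == 0:
        raise ValueError("use an odd number of raters")
    items = profile[0]
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        a_over_b = sum(1 for r in profile if r.index(a) < r.index(b))
        wins[a if 2 * a_over_b > len(profile) else b] += 1
    # A tournament is acyclic (transitive) iff its win counts are exactly 0..n-1.
    return sorted(wins.values()) != list(range(len(items)))


def exact_cycle_probability(n_items, n_raters):
    """Exact P(Condorcet cycle) when each rater draws an iid uniform permutation.

    Enumerates all (n_items!) ** n_raters profiles, so it is only practical for
    small panels; five raters and four items means about 8 million profiles.
    """
    perms = list(permutations(range(n_items)))
    cyclic = sum(1 for profile in product(perms, repeat=n_raters)
                 if has_condorcet_cycle(profile))
    return Fraction(cyclic, len(perms) ** n_raters)


if __name__ == "__main__":
    tau = kendall_tau(["a", "b", "c", "d"], ["a", "b", "d", "c"])
    print(tau, exact_tau_pvalue(tau, 4))  # 2/3 1/6
    print(exact_cycle_probability(3, 3))  # 1/18, the classic Condorcet-paradox rate
```

Exact fractions are used instead of floats so that the null distributions are computed without rounding; for four items the τ null has only 24 equally likely outcomes.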

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC