"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise
Abstract: Measuring the diversity of creative outputs is central to evaluating post-training mode collapse, comparing decoding strategies, and quantifying creative behavior in both machine and human writing. We introduce a method for measuring diversity that leverages in-context learning. The primary instance we evaluate is the "Decan" metric, $D_{Ca_n} = C \times a_n$, a per-byte score computed from the per-token log-probabilities of a base model $\theta$, using a single forward pass per permutation. It requires no embedding models, reference corpora, or human annotations. Grounded in information theory, the approach relies on the in-context learning ability of language models to detect a broad range of similarities across any number of inputs, so no specialized model needs to be trained. The same pipeline scores both AI-generated samples and human-written response sets, treating diversity as a property of the triple (responses, prompt, scoring model). On Tevet and Berant's human-grounded McDiv benchmark, $D_{Ca_n}$ attains its best result on the prompt_gen set, an OCA of 0.846, second only to the strongest neural baseline reported by Tevet and Berant (SentBERT, 0.897). Across the OLMo-2-7B post-training pipeline, $D_{Ca_n}$ declines monotonically through the base, SFT, DPO, and RLVR stages, capturing the kind of diversity loss that matters for creative-writing applications.
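To make the idea of progressive conditional surprise concrete, the following is a minimal sketch, not the paper's implementation. It does not compute $D_{Ca_n}$, since the abstract does not define $C$ or $a_n$. It assumes that "permutation" refers to an ordering of the responses, and it swaps the base model $\theta$ for a toy adaptive byte-level n-gram model so that it runs without any dependencies. The names `surprise_curve` and `mean_surprise_curve`, and the parameters `k`, `alpha`, and `n_perms`, are hypothetical.

```python
import math
import random
from collections import defaultdict


def surprise_curve(order, k=3, alpha=0.1):
    """Per-byte surprise (bits/byte) of each response, read in sequence.

    One pass over the concatenated responses: every byte is scored given
    everything before it, then added to the model. The toy order-k byte
    model with additive smoothing is a stand-in (assumption) for the
    per-token log-probabilities of a real language model.
    Assumes k >= 1 and non-empty responses.
    """
    counts = defaultdict(float)  # (context, byte) -> occurrences so far
    totals = defaultdict(float)  # context -> occurrences so far
    history = bytearray()
    curve = []
    for resp in order:
        bits = 0.0
        # A newline separates responses; it updates the model but is not scored.
        for i, b in enumerate(resp + b"\n"):
            ctx = bytes(history[-k:])
            if i < len(resp):
                p = (counts[(ctx, b)] + alpha) / (totals[ctx] + 256 * alpha)
                bits -= math.log2(p)
            counts[(ctx, b)] += 1
            totals[ctx] += 1
            history.append(b)
        curve.append(bits / len(resp))
    return curve


def mean_surprise_curve(responses, n_perms=8, seed=0):
    """Average the positional surprise curve over random response orderings."""
    rng = random.Random(seed)
    sums = [0.0] * len(responses)
    for _ in range(n_perms):
        order = list(responses)
        rng.shuffle(order)
        for j, value in enumerate(surprise_curve(order)):
            sums[j] += value
    return [s / n_perms for s in sums]
```

Under these assumptions, a set of near-identical responses yields a curve that drops quickly after the first position, because later responses are predictable from earlier ones, while a varied set keeps its surprise high at every position. How such a curve is reduced to a single score is exactly what the metric definition would specify, and is left open here.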
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




