arXiv

What Cosine Similarity of Label Representations Can and Cannot Tell Us

Title: The Limitations and Specific Utility of Cosine Similarity in Label Representations

Abstract

Cosine similarity is a standard metric for comparing vector representations within neural networks, but it does not inherently correlate with model probabilities. This study demonstrates that for softmax classifiers (a class that includes both autoregressive language models and image classifiers), the cosine similarity between two label representations, referred to here as "unembeddings", carries no information about the probabilities the model assigns. We prove that for any two given unembeddings, one can construct an alternative model that yields identical probability outputs for all inputs, yet has a cosine similarity of either 1 or -1 between those two representations.
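The construction claimed above can be illustrated with a small numerical sketch. It relies on one well-known property, assumed here as the mechanism: softmax probabilities do not change when the same vector is added to every unembedding. This is not necessarily the construction used in the paper's proof. The names `U`, `H`, and `shift_for_cosine` are hypothetical, the model is taken to be bias-free, and the two chosen unembeddings are assumed to be distinct.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def shift_for_cosine(U, i, j, t):
    # Hypothetical helper: returns c such that U[i] + c == t * (U[j] + c).
    # Requires t != 1 and U[i] != U[j]. t > 0 gives cosine 1, t < 0 gives -1.
    return (t * U[j] - U[i]) / (1.0 - t)

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))   # 5 labels, 3-dimensional unembeddings (made-up data)
H = rng.normal(size=(4, 3))   # 4 hidden states standing in for model inputs
probs = softmax(H @ U.T)

results = {}
for t in (2.0, -1.0):
    # Adding the same vector to every unembedding shifts all logits of a
    # given input by the same amount, so the softmax output is unchanged.
    U_shifted = U + shift_for_cosine(U, 0, 1, t)
    results[t] = (softmax(H @ U_shifted.T), cosine(U_shifted[0], U_shifted[1]))
    print(t, np.allclose(results[t][0], probs), results[t][1])
```

In this sketch both shifted models reproduce the original probabilities, while the cosine similarity between unembeddings 0 and 1 is pushed to either extreme depending on the sign of `t`.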

By contrast, for sigmoid classifiers, which allow multiple labels per input, we show that the complete set of pairwise cosine similarities between unembeddings fully determines which label combinations are possible. For softmax classifiers that output a ranked list of labels from most to least probable, determining the possible rankings instead requires the pairwise cosine similarities among all differences of unembeddings. We therefore argue that interpreting the cosine similarity of unembeddings in isolation, without considering the classifier that produced them, is misleading.
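The sigmoid claim can be checked numerically in a simplified setting. The sketch below assumes a bias-free sigmoid classifier that predicts label i whenever the logit u_i · h is positive. It builds a second model by rescaling each unembedding by a positive factor and rotating all of them with one orthogonal matrix, which leaves every pairwise cosine similarity unchanged. All names (`U`, `U2`, `Q`, `D`, `predicted_labels`) are hypothetical, and this is an illustration under those assumptions rather than the paper's proof.

```python
import numpy as np

def cosine_matrix(U):
    # Pairwise cosine similarities between the rows of U.
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    return Un @ Un.T

def predicted_labels(U, h):
    # Bias-free sigmoid classifier (an assumption): label i is predicted
    # when sigmoid(u_i . h) > 0.5, i.e. when u_i . h > 0.
    return tuple(int(i) for i in np.flatnonzero(U @ h > 0))

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))                    # made-up unembeddings
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
D = rng.uniform(0.5, 2.0, size=4)              # positive per-label scales
U2 = (D[:, None] * U) @ Q                      # second model, same cosines

H = rng.normal(size=(6, 3))
# For every input h of the first model, the input Q.T @ h of the second
# model has logits D * (U @ h), so it produces the same label combination.
same = [predicted_labels(U, h) == predicted_labels(U2, Q.T @ h) for h in H]
print(np.allclose(cosine_matrix(U), cosine_matrix(U2)), all(same))
```

Because each input of one model is matched by an input of the other with the same predicted label set, the two models admit the same label combinations even though their unembeddings differ in norm and orientation.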


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
