Token Rankings are Unforgeable Language Model Signatures
Abstract
It is known that a language model's parameters impose distinct geometric constraints on its logit outputs, giving each model a unique signature. While this property aids model identification, it also poses a security risk: an API that exposes raw logits inadvertently leaks the parameters of the model's final layer. In this work, we study more restrictive APIs that disclose only token rankings, that is, the ordering of tokens by probability without the probability values themselves. We show that these rankings also form a unique model signature: for sufficiently large $k$, every model has a distinct set of feasible top-$k$ rankings.
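To make "feasible ranking" concrete, the sketch below assumes the standard setting in which logits are $\ell = W h$, where $W$ is the $v \times d$ final-layer (unembedding) matrix with one row $w_i$ per token and $h$ is a $d$-dimensional hidden state, with no bias term. Under that assumption every logit vector lies in the $d$-dimensional column space of $W$, which is the kind of geometric constraint that lets raw logits reveal $W$ up to an invertible $d \times d$ transform. A top-$k$ ranking $(r_1, \dots, r_k)$ is then feasible when some $h$ satisfies $w_{r_1}^\top h > \dots > w_{r_k}^\top h > w_j^\top h$ for every unranked token $j$. Deciding this exactly is a linear-programming feasibility problem; the hypothetical helper `sampled_rankings` below instead draws random hidden states and records the rankings they produce. This is an illustration only, not the paper's procedure, and the toy matrices are made up.

```python
import random
from typing import List, Sequence, Set, Tuple

# Assumed setup: W holds one row w_i per vocabulary token, each of length d.
Matrix = Sequence[Sequence[float]]


def logits(W: Matrix, h: Sequence[float]) -> List[float]:
    """Logit of token i is the dot product w_i . h (assumed model, no bias)."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def top_k(scores: Sequence[float], k: int) -> Tuple[int, ...]:
    """Token ids of the k largest scores, best first; ties go to the lower id."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return tuple(order[:k])


def sampled_rankings(W: Matrix, k: int, samples: int = 2000,
                     seed: int = 0) -> Set[Tuple[int, ...]]:
    """Collect the top-k rankings produced by random hidden states h.

    Every ranking returned is feasible (its sampled h is a witness; exact ties
    occur with probability zero), but rankings whose feasible region is very
    thin can be missed, so the result is a subset of the true feasible set.
    """
    rng = random.Random(seed)
    d = len(W[0])
    found: Set[Tuple[int, ...]] = set()
    for _ in range(samples):
        h = [rng.gauss(0.0, 1.0) for _ in range(d)]
        found.add(top_k(logits(W, h), k))
    return found


if __name__ == "__main__":
    # Two hypothetical one-dimensional "models" over a 3-token vocabulary.
    W_a = [[1.0], [2.0], [3.0]]
    W_b = [[1.0], [3.0], [2.0]]
    print(sorted(sampled_rankings(W_a, 3)))  # [(0, 1, 2), (2, 1, 0)]
    print(sorted(sampled_rankings(W_b, 3)))  # [(0, 2, 1), (1, 2, 0)]
    # Under W_a, token 1 can never be ranked first.
    print(sorted(sampled_rankings(W_a, 1)))  # [(0,), (2,)]
```

The two toy models produce disjoint sets of top-3 rankings, which is the sense in which the set of feasible rankings can distinguish models. Sampling keeps the example dependency-free; an exact check would pose the inequalities above to an LP solver.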
Crucially, the ranking-based signature is the first known signature that cannot be forged in polynomial time (assuming P ≠ NP), because finding another model with an identical set of feasible rankings is NP-hard. On the security side, we find that token rankings alone suffice to approximately reconstruct the model's final layer, a vulnerability similar to the one posed by logits. However, the resulting approximation is too imprecise to forge the signature. The attack can be mitigated by restricting the API to return only the top-$k$ tokens, provided $k$ is sufficiently small. Because the $k$ needed to exhibit the model's signature is generally smaller than the largest $k$ that still prevents parameter theft, an API can broadcast an unforgeable signature without compromising the model's parameters.
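The mitigation described above, an API that returns only the ids of the top-$k$ tokens, might look like the minimal sketch below. The names `top_k_ranking` and `implied_comparisons` are hypothetical, and the tie-breaking rule is an assumption. The second helper lists the (winner, loser) pairs that a single response certifies. If logits are computed as $\ell = W h$ from a final-layer matrix $W$ with rows $w_i$ and a hidden state $h$ (again an assumption), each pair $(a, b)$ corresponds to a linear inequality $(w_a - w_b)^\top h > 0$, the kind of information an attacker would aggregate when trying to approximate the final layer. The abstract does not describe the reconstruction procedure itself, so this shows only what one response reveals.

```python
from typing import List, Sequence, Tuple


def top_k_ranking(logits: Sequence[float], k: int) -> List[int]:
    """Return the ids of the k highest-logit tokens, best first.

    Only the ordering leaves the server; the logit values are discarded.
    Ties are broken in favour of the lower token id (an assumed convention).
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must lie between 1 and the vocabulary size")
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    return order[:k]


def implied_comparisons(ranking: Sequence[int],
                        vocab_size: int) -> List[Tuple[int, int]]:
    """List the (winner, loser) token pairs certified by one top-k response.

    Consecutive returned tokens are ordered among themselves, and the last
    returned token outranks every token that was not returned.
    """
    pairs = [(ranking[i], ranking[i + 1]) for i in range(len(ranking) - 1)]
    shown = set(ranking)
    last = ranking[-1]
    pairs += [(last, j) for j in range(vocab_size) if j not in shown]
    return pairs


if __name__ == "__main__":
    response = top_k_ranking([0.1, 2.0, -1.0, 0.7], k=2)
    print(response)                          # [1, 3]
    print(implied_comparisons(response, 4))  # [(1, 3), (3, 0), (3, 2)]
```

The endpoint never exposes a probability; the caller sees only the ordered ids, and any inference about $W$ has to work from comparisons like those listed by `implied_comparisons`.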
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




