TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models
Title: TrustLDM: Assessing Reliability in Language Diffusion Models
Abstract:
Language Diffusion Models (LDMs) are rapidly emerging as a significant challenge to the supremacy of auto-regressive models in language processing. While their unique any-order decoding mechanisms facilitate high-speed generation, they also introduce novel trustworthiness concerns. To investigate the potential risks inherent in LDM workflows, we present TrustLDM, a specialized benchmark designed to assess the safety, privacy, and fairness of various LDM architectures under diverse static post-context scenarios. Our empirical analysis reveals that while LDMs maintain robust trustworthiness when relying solely on user prompts, their alignment capabilities deteriorate significantly when malicious post-contexts are appended to masked responses. Additionally, we find that context length does not strictly correlate with the magnitude of these effects, and that both the order of decoding and the length of the generated text influence evaluation results. To address these issues, we introduce TrustLDM-Auto, an automated evaluation framework that utilizes the decoding flexibility of LDMs to systematically pinpoint vulnerable configurations. This approach exposes considerable trustworthiness deficits across all tested models and dimensions. Our findings aim to support the development of more reliable LDMs. The source code is accessible at https://github.com/PKU-ML/TrustLDM.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




