arXiv

SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

June 4, 2026 · Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du, Wei Liu, Yan Pang

Abstract

As public large language models (LLMs) like ChatGPT become ubiquitous, safeguarding the privacy of user prompts has emerged as a pressing concern. Current approaches to privacy-preserving inference typically force a trade-off between utility and efficiency, and they frequently demand model-specific adjustments that hinder broad compatibility. To address these limitations, we introduce SharedRequest, a framework that enables privacy-preserving LLM inference without being tied to a specific model architecture. Unlike previous methods that focus on individual prompts, SharedRequest shifts the privacy protection mechanism to the batch level. Its core strategy masks sensitive data by blending original prompts with noisy versions and by clustering semantically similar instructions. This approach allows the system to distribute inference costs across a substantial number of queries, thereby minimizing any negative effect on the quality of LLM responses. Because it operates independently of LLM architecture, SharedRequest does not require access to model parameters or structural modifications. Our empirical evaluations show that SharedRequest delivers more than 20% greater utility than existing differential privacy baselines. Furthermore, its shared-prompt mechanism reduces query costs by up to a factor of 5 compared with standard non-batched inference.
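To make the batch-level idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that queries arrive as (instruction, text) pairs, uses exact-match grouping as a stand-in for semantic clustering, and uses word shuffling as a stand-in for the noise mechanism, which the abstract does not specify. The function name `build_shared_requests` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def build_shared_requests(queries, decoys_per_query=1, seed=0):
    """Group (instruction, text) pairs and mix real texts with decoys.

    Assumptions (not from the paper): exact-match grouping replaces
    semantic clustering, and word shuffling replaces the noise mechanism.
    """
    rng = random.Random(seed)

    # Group texts that share the same instruction.
    groups = defaultdict(list)
    for instruction, text in queries:
        groups[instruction].append(text)

    requests = []
    for instruction, texts in groups.items():
        # Real texts are flagged True, decoys False.
        items = [(t, True) for t in texts]
        for t in texts:
            for _ in range(decoys_per_query):
                words = t.split()
                rng.shuffle(words)  # placeholder "noisy version"
                items.append((" ".join(words), False))

        # Shuffle so position does not reveal which items are real.
        rng.shuffle(items)

        numbered = "\n".join(f"{i + 1}. {t}" for i, (t, _) in enumerate(items))
        requests.append({
            # One prompt carries the instruction once for all items.
            "prompt": f"{instruction}\n{numbered}",
            # Kept by the client only; never sent to the model.
            "real_positions": [i for i, (_, real) in enumerate(items) if real],
            "n_items": len(items),
        })
    return requests


if __name__ == "__main__":
    demo = [
        ("Summarize:", "alpha beta gamma"),
        ("Summarize:", "delta epsilon zeta"),
        ("Translate to French:", "good morning"),
    ]
    for request in build_shared_requests(demo):
        print(request["prompt"], request["real_positions"], sep="\n")
```

In this sketch, three queries collapse into two requests because the shared instruction is sent once per group. This illustrates how a shared prompt can spread cost across queries, although the actual savings depend on details the abstract does not give.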

Source: arXiv