SentGuard: Sentence-Level Streaming Guardrails for Large Language Models
Title: SentGuard: Implementing Sentence-Level Streaming Guardrails for Large Language Models
Abstract
As large language models (LLMs) increasingly generate long, reasoning-heavy responses in real-time streaming modes, the timing of moderation has become as important as the moderation decision itself. Current guardrails sit at two suboptimal extremes: response-level approaches wait until the entire output is complete before intervening, while token-level methods act on fragmented semantics, which leads to inconsistent judgments and an excessive number of guard triggers. To overcome these limitations, we introduce SentGuard, a sentence-level streaming guardrail that runs concurrently with text generation. SentGuard uses a lightweight waiting buffer to aggregate streamed tokens into sentence-sized chunks and releases to the user only those chunks that have been verified. This design introduces a minimal latency offset, allowing SentGuard to evaluate the current text prefix while the primary LLM continues decoding subsequent content. To support this approach, we build StreamSafe, a benchmark with structured, per-sentence annotations across eight harm categories that captures how safety risks evolve throughout both the reasoning and response phases. We further train SentGuard with a coarse-to-fine objective designed to identify unsafe intent at the sentence boundary where it first emerges. Evaluations across five safety benchmarks show that SentGuard surpasses existing baselines, identifying 90.5% of unsafe instances within two sentences while sustaining a low streaming false-positive rate of 7.41%.
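The waiting-buffer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `guarded_stream`, the `is_safe` callback, and the punctuation-based sentence boundary rule are all hypothetical choices made for illustration. The sketch also runs the check sequentially, whereas the abstract describes the guard running concurrently with decoding.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
# The abstract does not say how sentences are segmented.
_SENTENCE_END = re.compile(r"[.!?]\s")


def guarded_stream(
    tokens: Iterable[str],
    is_safe: Callable[[str], bool],
) -> Iterator[str]:
    """Yield verified sentences; stop at the first prefix flagged unsafe.

    `is_safe` is a hypothetical stand-in for the guard model. It receives
    the full text prefix up to and including the newest sentence.
    """
    buffer = ""  # tokens waiting to form a complete sentence
    prefix = ""  # everything that has been sent to the guard so far
    for token in tokens:
        buffer += token
        # Drain every complete sentence currently in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end()]
            buffer = buffer[match.end():]
            prefix += sentence
            if not is_safe(prefix):
                return  # withhold this sentence and everything after it
            yield sentence
    # Flush a trailing fragment that never reached a boundary.
    if buffer:
        prefix += buffer
        if is_safe(prefix):
            yield buffer


# Toy guard for demonstration only: flags any prefix containing "harm".
demo_tokens = ["Hello", " world", ".", " How", " are", " you", "?",
               " Do", " harm", " now", ".", " More", " text"]
released = list(guarded_stream(demo_tokens, lambda p: "harm" not in p))
```

In this toy run the first two sentences are released and the stream stops at the third, so the user never sees the flagged sentence or anything after it. A sentence is only released once the token that follows its terminator arrives, which is one simple way a small latency offset can arise.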
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





