Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context
Abstract:
The quadratic computational complexity of self-attention continues to limit how efficiently Large Language Models (LLMs) can process ultra-long sequences. Naive Bayes-based Context Extension (NBCE) addresses this by splitting a document into chunks that are processed in parallel and, at each decoding step, selecting the chunk with the lowest predictive entropy. However, this hard selection causes semantic fragmentation during cross-chunk reasoning, because abrupt routing changes between adjacent tokens destabilize the model's contextual grounding.
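The hard-selection behavior described above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the NBCE implementation: the function names `entropy` and `hard_select` are hypothetical, the probabilities are made-up toy values, and each chunk is assumed to yield one next-token distribution per decoding step.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of each row of a (n_chunks, vocab) array."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def hard_select(chunk_probs):
    """Return the index and distribution of the lowest-entropy chunk."""
    k = int(np.argmin(entropy(chunk_probs)))
    return k, chunk_probs[k]

# Two consecutive decoding steps, two chunks, toy 3-token vocabulary.
# All numbers are invented for illustration.
step_t = np.array([[0.90, 0.05, 0.05],    # chunk 0: confident
                   [0.40, 0.30, 0.30]])   # chunk 1: uncertain
step_t1 = np.array([[0.34, 0.33, 0.33],   # chunk 0: now uncertain
                    [0.05, 0.90, 0.05]])  # chunk 1: now confident

k_t, _ = hard_select(step_t)
k_t1, _ = hard_select(step_t1)
print(k_t, k_t1)  # the selected chunk flips between adjacent tokens
```

In this toy case the routing jumps from chunk 0 to chunk 1 between two adjacent tokens, and the non-selected chunk contributes nothing at either step, which is the kind of abrupt switch the abstract associates with semantic fragmentation.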
To resolve this, we introduce Soft-NBCE, a lightweight extension that replaces discrete chunk selection with soft, entropy-weighted chunk fusion. By applying a temperature-scaled softmax to the chunks' predictive entropies, the model assigns continuous weights to all chunks and aggregates the chunk-conditioned distributions in log space. Furthermore, to mitigate the conditional independence assumption imposed by chunking, we propose Consistency Distillation, a LoRA-based self-distillation technique that uses KL divergence to align the chunked logit distribution with that of a full-context teacher model.
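A minimal sketch of the fusion step and the distillation objective follows. Several details are assumptions not stated in the abstract: the softmax is taken over negative entropies so that lower-entropy chunks receive larger weights, the weighted sum of log-probabilities is renormalized, the loss is KL(teacher || student), and the names `soft_fuse`, `kl_divergence`, and `tau` are hypothetical. The LoRA training loop is omitted.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def soft_fuse(chunk_logits, tau=0.5):
    """Fuse (n_chunks, vocab) logits; returns (chunk weights, fused log-probs)."""
    logp = log_softmax(chunk_logits)             # per-chunk log-probabilities
    h = -(np.exp(logp) * logp).sum(axis=-1)      # per-chunk predictive entropy
    w = np.exp(log_softmax(-h / tau, axis=0))    # assumed: softmax over -H / tau
    fused = (w[:, None] * logp).sum(axis=0)      # weighted sum in log space
    return w, log_softmax(fused)                 # renormalize to a distribution

def kl_divergence(teacher_logp, student_logp):
    """KL(teacher || student) for two log-probability vectors."""
    return float((np.exp(teacher_logp) * (teacher_logp - student_logp)).sum())

# Toy logits for two chunks over a 3-token vocabulary (invented values).
chunk_logits = np.array([[4.0, 0.0, 0.0],    # confident chunk
                         [0.5, 0.4, 0.3]])   # near-uniform chunk
w, fused = soft_fuse(chunk_logits, tau=0.5)

# Stand-in for a full-context teacher distribution (also invented).
teacher_logp = log_softmax(np.array([3.0, 0.5, 0.0]))
loss = kl_divergence(teacher_logp, fused)
print(w, np.exp(fused), loss)
```

Under these assumptions, a small `tau` pushes the weights toward a one-hot vector and approaches hard selection, while a large `tau` approaches uniform averaging of the chunks' log-probabilities.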
Evaluations on the LongBench multi-hop benchmarks demonstrate that Soft-NBCE, enhanced with Consistency Distillation, consistently outperforms NBCE-style baselines. Specifically, it achieves a MuSiQue F1 score of 0.310 compared to 0.275 for Vanilla NBCE, and a HotpotQA F1 score of 0.479 versus 0.427. These gains are realized while maintaining high retrieval accuracy (NIAH-32K: 0.909) and keeping peak memory usage at O(L^2/n).
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




