arXiv

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

June 2, 2026 · Mengmeng Ji, Ravi Shanker Raju, Jonathan Lingjie Li, Chen Wu

Title: LongAttnComp: Enabling Long-Context Reasoning via Cross-Family Context Compression

Abstract

As practical applications increasingly demand the processing of inputs exceeding 100,000 tokens, the disparity between context length and inference efficiency has emerged as a significant bottleneck. Context compression presents a viable solution to lower prefilling costs while maintaining task accuracy. Nevertheless, current training-free, attention-based approaches exhibit notable deficiencies in rigorous long-context scenarios, particularly code reasoning. To address this, we introduce LongAttnComp, an extension of AttnComp adapted for long-context environments. This method incorporates a fine-tuned, lightweight cross-attention scoring layer alongside token-level chunking, a token-budget top-p algorithm, positional reordering, and a format-agnostic query parser.
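To make the selection step concrete, the following is a minimal sketch of how token-level chunking, a token-budget top-p rule, and positional reordering could fit together. It is an illustration under stated assumptions, not the authors' implementation: the function names `chunk_tokens` and `select_chunks`, the fixed chunk size, and the exact stopping rule are all hypothetical, and the per-chunk relevance scores (which the method would obtain from its cross-attention scoring layer) are simply passed in as numbers. "Positional reordering" is read here as restoring the kept chunks to their original document order, which is one plausible interpretation.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def select_chunks(scores: Sequence[float], lengths: Sequence[int],
                  top_p: float, token_budget: int) -> List[int]:
    """Hypothetical token-budget top-p selection.

    Chunks are visited in descending score order and kept until the
    cumulative normalized score reaches top_p. A chunk that would push
    the kept token count past token_budget is skipped. The kept indices
    are returned in original document order (positional reordering).
    """
    total = sum(scores)
    if total <= 0:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    mass, used = 0.0, 0
    for i in order:
        if mass >= top_p:
            break  # enough score mass has been covered
        if used + lengths[i] > token_budget:
            continue  # this chunk does not fit; a later, shorter one might
        kept.append(i)
        mass += scores[i] / total
        used += lengths[i]
    return sorted(kept)


# Usage: 16 toy "tokens" in 4 chunks, with made-up relevance scores.
tokens = list("abcdefghijklmnop")
chunks = chunk_tokens(tokens, chunk_size=4)
scores = [0.1, 0.5, 0.05, 0.35]  # assumed scorer output, one score per chunk
keep = select_chunks(scores, [len(c) for c in chunks], top_p=0.8, token_budget=8)
compressed = [tok for i in keep for tok in chunks[i]]
print(keep, "".join(compressed))
```

In this sketch the budget acts as a hard cap and `top_p` as an early-stopping threshold, so the compressed context never exceeds the budget but may be shorter when a few chunks already carry most of the score mass.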

We also propose a two-stage fine-tuning protocol for the compressor. The first stage establishes a general retrieval foundation using NIAH-style data, while the second stage extends this capability with multi-hop and reasoning datasets to improve coverage across diverse long-context tasks. On InfiniteBench Code-Debug, LongAttnComp matches or exceeds full-context accuracy, significantly surpasses training-free baselines, and transfers successfully across four target models from three distinct families. On LongBench v2, the two-stage approach substantially narrows the performance gap observed after Stage 1 on multi-document reasoning while retaining strong performance on Code-Debug.

Source: arXiv