arXiv

Cartridges at Scale: Training Modular KV Caches over Large Document Collections

Title: Scaling Cartridges: Training Modular KV Caches Across Massive Document Collections

Abstract:

While Large Language Models possess the ability to reason over extensive contexts, the prefilling of millions of tokens is often inefficient because significant portions of the data remain unchanged across different queries. Cartridges offer a solution by distilling document collections into reusable key-value (KV) caches, thereby removing the need for prefilling without sacrificing accuracy. However, this method faces a significant bottleneck: cartridges are currently monolithic and lack composability. Encoding an entire collection into a single KV block fails to scale, and simply combining cartridges trained in isolation results in performance degrading to near-random chance.

To address these challenges, we present Cartridges at Scale (CAS), a training framework designed for scalable multi-cartridge learning. CAS incorporates dynamic distractor mixing and a memory-efficient budget manager capable of rotating hundreds of per-document cartridges between GPU memory and persistent storage. This methodology supports collections surpassing one million tokens, delivering improvements of 10–31 points over monolithic cartridges while maintaining similar token budgets. Even under high compression, the accuracy of the oracle cartridge remains within 2–6 points of full in-context learning. Furthermore, when combined with retrieval mechanisms for cartridge selection, CAS achieves accuracy levels that match or surpass conventional RAG systems, all while reducing prompt token consumption by a factor of 3–4.
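The budget manager described above can be illustrated with a minimal sketch. The abstract does not specify the eviction policy, storage format, or interface, so everything here is an assumption for illustration: the class name `CartridgeBudgetManager`, the `max_resident` and `spill_dir` parameters, the least-recently-used eviction rule, and the use of `pickle` files as a stand-in for persistent storage. Plain Python lists stand in for per-document KV tensors, and an in-memory dictionary stands in for GPU memory.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class CartridgeBudgetManager:
    """Hypothetical sketch: keep at most `max_resident` cartridges in fast
    memory and spill the least recently used ones to disk."""

    def __init__(self, max_resident, spill_dir):
        self.max_resident = max_resident
        self.spill_dir = spill_dir
        # Maps doc_id -> cartridge; least recently used entry comes first.
        self.resident = OrderedDict()

    def _path(self, doc_id):
        return os.path.join(self.spill_dir, f"{doc_id}.pkl")

    def _evict(self):
        # Write the oldest cartridges to disk until we are within budget.
        while len(self.resident) > self.max_resident:
            old_id, old_cartridge = self.resident.popitem(last=False)
            with open(self._path(old_id), "wb") as f:
                pickle.dump(old_cartridge, f)

    def put(self, doc_id, cartridge):
        """Register or update a cartridge and mark it most recently used."""
        self.resident[doc_id] = cartridge
        self.resident.move_to_end(doc_id)
        self._evict()

    def get(self, doc_id):
        """Return a cartridge, reloading it from disk if it was spilled."""
        if doc_id not in self.resident:
            with open(self._path(doc_id), "rb") as f:
                self.resident[doc_id] = pickle.load(f)
        self.resident.move_to_end(doc_id)
        cartridge = self.resident[doc_id]
        self._evict()
        return cartridge


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        manager = CartridgeBudgetManager(max_resident=2, spill_dir=spill_dir)
        for i in range(4):
            # Each toy "cartridge" is a short list of floats.
            manager.put(f"doc{i}", [float(i)] * 3)
        print(sorted(manager.resident))  # only the two newest stay resident
        print(manager.get("doc0"))       # reloaded from disk on demand
```

In a real training loop the resident set would hold device tensors and the spill step would be a device-to-host copy followed by a write, but the budget logic would follow the same pattern: touch a cartridge when a batch needs it, then evict until the resident count is back under the limit.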


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
