arXiv

INTACT: Ego-Guided Typed Sparse Evidence Retrieval for Heterogeneous Collaborative Perception

June 4, 2026 · Chen Li, Shengrong Yuan, Jialong Zuo, Xinzhong Zhu, Nong Sang, Changxin Gao

Abstract:

Collaborative perception enhances the sensory horizon of autonomous vehicles by facilitating information exchange among agents. However, the presence of diverse sensors and varying perception models poses significant challenges to the large-scale deployment of intermediate feature fusion. Current heterogeneous collaboration approaches generally adhere to a translation-first strategy, requiring collaborator features to be aligned, adapted, or projected into a space compatible with the ego vehicle before fusion can occur. While these feature-compatibility contracts enhance performance in fixed systems, they tie deployment to specific collaborator adaptations, rendering the integration of newly joined heterogeneous agents costly and complex.

To bridge this gap, we introduce INTACT, a framework designed for heterogeneous collaborative perception that utilizes ego-guided typed sparse evidence retrieval. Rather than translating an entire collaborator feature map, INTACT empowers the ego vehicle to issue typed evidence queries targeting suspected objects and areas lacking sufficient evidence. Collaborators then provide only local evidence corresponding to the queried locations. The ego vehicle filters these responses via sparse per-query routing and integrates them using gated residual write-back. This approach shifts the compatibility requirement from global feature-map interpretability to local, typed response comparability under ego-issued queries. Consequently, it enables a zero-training heterogeneous insertion protocol, where the ego interface requires training only once, allowing new collaborators to join through simple checkpoint merging.
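The query, response, routing, and write-back loop described above can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Query`, `Response`, `Collaborator`, `route_top_k`, `write_back`, `retrieve`) is hypothetical, features are plain lists of floats rather than learned feature maps, routing is a simple top-k by a self-reported score, and the gate is a fixed sigmoid of that score rather than a trained module.

```python
from dataclasses import dataclass
from math import exp
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # grid location in the ego frame (assumed representation)


@dataclass(frozen=True)
class Query:
    kind: str   # hypothetical query types: "object" or "low_evidence"
    cell: Cell  # location the ego vehicle asks about


@dataclass(frozen=True)
class Response:
    evidence: List[float]  # local evidence vector for the queried cell
    score: float           # collaborator's confidence (assumed to be available)


class Collaborator:
    """Toy collaborator that only answers queries it has typed evidence for."""

    def __init__(self, store: Dict[Tuple[str, Cell], Response]):
        self.store = store

    def respond(self, query: Query) -> Optional[Response]:
        # Only local evidence at the queried location is returned, never a full map.
        return self.store.get((query.kind, query.cell))


def route_top_k(responses: List[Response], k: int) -> List[Response]:
    """Sparse per-query routing, sketched as keeping the k highest-scoring responses."""
    return sorted(responses, key=lambda r: r.score, reverse=True)[:k]


def write_back(ego_feat: List[float], kept: List[Response]) -> List[float]:
    """Gated residual update: ego + gate * mean(evidence). No responses, no change."""
    if not kept:
        return list(ego_feat)
    n = len(kept)
    mean_ev = [sum(r.evidence[i] for r in kept) / n for i in range(len(ego_feat))]
    gate = 1.0 / (1.0 + exp(-max(r.score for r in kept)))  # stand-in for a learned gate
    return [f + gate * e for f, e in zip(ego_feat, mean_ev)]


def retrieve(ego_map: Dict[Cell, List[float]], queries: List[Query],
             collaborators: List[Collaborator], k: int = 1) -> Dict[Cell, List[float]]:
    """Run one round of query, response, routing, and write-back for each query."""
    updated = dict(ego_map)
    for q in queries:
        responses = []
        for c in collaborators:
            r = c.respond(q)
            if r is not None:
                responses.append(r)
        updated[q.cell] = write_back(updated[q.cell], route_top_k(responses, k))
    return updated


# Usage: two heterogeneous collaborators, two typed queries.
ego_map = {(0, 0): [1.0, 1.0], (1, 1): [0.0, 0.0]}
collab_a = Collaborator({("object", (0, 0)): Response([2.0, 0.0], 0.0)})
collab_b = Collaborator({("object", (0, 0)): Response([4.0, 4.0], -1.0)})
queries = [Query("object", (0, 0)), Query("low_evidence", (1, 1))]
updated = retrieve(ego_map, queries, [collab_a, collab_b], k=1)
print(updated)
```

In this sketch the ego side never needs to interpret a collaborator's full feature map; it only compares the local responses returned for its own queries, which is the shift in compatibility requirement the abstract describes. A cell whose query receives no response is left unchanged.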

Extensive experiments on both simulated and real-world heterogeneous collaborative perception benchmarks confirm the effectiveness and deployability of INTACT. On the OPV2V-H benchmark, INTACT achieves an AP70 of 80.1 with just 0.52 million additional parameters and a communication volume of 18.0 on a $\log_2$ scale, approximately 16$\times$ less than dense feature transmission. Furthermore, on the DAIR-V2X dataset, INTACT attains an AP50 of 43.8 under challenging real-world conditions.
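As a rough check on the compression figure, assume the reported 18.0 is the base-2 logarithm of the transmitted volume $V$ and that the ratio to dense transmission is exactly 16. Both are assumptions; the abstract states neither the unit nor the dense baseline. Under them, the implied dense volume on the same scale would be:

$$
\log_2 V_{\text{dense}} \approx \log_2 V_{\text{INTACT}} + \log_2 16 = 18.0 + 4 = 22.0
$$

That is, a 16$\times$ reduction corresponds to a difference of 4 on the $\log_2$ scale.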

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
