INTACT: Ego-Guided Typed Sparse Evidence Retrieval for Heterogeneous Collaborative Perception
Abstract:
Collaborative perception enhances the sensory horizon of autonomous vehicles by facilitating information exchange among agents. However, the presence of diverse sensors and varying perception models poses significant challenges to the large-scale deployment of intermediate feature fusion. Current heterogeneous collaboration approaches generally adhere to a translation-first strategy, requiring collaborator features to be aligned, adapted, or projected into a space compatible with the ego vehicle before fusion can occur. While these feature-compatibility contracts enhance performance in fixed systems, they tie deployment to specific collaborator adaptations, rendering the integration of newly joined heterogeneous agents costly and complex.
To bridge this gap, we introduce INTACT, a framework designed for heterogeneous collaborative perception that utilizes ego-guided typed sparse evidence retrieval. Rather than translating an entire collaborator feature map, INTACT empowers the ego vehicle to issue typed evidence queries targeting suspected objects and areas lacking sufficient evidence. Collaborators then provide only local evidence corresponding to the queried locations. The ego vehicle filters these responses via sparse per-query routing and integrates them using gated residual write-back. This approach shifts the compatibility requirement from global feature-map interpretability to local, typed response comparability under ego-issued queries. Consequently, it enables a zero-training heterogeneous insertion protocol, where the ego interface requires training only once, allowing new collaborators to join through simple checkpoint merging.
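The query, route, and write-back flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the INTACT implementation: the names `Query`, `Response`, `answer`, `route`, and `write_back`, the type labels `"object"` and `"gap"`, the top-k routing rule, the sigmoid gate with its `scale` and `bias` constants, and the dictionary-based feature map. In the actual framework these components would presumably be learned modules operating on feature tensors.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    kind: str    # hypothetical type labels: "object" or "gap"
    cell: tuple  # (row, col) index into the ego's bird's-eye-view grid


@dataclass(frozen=True)
class Response:
    agent: str       # which collaborator answered
    query: Query     # the query being answered
    evidence: tuple  # short local evidence vector, same length as an ego cell feature
    score: float     # collaborator's confidence in this evidence, assumed in [0, 1]


def answer(agent, local_evidence, queries):
    """Collaborator side: return evidence only for queried cells it has observed."""
    return [
        Response(agent, q, *local_evidence[q.cell])
        for q in queries
        if q.cell in local_evidence
    ]


def route(responses, top_k=1):
    """Sparse per-query routing: keep the top_k highest-scoring responses per query."""
    by_query = {}
    for r in responses:
        by_query.setdefault(r.query, []).append(r)
    return {
        q: sorted(rs, key=lambda r: r.score, reverse=True)[:top_k]
        for q, rs in by_query.items()
    }


def write_back(ego_map, routed, scale=4.0, bias=-1.0):
    """Gated residual write-back: ego[cell] += gate * evidence, with gate in (0, 1)."""
    out = {cell: list(feat) for cell, feat in ego_map.items()}  # leave the input untouched
    for q, rs in routed.items():
        for r in rs:
            gate = 1.0 / (1.0 + math.exp(-(scale * r.score + bias)))
            out[q.cell] = [f + gate * e for f, e in zip(out[q.cell], r.evidence)]
    return out


# Toy usage: the ego queries two cells; two heterogeneous collaborators respond.
ego_map = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 0.0]}
queries = [Query("object", (0, 0)), Query("gap", (0, 1))]
responses = (
    answer("lidar_car", {(0, 1): ((0.5, 0.5), 0.9)}, queries)
    + answer("camera_rsu", {(0, 1): ((0.2, 0.1), 0.3), (5, 5): ((1.0, 1.0), 0.8)}, queries)
)
routed = route(responses)
fused = write_back(ego_map, routed)
```

In this toy run, the unqueried cell `(5, 5)` is never transmitted, only the higher-scoring response for `(0, 1)` survives routing, and the cell `(0, 0)` is left unchanged because no collaborator returned evidence for it. The point of the sketch is that the ego only needs per-query responses in a comparable local format, not an interpretable copy of each collaborator's full feature map.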
Extensive experiments on both simulated and real-world heterogeneous collaborative perception benchmarks confirm the effectiveness and deployability of INTACT. On the OPV2V-H benchmark, INTACT achieves an AP70 of 80.1 with just 0.52 million additional parameters and a communication volume of 18.0 on a $\log_2$ scale, which corresponds to approximately 16$\times$ compression relative to dense feature transmission. On the DAIR-V2X dataset, INTACT attains an AP50 of 43.8 under challenging real-world conditions.
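As a quick check on how the two communication figures relate, assume the reported volume is the base-2 logarithm of the transmitted data size (the abstract does not state the unit). Under that assumption, a 16$\times$ reduction corresponds to a difference of four on the same scale:

$$
\log_2 V_{\text{dense}} - \log_2 V_{\text{INTACT}} = \log_2 16 = 4
\quad\Longrightarrow\quad
\log_2 V_{\text{dense}} \approx 18.0 + 4 = 22.0
$$

Here $V_{\text{dense}}$ and $V_{\text{INTACT}}$ are illustrative symbols rather than notation from the paper, and the dense-transmission value is implied by this arithmetic, not reported in the abstract.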
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC


