arXiv

INTACT: Ego-Guided Typed Sparse Evidence Retrieval for Heterogeneous Collaborative Perception

Abstract:

Collaborative perception enhances the sensory horizon of autonomous vehicles by facilitating information exchange among agents. However, the presence of diverse sensors and varying perception models poses significant challenges to the large-scale deployment of intermediate feature fusion. Current heterogeneous collaboration approaches generally adhere to a translation-first strategy, requiring collaborator features to be aligned, adapted, or projected into a space compatible with the ego vehicle before fusion can occur. While these feature-compatibility contracts enhance performance in fixed systems, they tie deployment to specific collaborator adaptations, rendering the integration of newly joined heterogeneous agents costly and complex.

To bridge this gap, we introduce INTACT, a framework designed for heterogeneous collaborative perception that utilizes ego-guided typed sparse evidence retrieval. Rather than translating an entire collaborator feature map, INTACT empowers the ego vehicle to issue typed evidence queries targeting suspected objects and areas lacking sufficient evidence. Collaborators then provide only local evidence corresponding to the queried locations. The ego vehicle filters these responses via sparse per-query routing and integrates them using gated residual write-back. This approach shifts the compatibility requirement from global feature-map interpretability to local, typed response comparability under ego-issued queries. Consequently, it enables a zero-training heterogeneous insertion protocol, where the ego interface requires training only once, allowing new collaborators to join through simple checkpoint merging.
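The query, response, routing, and write-back loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the scalar "evidence" stands in for a local feature vector, the thresholds `low` and `high`, the `score` field, the top-k routing rule, and the sigmoid gate are all hypothetical choices.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class Query:
    kind: str    # hypothetical type label: "object" or "gap"
    cell: tuple  # (row, col) in the ego's bird's-eye-view grid


@dataclass
class Response:
    agent: str
    query_index: int
    evidence: float  # scalar stand-in for a local evidence vector
    score: float     # hypothetical relevance score used for routing


def issue_queries(ego_conf, low=0.3, high=0.6):
    """Ego side: query low-evidence cells and uncertain (suspected) objects."""
    queries = []
    for r, row in enumerate(ego_conf):
        for c, conf in enumerate(row):
            if conf < low:
                queries.append(Query("gap", (r, c)))
            elif conf < high:
                queries.append(Query("object", (r, c)))
    return queries


def answer(agent, local_map, queries):
    """Collaborator side: return evidence only at the queried cells."""
    out = []
    for i, q in enumerate(queries):
        r, c = q.cell
        value = local_map[r][c]
        out.append(Response(agent, i, value, score=abs(value)))
    return out


def route(responses, num_queries, k=1):
    """Sparse per-query routing: keep the k best-scoring responses per query."""
    groups = [[] for _ in range(num_queries)]
    for resp in responses:
        groups[resp.query_index].append(resp)
    return [sorted(g, key=lambda x: x.score, reverse=True)[:k] for g in groups]


def write_back(ego_conf, queries, routed):
    """Gated residual write-back: ego + gate * (evidence - ego) at queried cells."""
    fused = [row[:] for row in ego_conf]
    for q, group in zip(queries, routed):
        if not group:
            continue
        r, c = q.cell
        evidence = sum(x.evidence for x in group) / len(group)
        delta = evidence - fused[r][c]
        gate = 1.0 / (1.0 + exp(-delta))  # toy gate; a learned gate in practice
        fused[r][c] += gate * delta
    return fused


ego = [[0.9, 0.1], [0.5, 0.8]]
queries = issue_queries(ego)
responses = answer("a", [[0.0, 0.9], [0.2, 0.0]], queries)
responses += answer("b", [[0.0, 0.4], [0.7, 0.0]], queries)
fused = write_back(ego, queries, route(responses, len(queries)))
```

In this toy setup only two of the four cells are queried, so each collaborator sends two values instead of its full map, and cells the ego did not ask about are left unchanged by the write-back.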

Extensive experiments on both simulated and real-world heterogeneous collaborative perception benchmarks confirm the effectiveness and deployability of INTACT. On the OPV2V-H benchmark, INTACT achieves an AP70 of 80.1 with just 0.52 million additional parameters and a communication volume of 18.0 on a $\log_2$ scale, approximately 16$\times$ compression compared to dense feature transmission. On the DAIR-V2X dataset, INTACT attains an AP50 of 43.8 under challenging real-world conditions.
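As a quick arithmetic check on how the two communication figures relate: a 16$\times$ ratio is a difference of 4 on a base-2 logarithmic scale. The short sketch below assumes that the dense baseline is measured on the same $\log_2$ scale as the reported 18.0. The abstract does not state the dense figure, so the result is only what the stated numbers would imply under that assumption.

```python
from math import log2

reported_log2_volume = 18.0  # from the abstract
compression_ratio = 16       # "approximately 16x", from the abstract

# Assumption: the dense baseline uses the same log2 scale as the reported value.
implied_dense_log2_volume = reported_log2_volume + log2(compression_ratio)
print(implied_dense_log2_volume)  # 22.0
```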


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
