arXiv

Tiny Collaborative Inference for Occlusion-Robust Object Detection

June 3, 2026 · Chieh-Tung Cheng, Mustafa Aslanov, Eiman Kanjo

Title: Lightweight Collaborative Inference for Robust Object Detection Amidst Occlusion

Original: arXiv:2606.02894v1 Announce Type: new Abstract: Small edge devices such as IoT surveillance nodes and search-and-rescue (SAR) platforms are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is limited by available memory and compute, by communication costs when several devices cooperate, and by the loss of accuracy caused by occlusion. The work evaluates occlusion-robust object detection on devices with less than 1 MB SRAM by combining an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantisation. We evaluate two collaborative inference strategies: feature-level fusion, which concatenates intermediate feature maps, and decision-level fusion via Weighted Boxes Fusion (WBF). Under the tested occlusion settings, WBF outperforms feature-level fusion and gives gains of up to +0.2736 mAP in asymmetric occlusion scenarios. Extending fusion to three views improves accuracy further (up to +0.3827 mAP) while adding communication overhead (approximately 1.3 KB per exchange). The hardware experiments start with a host-assisted USB-relay baseline and then move to a Wi-Fi peer-to-peer deployment on two Coral Dev Board Micro units, where WBF runs on-device and communication energy remains small relative to inference. In a representative 301.9 s autonomous session comprising 108 frames, fused output is observed on 61 frames compared with 47 for Board 2 alone, a frame-level coverage gain of +29.8%. We also include a small exploratory decentralised federated learning (DFL) feasibility note, but do not treat it as a main result because performance remains limited under non-iid local data. The results support decision-level fusion as a viable option for improving occlusion robustness in small-scale edge object detection, including host-free multi-board operation on ultra-low-end hardware.
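The +29.8% coverage figure can be checked from the frame counts quoted in the abstract. The short sketch below assumes the gain is measured relative to Board 2's frame count (an interpretation consistent with the quoted numbers, not a statement from the authors) and also reports the difference in percentage points for comparison. The function name `coverage_gain` is hypothetical.

```python
def coverage_gain(fused_frames: int, single_frames: int, total_frames: int):
    """Return (relative gain, percentage-point gain) in frame-level coverage.

    Assumption: "coverage" means the share of frames on which any output
    was observed; the abstract does not define it more precisely.
    """
    relative = (fused_frames - single_frames) / single_frames
    point_gain = (fused_frames - single_frames) / total_frames
    return relative, point_gain


# Frame counts quoted in the abstract: 108 frames, 61 fused, 47 for Board 2.
relative, points = coverage_gain(fused_frames=61, single_frames=47, total_frames=108)
print(f"relative gain: {relative:+.1%}")
print(f"absolute gain: {points * 100:+.1f} percentage points")
```

Under this reading, the fused pipeline covers 61/108 of the frames against 47/108 for Board 2 alone, so the relative figure is larger than the gap in percentage points.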

Rewritten: Title: Miniature Collaborative Inference Enhances Occlusion-Resilient Object Detection

Abstract: Compact edge devices, such as Internet of Things (IoT) surveillance nodes and search-and-rescue (SAR) platforms, are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is constrained by limited memory and compute, by the communication cost of multi-device cooperation, and by accuracy loss under occlusion. This study evaluates occlusion-robust object detection on devices with less than 1 MB of SRAM, combining an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantisation. Two collaborative inference strategies are compared: feature-level fusion, which concatenates intermediate feature maps, and decision-level fusion via Weighted Boxes Fusion (WBF). Under the tested occlusion settings, WBF outperforms feature-level fusion, with gains of up to +0.2736 mAP in asymmetric occlusion scenarios. Extending fusion to three views improves accuracy further (up to +0.3827 mAP) at the cost of additional communication overhead (approximately 1.3 KB per exchange). Hardware experiments begin with a host-assisted USB-relay baseline and then move to a Wi-Fi peer-to-peer deployment on two Coral Dev Board Micro units, where WBF runs on-device and communication energy remains small relative to inference. In a representative 301.9 s autonomous session of 108 frames, fused output is observed on 61 frames, compared with 47 for Board 2 alone, a frame-level coverage gain of +29.8%. The paper also includes a small exploratory feasibility note on decentralised federated learning (DFL), which is not treated as a main result because performance remains limited under non-IID local data. Overall, the results support decision-level fusion as a viable option for improving occlusion robustness in small-scale edge object detection, including host-free multi-board operation on ultra-low-end hardware.
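To illustrate decision-level fusion, here is a minimal, simplified sketch of Weighted Boxes Fusion for a single class. It is not the authors' implementation: the `Box` type, the function names, the 0.55 IoU threshold, and the assumption that all views report boxes in one shared, normalised coordinate frame are illustrative choices. Boxes from all views are sorted by confidence and clustered by IoU against the running fused box. Each cluster is merged by confidence-weighted coordinate averaging, and its score is scaled down when fewer views than expected contributed.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(members: list[Box]) -> Box:
    """Confidence-weighted average of coordinates; mean confidence as score."""
    total = sum(m.score for m in members)
    return Box(
        x1=sum(m.x1 * m.score for m in members) / total,
        y1=sum(m.y1 * m.score for m in members) / total,
        x2=sum(m.x2 * m.score for m in members) / total,
        y2=sum(m.y2 * m.score for m in members) / total,
        score=total / len(members),
    )


def weighted_boxes_fusion(views: list[list[Box]], iou_thr: float = 0.55) -> list[Box]:
    """Fuse single-class detections from several views (simplified WBF)."""
    n_views = len(views)
    # Pool all positive-score boxes and visit them from most to least confident.
    candidates = sorted(
        (b for view in views for b in view if b.score > 0),
        key=lambda b: b.score,
        reverse=True,
    )
    clusters: list[list[Box]] = []
    fused: list[Box] = []
    for box in candidates:
        # Find the fused box with the highest IoU above the threshold.
        best, best_iou = None, iou_thr
        for i, f in enumerate(fused):
            overlap = iou(box, f)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:
            clusters.append([box])
            fused.append(fuse_cluster([box]))
        else:
            clusters[best].append(box)
            fused[best] = fuse_cluster(clusters[best])
    # Down-weight clusters that were seen by fewer views than are available.
    result = [
        Box(f.x1, f.y1, f.x2, f.y2, f.score * min(len(m), n_views) / n_views)
        for m, f in zip(clusters, fused)
    ]
    return sorted(result, key=lambda b: b.score, reverse=True)


# Two views: both see the first object; only view B sees the second one,
# as might happen when it is occluded in view A.
view_a = [Box(0.10, 0.10, 0.50, 0.50, 0.9)]
view_b = [Box(0.12, 0.12, 0.52, 0.52, 0.6), Box(0.70, 0.70, 0.90, 0.90, 0.8)]
for b in weighted_boxes_fusion([view_a, view_b]):
    print(b)
```

In this toy case the object seen by only one view is kept with a reduced score rather than discarded. This shows one way decision-level fusion can recover detections that are occluded in another view, while exchanging only box coordinates and scores between devices.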

Source: arXiv Generated at: 2026-06-03 00:00:00 UTC