Towards Compact Autonomous Driving Perception with Balanced Learning and Multi-sensor Fusion
This study introduces a compact deep multi-task learning architecture that performs a wide range of autonomous driving perception tasks in a single forward pass. Rather than relying on a suite of separate models, the system generates multiple outputs simultaneously, including semantic segmentation from several viewpoints, depth estimation, LiDAR segmentation, and bird's-eye-view projections. To address the imbalanced learning that often arises when many tasks are trained together, the authors propose an adaptive loss weighting algorithm.
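To make the idea of balancing task losses concrete, the sketch below uses Dynamic Weight Averaging, a well-known weighting scheme, as a stand-in. It is not the authors' adaptive loss weighting algorithm, whose details are not given here. The task names, loss values, and the `temperature` parameter are assumptions chosen purely for illustration.

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Averaging-style task weights (illustrative stand-in,
    not the paper's algorithm)."""
    # Ratio of the two most recent losses per task: a value near or above 1
    # means the task is improving slowly, so it should receive more weight.
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    # Softmax over the ratios, rescaled so the weights sum to the task count.
    return [k * e / total for e in exps]

def total_loss(task_losses, weights):
    """Weighted sum of per-task losses used as the single training objective."""
    return sum(w * l for w, l in zip(weights, task_losses))

# Hypothetical task names and loss histories (assumed values).
task_names = ["semantic_seg", "depth", "lidar_seg", "bev"]
losses_t2 = [1.00, 1.00, 1.00, 1.00]   # losses two epochs ago
losses_t1 = [0.50, 0.90, 0.70, 0.95]   # losses one epoch ago

weights = dwa_weights(losses_t1, losses_t2)
objective = total_loss(losses_t1, weights)

for name, w in zip(task_names, weights):
    print(f"{name}: weight {w:.3f}")
```

In this toy run the slowest-improving task (`bev`) receives the largest weight and the fastest-improving task (`semantic_seg`) the smallest, which is the balancing behaviour such schemes aim for.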
The model leverages data pre-processing and intermediate sensor fusion techniques to integrate inputs from multiple modalities. These inputs are gathered from RGB cameras, dynamic vision sensors (DVS), and LiDAR units positioned at various locations on the ego vehicle, enabling a more comprehensive understanding of dynamic environments.
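The following toy sketch illustrates what intermediate (feature-level) fusion means in general: each modality is encoded separately, and the resulting features are combined before any task-specific head sees them. The average-pooling encoder, the 1-D input signals, and all names here are hypothetical simplifications and do not reflect the actual network described in the paper.

```python
MODALITIES = ("rgb", "dvs", "lidar")  # fixed order for concatenation

def encode(signal, out_dim):
    """Toy per-modality encoder: average-pool a 1-D signal into out_dim bins."""
    n = len(signal)
    if n < out_dim:
        raise ValueError("signal must have at least out_dim samples")
    step = n / out_dim
    feats = []
    for i in range(out_dim):
        chunk = signal[int(i * step):int((i + 1) * step)]
        feats.append(sum(chunk) / len(chunk))
    return feats

def fuse_intermediate(inputs, feat_dim=4):
    """Encode each modality separately, then concatenate the feature vectors.

    The fused vector is what a shared decoder or task head would consume.
    """
    fused = []
    for name in MODALITIES:
        fused.extend(encode(inputs[name], feat_dim))
    return fused

# Made-up sensor readings with different raw lengths per modality.
inputs = {
    "rgb":   [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "dvs":   [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    "lidar": [10.0, 12.0, 11.0, 9.0],
}

fused = fuse_intermediate(inputs)
print(len(fused), fused)
```

Fusing at the feature level, rather than at the raw input or at the final predictions, lets each sensor keep its own pre-processing and encoder while the downstream layers still see a single joint representation.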
Ablation studies show that the model variant trained with the proposed method outperforms the other variants. A comparative study further shows that it performs well against combinations of recent state-of-the-art models. Notably, the proposed architecture maintains this performance with significantly fewer parameters, which yields faster inference and lower GPU memory usage. The results are consistent across three CARLA simulation datasets and one real-world dataset, nuScenes-lidarseg. To facilitate further research, the code and associated resources are publicly available at https://github.com/oskarnatan/compact-perception.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



