One Channel to Rule Them All: Rethinking Input Representation for Visual Place Recognition
Abstract:
Visual Place Recognition (VPR) serves as a cornerstone for long-term robot localization and SLAM. However, contemporary systems predominantly depend on RGB imagery, operating under the implicit belief that color is essential for identifying locations globally. This study disputes that premise by examining the utility of chromatic data across various training methods, architectural designs, and standard benchmarks, particularly in scenarios involving significant real-world visual changes. Our results indicate that grayscale images generally match RGB performance and actually surpass it during drastic appearance shifts where models fail to learn sufficient color invariance. Color yields tangible benefits only when stable and distinctive chromatic features are available. On chosen benchmarks, a MixVPR model trained exclusively on grayscale data achieved an average Recall@1 of 82.4%, edging out its RGB equivalent, which scored 81.2%. Furthermore, certain lightweight grayscale configurations, boasting 60% fewer parameters, were able to outperform more complex RGB models. Beyond accuracy, grayscale presents practical benefits regarding storage efficiency, bandwidth usage, and compatibility with resource-limited hardware. We conclude that in global VPR contexts characterized by variations in lighting, weather, seasons, and environments, color plays a negligible role, and grayscale is entirely adequate for dependable place recognition.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
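The abstract compares grayscale and RGB inputs by Recall@1. As a minimal sketch of those two ingredients, the snippet below converts RGB pixels to a single luminance channel and scores top-1 retrieval over global descriptors. Several details are assumptions, not taken from the paper: the ITU-R BT.601 luma weights, cosine similarity as the matching score, and the helper names `rgb_to_gray`, `cosine`, and `recall_at_1`, which are hypothetical. The abstract does not state which grayscale conversion or matching rule the authors used.

```python
import math

# Assumption: ITU-R BT.601 luma weights; the paper's conversion is not specified.
BT601_WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(pixels):
    """Map a list of (R, G, B) tuples to one luminance value per pixel."""
    wr, wg, wb = BT601_WEIGHTS
    return [wr * r + wg * g + wb * b for r, g, b in pixels]


def cosine(a, b):
    """Cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_1(queries, database, positives):
    """Fraction of queries whose most similar database entry is a true match.

    positives[i] is the set of database indices accepted as the same place
    as query i (for example, references within a distance threshold).
    """
    hits = 0
    for query, accepted in zip(queries, positives):
        best = max(range(len(database)), key=lambda j: cosine(query, database[j]))
        hits += best in accepted
    return hits / len(queries)


# Toy descriptors standing in for the output of a VPR model.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]]
positives = [{0}, {1}]  # the second query retrieves index 2, so it is a miss
score = recall_at_1(queries, database, positives)
```

A single channel stores one value per pixel where RGB stores three, which is the source of the storage and bandwidth savings the abstract mentions for uncompressed images.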





