Deep Psychovisual Image Representations
Title: Deep Psychovisual Image Representations
Abstract:
Human visual processing is theorized to separate low-level feature extraction from higher-order cognition by generating intermediate abstractions, according to psychovisual models. This stands in stark contrast to conventional deep learning architectures, which typically rely on uniform stacks of spatial layers to extract and aggregate features, resulting in black-box decision-making mechanisms. To address this, we introduce Deep Visual Coding, a learned frequency-domain representation rooted in 1990s image coding techniques that quantized perceptually significant frequencies. When combined with complex-valued image representations, this method generates abstractions akin to those found in psychovisual theory.
Our framework represents the first deep learning system based on psychovisual principles, employing data-driven spectral filters that identify and encode task-specific semantic structures within separate frequency sub-bands. Our analysis shows that these psychovisual models yield highly interpretable object parts, whereas standard Convolutional Neural Networks (CNNs) tend to produce indistinct, amorphous regions. Additionally, we observe that our models exhibit lower sensitivity to depth compared to CNNs when scaling up; this is because our complex-valued representations and learned abstractions effectively replace the function of deep spatial layers. Collectively, these results indicate that psychovisual coding offers a viable route toward developing vision models that are both more transparent and efficient.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




