Title: FOVI: A Biologically Inspired Foveated Interface for Deep Vision Models
Abstract:
Human vision is foveated: resolution peaks at the center of a wide field of view. This is an efficient tradeoff for active sensing, since eye movements can bring selected parts of the scene into focus while the rest of the visual field preserves context. In contrast, conventional computer vision systems encode the scene at uniform resolution across the entire image, which makes it difficult to process high-resolution, full-field imagery efficiently.
To address this, we introduce the Foveated Vision Interface (FOVI), a system modeled after the human retina and primary visual cortex (V1). FOVI maps a variable-resolution, retina-like sensor array onto a uniformly dense, V1-like sensor manifold. On this manifold, receptive fields are defined as k-nearest-neighborhoods (kNNs), which enables kNN-convolution through a novel kernel mapping technique.
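To make the idea of kNN receptive fields concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the FOVI implementation: the ring-based sampling, the nearest-cell kernel lookup, and every function name (`foveated_points`, `knn_indices`, `kernel_cell_map`, `knn_conv`) are hypothetical choices made for this example. It samples points whose spacing grows with eccentricity, builds a k-nearest-neighborhood around each point, and applies a shared kernel by mapping each neighbor's relative offset onto a small reference grid.

```python
import numpy as np

def foveated_points(n_rings=12, n_angles=24, max_radius=1.0, growth=1.3):
    """Hypothetical sampler: rings whose radii grow geometrically,
    so points are dense near the center and sparse in the periphery."""
    radii = max_radius * growth ** np.arange(-n_rings + 1, 1)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    r, a = np.meshgrid(radii, angles, indexing="ij")
    return np.stack([(r * np.cos(a)).ravel(), (r * np.sin(a)).ravel()], axis=1)

def knn_indices(points, k):
    """Brute-force kNN: for each point, indices of its k closest points.
    Column 0 is the point itself (distance zero)."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

def kernel_cell_map(points, nbrs, kernel_size=3):
    """Assumed kernel mapping: normalize each neighbor offset by the
    neighborhood radius and snap it to the nearest cell of a
    kernel_size x kernel_size reference grid. Returns flat cell indices."""
    offsets = points[nbrs] - points[:, None, :]            # (N, k, 2)
    scale = np.linalg.norm(offsets, axis=-1).max(axis=1)   # (N,), needs k >= 2
    unit = offsets / scale[:, None, None]                  # coordinates in [-1, 1]
    cells = np.rint((unit + 1.0) / 2.0 * (kernel_size - 1))
    cells = np.clip(cells, 0, kernel_size - 1).astype(int)
    return cells[..., 1] * kernel_size + cells[..., 0]     # (N, k)

def knn_conv(x, nbrs, cell_map, weights):
    """x: (N, C_in); weights: (kernel_size**2, C_in, C_out).
    Each neighbor contributes through the weight of its mapped cell."""
    gathered = x[nbrs]                                     # (N, k, C_in)
    w = weights[cell_map]                                  # (N, k, C_in, C_out)
    return np.einsum("nkc,nkco->no", gathered, w)

# Usage: random features on the sensor points, one kNN-convolution layer.
rng = np.random.default_rng(0)
pts = foveated_points()
nbrs = knn_indices(pts, k=9)
cmap = kernel_cell_map(pts, nbrs, kernel_size=3)
features = rng.normal(size=(pts.shape[0], 3))
weights = rng.normal(size=(9, 3, 4))
out = knn_conv(features, nbrs, cmap, weights)
print(out.shape)
```

Because the kernel is indexed by relative position inside each neighborhood rather than by pixel coordinates, the same weights can be shared across neighborhoods of very different physical size. The brute-force distance matrix keeps the sketch short; a spatial index would be the usual choice for larger point sets.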
We validate the approach in two settings: an end-to-end kNN-convolutional architecture, and a foveated adaptation of the DINOv3 Vision Transformer (ViT) foundation model using low-rank adaptation (LoRA). These models achieve performance comparable to standard baselines while using significantly fewer pixels and less computation than full-resolution, non-foveated systems. This work paves the way for scalable and efficient active sensing in high-resolution egocentric vision tasks.
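For readers unfamiliar with LoRA, the following NumPy sketch shows the standard low-rank update applied to a single frozen linear layer. It illustrates the general technique only: the class name `LoRALinear`, the rank, the scaling factor `alpha`, and the layer sizes are assumptions for this example and do not describe how FOVI adapts DINOv3.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size

# Usage: illustrative layer sizes, not taken from any particular model.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(5, 32))
y = layer(x)
print(y.shape, layer.num_trainable(), W.size)
```

Since `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the two small matrices `A` and `B` need to be trained.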
The source code is publicly available at https://github.com/nblauch/fovi, and pre-trained models can be accessed via https://huggingface.co/fovi-pytorch.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





