Versatile Framework with Semantic and Structural Guidance for Image Reconstruction from Brain Activity
Title: A Dual-Guidance Framework for Image Reconstruction from Neural Activity Using Semantic and Structural Cues
Abstract:
Reconstructing visual stimuli from neural recordings remains a pivotal yet formidable challenge within the field of brain decoding. The ability to achieve precise and controllable image reconstruction is crucial for advancing the development and practical application of brain-computer interfaces. While recent approaches have capitalized on the capabilities of text-to-image generation models to recreate complex natural stimuli with high semantic fidelity (accurately capturing concepts and objects), they often fail to preserve fine-grained structural details such as position, orientation, and size. This deficiency compromises both the controllability and interpretability of these models.
To overcome these limitations, we introduce MindDiffuser, a novel two-stage image reconstruction framework. In the first stage, text embeddings derived from brain responses via Contrastive Language-Image Pretraining (CLIP) are fed into Stable Diffusion to produce a preliminary image that captures essential semantic content. The second stage focuses on structural alignment: decoded shallow CLIP visual features serve as supervisory signals, allowing for the iterative refinement of feature vectors through backpropagation to better match the original structural information.
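The second-stage idea, iteratively adjusting a feature vector so that its features match decoded structural targets, can be illustrated with a deliberately small sketch. Everything here is an assumption for illustration: a random linear map `feature_map` stands in for the shallow CLIP visual encoder, `target_features` stands in for the features decoded from brain responses, and a hand-derived gradient replaces backpropagation through a real network. It is not the authors' implementation and involves neither CLIP nor Stable Diffusion.

```python
import numpy as np

def refine_feature_vector(z_init, feature_map, target_features, steps=300):
    """Toy structural alignment: gradient descent on z to reduce
    ||feature_map @ z - target_features||^2.

    feature_map is a hypothetical linear stand-in for a shallow visual
    encoder; a real pipeline would backpropagate through a network.
    """
    z = z_init.astype(float).copy()
    # Step size chosen from the largest singular value so that every
    # update is guaranteed not to increase the quadratic loss.
    sigma_max = np.linalg.norm(feature_map, 2)
    lr = 0.5 / (sigma_max ** 2)
    losses = []
    for _ in range(steps):
        residual = feature_map @ z - target_features
        losses.append(float(residual @ residual))
        grad = 2.0 * feature_map.T @ residual  # analytic gradient of the loss
        z -= lr * grad
    return z, losses

# Synthetic example: the "decoded" target comes from a known vector.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 4))
z_true = rng.normal(size=4)
target_features = feature_map @ z_true

z_refined, losses = refine_feature_vector(np.zeros(4), feature_map, target_features)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this toy setting the loss is quadratic, so plain gradient descent suffices; with a nonlinear encoder the same loop would rely on automatic differentiation and a tuned learning rate.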
We evaluated our framework using brain response datasets spanning three distinct modalities (fMRI, EEG, and MEG) elicited by visual stimuli. Our results demonstrate that MindDiffuser significantly outperforms existing state-of-the-art models, underscoring the robustness and versatility of our method. Furthermore, spatial and temporal visualizations confirm the neurobiological plausibility of the framework, offering valuable insights for future neural decoding research across various brain signal types.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




