NeuroImage (Jul 2023)
Multimodal deep neural decoding reveals highly resolved spatiotemporal profile of visual object representation in humans
Abstract
Perception and categorization of objects in a visual scene are essential to grasp the surrounding situation. Recently, neural decoding schemes, such as machine learning in functional magnetic resonance imaging (fMRI), has been employed to elucidate the underlying neural mechanisms. However, it remains unclear as to how spatially distributed brain regions temporally represent visual object categories and sub-categories. One promising strategy to address this issue is neural decoding with concurrently obtained neural response data of high spatial and temporal resolution. In this study, we explored the spatial and temporal organization of visual object representations using concurrent fMRI and electroencephalography (EEG), combined with neural decoding using deep neural networks (DNNs). We hypothesized that neural decoding by multimodal neural data with DNN would show high classification performance in visual object categorization (faces or non-face objects) and sub-categorization within faces and objects. Visualization of the fMRI DNN was more sensitive than that in the univariate approach and revealed that visual categorization occurred in brain-wide regions. Interestingly, the EEG DNN valued the earlier phase of neural responses for categorization and the later phase of neural responses for sub-categorization. Combination of the two DNNs improved the classification performance for both categorization and sub-categorization compared with fMRI DNN or EEG DNN alone. These deep learning-based results demonstrate a categorization principle in which visual objects are represented in a spatially organized and coarse-to-fine manner, and provide strong evidence of the ability of multimodal deep learning to uncover spatiotemporal neural machinery in sensory processing.