Abstract

Schizophrenia (SCHZ) notably affects several human perceptual modalities, including vision. Prior research has identified marked abnormalities in perceptual organization in SCHZ, predominantly attributed to deficits in bottom-up processing. Our study introduces a novel paradigm to differentiate the roles of top-down and bottom-up processes in visual perception in SCHZ. We analysed ground-truth fixation maps obtained by eye tracking from 28 SCHZ patients and 25 healthy controls (HC) and compared them with two mathematical models of visual saliency: one bottom-up, based on the physical attributes of images, and the other top-down, incorporating machine learning. The bottom-up (GBVS) model revealed no significant overall difference between groups (beta = 0.01, p = 0.281; marginally higher in SCHZ patients), but it did show enhanced performance in SCHZ patients for highly salient images. The top-down (EML-Net) model likewise indicated no general group difference (beta = −0.03, p = 0.206; lower in SCHZ patients) but showed significantly reduced performance in SCHZ patients for images depicting social interactions (beta = −0.06, p < 0.001). Over time, the disparity between the groups diminished for both models. The previously reported bottom-up bias in SCHZ patients was apparent only during the initial stages of visual exploration and corresponded with progressively shorter fixation durations in this group. Our research proposes an innovative approach to understanding early visual information processing in SCHZ patients, shedding light on the interplay between bottom-up perception and top-down cognition.
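The abstract does not say which metric was used to compare the fixation maps with the model saliency maps. As a minimal sketch only, the Python below assumes two scores that are common in the saliency literature: Normalized Scanpath Saliency (NSS) and the Pearson correlation coefficient (CC). The function names `nss` and `pearson_cc` and the toy arrays are hypothetical illustrations and are not taken from the study.

```python
import numpy as np


def nss(saliency: np.ndarray, fixation_mask: np.ndarray) -> float:
    """Normalized Scanpath Saliency (assumed metric, not confirmed by the source).

    Z-scores the model saliency map and averages it over fixated pixels.
    """
    mask = fixation_mask.astype(bool)
    std = saliency.std()
    if std == 0 or not mask.any():
        return float("nan")  # undefined for a flat map or no fixations
    z = (saliency - saliency.mean()) / std
    return float(z[mask].mean())


def pearson_cc(saliency: np.ndarray, fixation_map: np.ndarray) -> float:
    """Pearson correlation between a saliency map and a fixation density map."""
    s = saliency.astype(float).ravel()
    f = fixation_map.astype(float).ravel()
    if s.std() == 0 or f.std() == 0:
        return float("nan")  # correlation is undefined for constant maps
    return float(np.corrcoef(s, f)[0, 1])


# Toy 2x2 example: the model puts all saliency on the one fixated pixel.
toy_saliency = np.array([[0.0, 0.0], [0.0, 1.0]])
toy_fixations = np.array([[0, 0], [0, 1]])
toy_nss = nss(toy_saliency, toy_fixations)
toy_cc = pearson_cc(toy_saliency, toy_fixations)
```

Under this assumption, per-image scores like these could be computed separately for each group and each model and then entered into a regression, which would yield group coefficients of the kind reported as beta values above.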