SATSal: A Multi-Level Self-Attention Based Architecture for Visual Saliency Prediction

Marouane Tliba; Mohamed A. Kerkouri; Bashir Ghariba; Aladine Chetouani; Arzu Coltekin; Mohamed Sami Shehata; Alessandro Bruno

doi:10.1109/ACCESS.2022.3152189

IEEE Access (Jan 2022)

Affiliations

Marouane Tliba: Laboratoire PRISME, Université d’Orléans, Orléans, France
Mohamed A. Kerkouri: Laboratoire PRISME, Université d’Orléans, Orléans, France
Bashir Ghariba: Department of Electrical and Computer Engineering, Faculty of Engineering, Elmergib University, Khoms, Libya
Aladine Chetouani: Laboratoire PRISME, Université d’Orléans, Orléans, France
Arzu Coltekin: Institute of Interactive Technologies, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland
Mohamed Sami Shehata: Department of Computer Science, The University of British Columbia, Kelowna, BC, Canada
Alessandro Bruno: Department of Computing and Informatics, Faculty of Science and Technology, Bournemouth University, Poole, U.K.

Journal volume & issue: Vol. 10
pp. 20701 – 20713

Abstract

Human visual attention modelling is a persistent interdisciplinary research challenge that has gained new interest in recent years, mainly due to the latest developments in deep learning. This is particularly evident in saliency benchmarks. Novel deep learning-based visual saliency models show promising results in capturing high-level (top-down) human visual attention processes. They therefore differ strongly from earlier approaches, which were mainly characterised by low-level (bottom-up) visual features. These developments account for innate human selectivity mechanisms that rely on both high- and low-level factors, and the two factors interact with each other. Motivated by the importance of these interactions, in this project we tackle visual saliency modelling holistically, examining whether both the high- and low-level features that govern human attention can be taken into account. Specifically, we propose a novel method, SATSal (Self-Attention Saliency). SATSal leverages both high- and low-level features through a multilevel merging of skip connections during the decoding stage. To this end, we incorporate convolutional self-attention modules on the skip connections from the encoder to the decoder network to properly integrate the valuable signals from multilevel spatial features. The self-attention modules thus learn to separate the latent representation of the salient regions from other irrelevant information, and they are embedded in and trained jointly with the main encoder-decoder backbone. Finally, to validate our approach, we evaluate SATSal against various existing solutions on the well-known MIT300 saliency benchmark. To further examine SATSal’s robustness on other image types, we also evaluate it on the Le-Meur saliency painting benchmark.
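The abstract does not specify the exact form of the convolutional self-attention modules or of the multilevel merge. As a rough illustration only, the pure-Python sketch below assumes a SAGAN-style self-attention block (1x1 query/key/value projections, a softmax over all spatial positions, and a residual scaled by a learnable scalar `gamma`) applied to one encoder skip feature. Its output is then concatenated channel-wise with the decoder feature at the same resolution. The function names (`conv1x1`, `self_attention`, `merge_skip`, `rand_matrix`), the concatenation-based merge, and all shapes are hypothetical and are not taken from the SATSal implementation.

```python
import math
import random

def conv1x1(feat, weight):
    """1x1 convolution: feat is a list of N pixel vectors (length C_in),
    weight is a C_out x C_in matrix; returns N vectors of length C_out."""
    return [[sum(w * x for w, x in zip(row, pix)) for row in weight] for pix in feat]

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(feat, wq, wk, wv, gamma):
    """Assumed SAGAN-style self-attention over all N spatial positions.
    wv must map C -> C so the residual connection is well defined."""
    q, k, v = conv1x1(feat, wq), conv1x1(feat, wk), conv1x1(feat, wv)
    n, c_out = len(feat), len(v[0])
    out = []
    for i in range(n):
        # Attention weights of position i over every position j.
        attn = softmax([sum(a * b for a, b in zip(q[i], k[j])) for j in range(n)])
        # Attention-weighted sum of the value vectors.
        ctx = [sum(attn[j] * v[j][c] for j in range(n)) for c in range(c_out)]
        # Residual connection scaled by the learnable scalar gamma.
        out.append([gamma * a + x for a, x in zip(ctx, feat[i])])
    return out

def merge_skip(decoder_feat, skip_feat, params):
    """Hypothetical merge: attend over the skip feature, then concatenate
    its channels with the decoder feature at each spatial position."""
    attended = self_attention(skip_feat, *params)
    return [d + s for d, s in zip(decoder_feat, attended)]  # list concat = channel concat

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Usage: a 4x4 skip feature with 8 channels and a 16-channel decoder feature.
rng = random.Random(0)
H, W, C, D = 4, 4, 8, 2  # spatial size, skip channels, query/key dimension
skip = [[rng.random() for _ in range(C)] for _ in range(H * W)]
decoder = [[rng.random() for _ in range(16)] for _ in range(H * W)]
params = (rand_matrix(D, C, rng), rand_matrix(D, C, rng), rand_matrix(C, C, rng), 0.0)
fused = merge_skip(decoder, skip, params)
print(len(fused), len(fused[0]))  # prints: 16 24
```

Initialising `gamma` at zero, a common choice in SAGAN-style blocks, makes each module start as an identity on the skip path, so training can gradually learn how much attention-filtered context to mix in. Whether SATSal uses this initialisation is not stated in the abstract.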

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639