IEEE Access (Jan 2025)

VisualSAF – A Novel Framework for Visual Semantic Analysis Tasks

  • Antonio V. A. Lundgren,
  • Byron L. D. Bezerra,
  • Carmelo J. A. Bastos-Filho

DOI: https://doi.org/10.1109/ACCESS.2025.3535314
Journal volume & issue: Vol. 13, pp. 21052–21063

Abstract

We introduce VisualSAF, a novel Visual Semantic Analysis Framework designed to enhance the understanding of contextual characteristics in Visual Scene Analysis (VSA) tasks. The framework leverages semantic variables extracted by machine learning algorithms to provide additional high-level information, augmenting the capabilities of the primary task model. Comprising three main components – the General DL Model, Semantic Variables, and Output Branches – VisualSAF offers a modular and adaptable approach to diverse VSA tasks. The General DL Model processes input images, extracting high-level features through a backbone network and detecting regions of interest. Semantic Variables are then extracted from these regions, incorporating a wide range of contextual information tailored to specific scenarios. Finally, the Output Branch integrates semantic variables and detections, generating high-level task information while allowing the inputs to be flexibly weighted to optimize task performance. The framework is demonstrated through experiments on the HOD Dataset, yielding improvements of 0.05 in mean average precision (mAP) and 0.01 in mean average recall (mAR) over the baseline models. Future research directions include exploring multiple semantic variables, developing more complex output heads, and investigating the framework's performance across context-shifting datasets.
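The three-component pipeline described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration: the class names follow the abstract's terminology (General DL Model, Semantic Variables, Output Branch), but all internals – the placeholder detector, the `region_area_norm` variable, and the weighted-sum fusion – are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of VisualSAF's three-component pipeline.
# Component names follow the paper's abstract; the internals are placeholders.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    box: tuple    # (x1, y1, x2, y2) region of interest
    score: float  # detection confidence from the general model


class GeneralDLModel:
    """Backbone + detector: maps an image to high-level regions of interest."""

    def detect(self, image) -> List[Region]:
        # Placeholder: a real model would run a CNN backbone and a detection head.
        h, w = len(image), len(image[0])
        return [Region(box=(0, 0, w, h), score=0.9)]


class SemanticVariable:
    """Wraps any ML extractor that adds contextual information to a region."""

    def __init__(self, name: str, extractor: Callable[[Region], float]):
        self.name = name
        self.extractor = extractor

    def __call__(self, region: Region) -> float:
        return self.extractor(region)


class OutputBranch:
    """Fuses detections with semantic variables via configurable weights."""

    def __init__(self, det_weight: float = 1.0, sem_weight: float = 0.5):
        self.det_weight = det_weight
        self.sem_weight = sem_weight

    def fuse(self, region: Region, sem_values: List[float]) -> float:
        sem_mean = sum(sem_values) / len(sem_values) if sem_values else 0.0
        return self.det_weight * region.score + self.sem_weight * sem_mean


def run_visualsaf(image, sem_vars: List[SemanticVariable]) -> List[float]:
    """Run the full pipeline: detect regions, extract variables, fuse outputs."""
    model, branch = GeneralDLModel(), OutputBranch()
    scores = []
    for region in model.detect(image):
        values = [v(region) for v in sem_vars]
        scores.append(branch.fuse(region, values))
    return scores


# Example: a dummy 2x2 "image" and one hypothetical semantic variable.
image = [[0, 1], [1, 0]]
sem = SemanticVariable("region_area_norm", lambda r: 1.0)
print(run_visualsaf(image, [sem]))
```

The modularity claimed in the abstract shows up here as the `SemanticVariable` wrapper: any extractor with the right call signature can be plugged in per scenario, and the `OutputBranch` weights let the task re-balance detections against contextual cues.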
