IEEE Access (Jan 2024)
Transfer Learning Models for CNN Fusion With Fisher Vector for Codebook Optimization of Foreground Features
Abstract
Human action recognition has become one of the main topics in computer vision due to its high demand and competitiveness in real-world applications. The main goals of human action recognition are to improve classification accuracy and to reduce computational complexity. Previous studies have mainly followed two approaches: hand-crafted feature extraction and deep learning. The hand-crafted approach is simple, which gives it an advantage in computational complexity, but its accuracy is low. Conversely, the deep learning approach achieves high accuracy even on complex datasets, but it suffers from high computational complexity and long training times because it must process huge datasets during training. Other approaches use pre-trained deep learning networks to fuse the two methods. In this paper, we introduce a combination of pre-trained convolutional neural networks (CNNs) for feature extraction, an improved Fisher vector (iFV) codebook, and an optimized support vector machine (SVM) to achieve improved human action recognition. We leveraged three pre-trained CNNs, namely Inception-ResNet-v2, NASNet-Large, and Xception, to extract the features, and then applied the improved Fisher vector codebook to encode them. We subsequently trained an SVM on the encoded features for classification and re-adjusted the SVM weights using five different optimization techniques: SGD, Adadelta, Adam, Adamax, and Nadam. To evaluate performance, we used the UCF101 and HMDB51 datasets. The results demonstrate that the accuracy and computational complexity of our approach are comparable to state-of-the-art techniques.
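The following is a minimal sketch of how the pipeline summarized above (pre-trained CNN features, Fisher-vector encoding, SVM classification) could be wired together. It is an illustrative assumption, not the authors' implementation: the GMM codebook size, the iFV normalization details, and the optimizer-based SVM weight re-adjustment described in the abstract are simplified or omitted, and all variable names are hypothetical.

```python
# Sketch only: pre-trained CNN frame descriptors -> Fisher-vector encoding -> linear SVM.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC
from tensorflow.keras.applications import InceptionResNetV2, NASNetLarge, Xception

# Pre-trained backbones with the classifier heads removed; global average pooling
# turns each video frame into a fixed-length descriptor.
backbones = [
    InceptionResNetV2(weights="imagenet", include_top=False, pooling="avg"),
    NASNetLarge(weights="imagenet", include_top=False, pooling="avg"),
    Xception(weights="imagenet", include_top=False, pooling="avg"),
]

def fisher_vector(descriptors, gmm):
    """Encode a set of frame descriptors of shape (T, D) against a GMM codebook,
    returning simplified first- and second-order Fisher statistics."""
    q = gmm.predict_proba(descriptors)                        # soft assignments, shape (T, K)
    diff = descriptors[:, None, :] - gmm.means_[None]         # shape (T, K, D)
    sigma = np.sqrt(gmm.covariances_)                         # diagonal std devs, shape (K, D)
    d_mu = (q[..., None] * diff / sigma).sum(axis=0)          # first-order statistics
    d_sigma = (q[..., None] * ((diff / sigma) ** 2 - 1)).sum(axis=0)  # second-order statistics
    fv = np.hstack([d_mu.ravel(), d_sigma.ravel()])
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalization

# Typical usage (descriptor arrays assumed to be prepared elsewhere):
# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_descriptors)
# train_fvs = np.stack([fisher_vector(d, gmm) for d in per_video_descriptors])
# clf = LinearSVC(C=1.0).fit(train_fvs, train_labels)  # SVM on the encoded features
```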
Keywords