CFFI-Vit: Enhanced Vision Transformer for the Accurate Classification of Fish Feeding Intensity in Aquaculture

Jintao Liu; Alfredo Tolón Becerra; José Fernando Bienvenido-Barcena; Xinting Yang; Zhenxi Zhao; Chao Zhou

doi:10.3390/jmse12071132

Journal of Marine Science and Engineering (Jul 2024)

Affiliations

Jintao Liu: School of Engineering, University of Almeria, 04120 Almeria, Spain
Alfredo Tolón Becerra: School of Engineering, University of Almeria, 04120 Almeria, Spain
José Fernando Bienvenido-Barcena: School of Engineering, University of Almeria, 04120 Almeria, Spain
Xinting Yang: National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
Zhenxi Zhao: National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
Chao Zhou: National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China

Journal volume & issue: Vol. 12, no. 7, p. 1132

Abstract

Real-time classification of fish feeding behavior plays a crucial role in aquaculture because it is closely tied to feeding costs and environmental protection. This paper proposes CFFI-Vit, a fish feeding intensity classification model based on an improved Vision Transformer (ViT), which classifies the feeding behavior of rainbow trout (Oncorhynchus mykiss) into three intensity levels: strong, moderate, and weak. The approach proceeds as follows. First, we obtained 2685 raw feeding images of rainbow trout from recorded videos and labeled each as strong, moderate, or weak. Second, the number of transformer encoder blocks in the ViT was reduced from 12 to 4, which greatly reduces the model's computational load and facilitates deployment on mobile devices. Finally, a residual module was added to the head of the ViT to enhance the model's feature extraction ability. The proposed CFFI-Vit has a computational load of 5.81 G (giga) floating-point operations (FLOPs). Compared with the original ViT model, it reduces computational demands by 65.54% and improves classification accuracy on the validation set by 5.4 percentage points. On the test set, the model achieves a precision, recall, and F1 score of 93.47%, 93.44%, and 93.42%, respectively. In addition, the classification accuracy of CFFI-Vit exceeds that of the established models ResNet34, MobileNetV2, VGG16, and GoogLeNet by 6.87, 8.43, 7.03, and 5.65 percentage points, respectively. The proposed CFFI-Vit therefore achieves higher classification accuracy while substantially reducing computational demands, providing a foundation for deploying lightweight deep network models on edge devices with limited hardware capabilities.
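The compute figures above can be sanity-checked with a rough count of how cost scales with encoder depth. The sketch below is an illustration, not the authors' code. It assumes ViT-B/16-style settings (224x224 RGB input, 16x16 patches, embedding width 768, MLP ratio 4), none of which the abstract states. It counts multiply-accumulates (MACs) of the patch embedding and linear layers only, which ViT papers and profiling tools often report under the label FLOPs. The helper names are hypothetical.

```python
"""Back-of-the-envelope compute estimate for shrinking a ViT encoder.

ASSUMPTIONS (not stated in the abstract): ViT-B/16-style hyperparameters,
a 224x224 RGB input, 16x16 patches, embedding width 768 and MLP ratio 4.
Counts are multiply-accumulates (MACs) of the patch embedding and the
linear layers only; attention matmuls, LayerNorm, softmax and GELU are
ignored for simplicity.
"""


def patch_embed_macs(img=224, patch=16, channels=3, dim=768):
    n_patches = (img // patch) ** 2
    # Each flattened patch (channels * patch * patch values) is projected to dim.
    return n_patches * channels * patch * patch * dim


def encoder_block_macs(tokens=197, dim=768, mlp_ratio=4):
    qkv = tokens * dim * 3 * dim               # Q, K, V projections
    proj = tokens * dim * dim                  # attention output projection
    mlp = 2 * tokens * dim * mlp_ratio * dim   # two MLP linear layers
    return qkv + proj + mlp                    # = 12 * tokens * dim**2 when mlp_ratio = 4


def vit_macs(depth, img=224, patch=16, dim=768, num_classes=3):
    tokens = (img // patch) ** 2 + 1           # patch tokens plus the class token
    head = dim * num_classes                   # linear classifier: strong/moderate/weak
    return (patch_embed_macs(img, patch, 3, dim)
            + depth * encoder_block_macs(tokens, dim)
            + head)


def implied_baseline(reduced, reduction_fraction):
    # Invert "reduced = baseline * (1 - reduction_fraction)".
    return reduced / (1 - reduction_fraction)


if __name__ == "__main__":
    full, slim = vit_macs(12), vit_macs(4)
    print(f"12 blocks: {full / 1e9:.2f} G MACs")
    print(f" 4 blocks: {slim / 1e9:.2f} G MACs")
    print(f"reduction: {100 * (1 - slim / full):.2f}%")
    # Baseline implied by the abstract's 5.81 G and 65.54% figures.
    print(f"implied baseline: {implied_baseline(5.81, 0.6554):.2f} G")
```

Under these assumptions the count is about 16.85 G for 12 blocks and 5.69 G for 4 blocks, roughly a 66% cut, because identical encoder blocks dominate the total. The baseline implied by the abstract's own numbers, 5.81 G / (1 - 0.6554), is about 16.86 G. The sketch does not model the added residual module, whose cost the abstract does not give; including the attention matrix products would add about 60 M MACs per block.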

Published in Journal of Marine Science and Engineering

ISSN: 2077-1312 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Naval Science: Naval architecture. Shipbuilding. Marine engineering; Geography. Anthropology. Recreation: Oceanography
Website: http://www.mdpi.com/journal/jmse