Engineering Reports (Dec 2024)

FishAI: Automated hierarchical marine fish image classification with vision transformer

  • Chenghan Yang,
  • Peng Zhou,
  • Chun‐Sheng Wang,
  • Ge‐Yi Fu,
  • Xue‐Wei Xu,
  • Zhibin Niu,
  • Lin Zhu,
  • Ye Yuan,
  • Hong‐Bin Shen,
  • Xiaoyong Pan

DOI
https://doi.org/10.1002/eng2.12992
Journal volume & issue
Vol. 6, no. 12

Abstract

To meet the high demand for efficient fish species recognition in marine scientific research, such as biodiversity impact assessment and monitoring, an automated, web‐based hierarchical image classification platform named FishAI was developed. Trained on marine fish images collected from the World Register of Marine Species, FishAI uses the Vision Transformer (ViT) model to classify fish. The model covers five taxonomic levels: 3 classes, 38 orders, 154 families, 438 genera, and 808 species. With hyperparameter optimization, FishAI achieved accuracies of 0.975 (Class), 0.798 (Order), 0.743 (Family), 0.638 (Genus), and 0.626 (Species) on test images. A comparison with other baseline backbones shows that ViT performs better, owing to its ability to capture long‐distance dependencies. In addition, FishAI yields top‐5 prediction accuracies of 1.000 (Class), 0.887 (Order), 0.816 (Family), 0.729 (Genus), and 0.727 (Species). To make FishAI more practical, a user‐friendly graphical interface is available at http://www.csbio.sjtu.edu.cn/bioinf/FishAI/. Furthermore, interpretability analysis with Grad‐CAM provides a visual explanation of the image regions highlighted for FishAI's predictions at different hierarchy levels.
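
The abstract reports both top‐1 and top‐5 accuracy at each taxonomic level. As a reading aid, the following minimal sketch shows how a top‐k accuracy of this kind is commonly computed from per‐class scores. It is not taken from FishAI: the function name `top_k_accuracy` and the toy scores and labels are hypothetical, and in a hierarchical setting the same calculation would presumably be repeated once per level (Class, Order, Family, Genus, Species).

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration only; not part of FishAI.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (made-up numbers): 4 images, 3 candidate classes at one level.
toy_scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.5, 0.4, 0.1],  # true class 1: ranked second
    [0.6, 0.3, 0.1],  # true class 2: ranked third
    [0.1, 0.1, 0.8],  # true class 2: ranked first
]
toy_labels = [0, 1, 2, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

By construction, top‐k accuracy can only stay the same or increase as k grows, which is consistent with the top‐5 figures in the abstract being at least as high as the corresponding top‐1 figures at every level.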

Keywords