IEEE Access (Jan 2024)

Navigating the YOLO Landscape: A Comparative Study of Object Detection Models for Emotion Recognition

  • Medha Mohan Ambali Parambil,
  • Luqman Ali,
  • Muhammed Swavaf,
  • Salah Bouktif,
  • Munkhjargal Gochoo,
  • Hamad Aljassmi,
  • Fady Alnajjar

DOI
https://doi.org/10.1109/ACCESS.2024.3439346
Journal volume & issue
Vol. 12
pp. 109427 – 109442

Abstract

The You Only Look Once (YOLO) series, renowned for its efficiency and versatility in object detection, has become a fundamental component in diverse fields ranging from autonomous vehicles to robotics and video surveillance. Despite its widespread application, a notable gap exists in the literature concerning the selection of YOLO models for specific tasks. Current practice often favors the latest models, potentially overlooking crucial factors such as computational complexity, speed, accuracy, model size, adaptability, and generalization, so the newest model is not always the optimal choice for a given application. Therefore, this paper aims to provide an exhaustive comparative analysis of various YOLO models, focusing on emotion recognition. We trained and tested YOLOv5, YOLOv7, YOLOv8, and YOLOv9, along with their respective variants, using a subset of the AffectNet dataset, which consists of facial images annotated with one of five emotions: angry, happy, sad, neutral, and surprise. The study evaluates the models on several key parameters: accuracy (using metrics such as mean Average Precision, mAP), inference time, frames per second (FPS), model size, adaptability to altered datasets, and generalization capability. Comprehensive results are presented, highlighting the strengths and limitations of each model variant across these parameters. Insights are provided to guide researchers in selecting the most suitable YOLO architecture for their specific emotion recognition requirements, considering factors such as computational constraints, real-time performance needs, and the trade-off between accuracy and efficiency. The analysis reveals the exceptional performance of certain models, such as YOLOv9e for high accuracy and YOLOv8n for balancing speed and accuracy. Overall, this work fills a crucial gap by offering a detailed comparative study to facilitate informed decision-making when deploying YOLO for facial emotion recognition tasks.
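
As a rough illustration of the selection problem the abstract describes, the following Python sketch picks the most accurate candidate that still meets a speed and size budget. It is not the authors' procedure: the `Candidate` class, the `pick_model` function, the variant names, and all numeric values are hypothetical placeholders, not results from the paper. FPS is derived here from per-image inference time alone, which is an assumption that ignores pre- and post-processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One detector variant with hypothetical measurements (not from the paper)."""
    name: str
    map50: float          # mean Average Precision at IoU 0.5, in [0, 1]
    inference_ms: float   # mean per-image inference time in milliseconds
    size_mb: float        # size of the weights file in megabytes

    @property
    def fps(self) -> float:
        # Frames per second implied by per-image latency alone.
        return 1000.0 / self.inference_ms


def pick_model(candidates, min_fps, max_size_mb):
    """Return the most accurate candidate meeting both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.fps >= min_fps and c.size_mb <= max_size_mb
    ]
    return max(feasible, key=lambda c: c.map50, default=None)


# Placeholder values chosen only to exercise the function.
candidates = [
    Candidate("variant_small", map50=0.70, inference_ms=5.0, size_mb=6.0),
    Candidate("variant_medium", map50=0.75, inference_ms=12.5, size_mb=50.0),
    Candidate("variant_large", map50=0.80, inference_ms=40.0, size_mb=120.0),
]

best = pick_model(candidates, min_fps=60.0, max_size_mb=100.0)
print(best.name if best else "no candidate meets the constraints")
```

With these placeholder numbers the largest variant is the most accurate but falls below the 60 FPS budget, so the medium variant is selected. This mirrors the abstract's point that the highest-accuracy model is not necessarily the right choice under deployment constraints.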

Keywords