IEEE Access (Jan 2024)
MMW-AQA: Multimodal In-the-Wild Dataset for Action Quality Assessment
Abstract
Action quality assessment (AQA) is the task of assessing the quality of a specific action in videos. Because existing AQA datasets provide only two-dimensional (2D) video data captured from a limited number of viewpoints, existing AQA methods based on deep neural networks (DNNs) often struggle to assess complex three-dimensional (3D) actions accurately, and their robustness to diverse viewpoints remains unknown. To address these concerns, we created a freestyle windsurfing dataset called multimodal in-the-wild (MMW)-AQA. In addition to video data, MMW-AQA provides inertial measurement unit (IMU) and global positioning system (GPS) data. The 3D information in the IMU data helps DNNs assess complex 3D actions accurately. Moreover, MMW-AQA provides in-the-wild video data captured by a single unmanned aerial vehicle (UAV). These in-the-wild video data enable us to evaluate whether AQA methods work well across diverse viewpoints. Furthermore, we present a baseline multimodalization framework with a transformer-based fusion module. This framework makes it easy to multimodalize existing unimodal DNN models so that they can assess action quality using multimodal data. Our experimental results demonstrate that multimodal data improve AQA accuracy compared with unimodal video data.
Keywords