IEEE Access (Jan 2024)

An Efficient Intersection Over Union Algorithm for 3D Object Detection

  • Sazan Ali Kamal Mohammed,
  • Mohd Zulhakimi Ab Razak,
  • Abdul Hadi Abd Rahman,
  • Maria Abu Bakar

DOI
https://doi.org/10.1109/ACCESS.2024.3495761
Journal volume & issue
Vol. 12
pp. 169768 – 169786

Abstract

In recent years, improved accuracy, flexibility, available datasets, state-of-the-art architectures, and diverse applications have together contributed to the widespread adoption of deep learning-based object detection techniques in computer vision and emerging applications. An important metric in this field is the Intersection over Union (IoU) loss function, which is widely used in boundary analysis. Most enhanced loss functions alter the penalty terms to address the zero-gradient issue of IoU in non-overlapping situations. In this study, we first analyze the existing IoU loss functions and then propose an efficient intersection over union (EIoU) algorithm. The proposed EIoU provides a gradient value even when the predicted bounding box and the ground truth target box do not overlap. For this case, we propose a steady optimization procedure (SOP) that lets the EIoU loss gradually approach its minimum value. In addition, we propose an angle extension of the EIoU algorithm as a significant refinement tool. The EIoU loss function is inversely correlated with the size of the bounding boxes and responds quickly to small object sizes, which favors the detection of small objects in a 3D point cloud, where fewer iterations are required for the bounding box to regress well. The proposed method records error values at least 78% better than those of other IoU types. Similarly, EIoU reaches 72.3%, 17.4%, and 85.5% higher values for the intersection, separation, and inclusion location conditions, respectively, compared with other IoU types, indicating that EIoU is sensitive to changes in box dimensions and hence leads to fast convergence between the predicted and ground truth boxes. Experimental results on the KITTI dataset for 3D object detection show improvements of 11.5% and 12.7% for the SECOND+EIoU test condition compared with SECOND alone, and results at least 8% and 6.9% better than other state-of-the-art 3D methods, for the moderate car and pedestrian cases, respectively. These results indicate that the proposed techniques significantly improve learning accuracy and the performance of 3D object detection.
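
The zero-gradient issue mentioned above can be illustrated with a minimal sketch. It computes plain IoU for axis-aligned 3D boxes, not the EIoU proposed in the paper, whose formula is not given in the abstract. The function name `iou_3d` and the box layout `(x1, y1, z1, x2, y2, z2)` are assumptions made for illustration only.

```python
def iou_3d(box_a, box_b):
    """Plain IoU of two axis-aligned 3D boxes.

    Each box is (x1, y1, z1, x2, y2, z2) with x1 < x2, y1 < y2, z1 < z2.
    This layout is an illustrative assumption, not taken from the paper.
    """
    # Intersection volume: product of the per-axis overlap lengths.
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        inter *= max(0.0, hi - lo)  # no overlap on this axis gives 0

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union


target = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
near = (3.0, 0.0, 0.0, 5.0, 2.0, 2.0)    # separated from target by 1 unit
far = (10.0, 0.0, 0.0, 12.0, 2.0, 2.0)   # separated from target by 8 units

# Both non-overlapping boxes score exactly 0, so a loss of 1 - IoU is flat
# with respect to the distance between boxes and gives no gradient signal.
print(iou_3d(target, near), iou_3d(target, far))
```

Because the score is identical for the near and far boxes, plain IoU cannot tell an optimizer which direction to move a non-overlapping prediction. This is the gap that the penalty terms of enhanced IoU losses are meant to fill.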

Keywords