IEEE Access (Jan 2021)
Parallelization of Non-Maximum Suppression
Abstract
Non-maximum suppression (NMS) is an unavoidable post-processing step in the object detection pipeline. NMS selects the bounding boxes with locally maximum confidence scores and eliminates neighboring candidates that overlap heavily with the selected boxes. Because this procedure is a sequential, iterative algorithm of $O(N^{2})$ complexity, NMS is too slow for real-time object detection on images that contain many objects. To address this issue, we propose a parallel computation method that uses GPU multi-cores to run faster than conventional NMS. Our parallel NMS replicates the candidate boxes and performs both the intersection-over-union (IoU) calculation and the comparison in parallel. We drastically reduce the complexity from $O(N^{2})$ to $O(N)$ and bring the running time of NMS low enough for real-time detection, with negligible degradation of detection performance and only slight additional memory consumption. Furthermore, when there are few overlapping objects, our parallel NMS achieves an improvement in precision.
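For orientation, the conventional sequential procedure that the abstract takes as its baseline can be sketched as follows. This is a generic greedy NMS in plain Python, not the proposed parallel method. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate (zero-area) boxes are treated as non-overlapping.
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    iou_threshold is an illustrative default, not a value from the paper.
    """
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Box i survives only if it overlaps no already-selected box
        # by more than the threshold; in the worst case this inner
        # check makes the whole loop quadratic in the number of boxes.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Each IoU evaluation depends only on the two boxes involved, so the pairwise comparisons are independent of one another. The loop over candidates, by contrast, depends on which boxes were kept earlier, and that dependency is the sequential part of the baseline.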
Keywords