3D Object Detection Algorithm for Panoramic Images With Multi-Scale Convolutional Neural Network

Dianwei Wang; Yanhui He; Ying Liu; Daxiang Li; Shiqian Wu; Yongrui Qin; Zhijie Xu

doi:10.1109/ACCESS.2019.2955995

IEEE Access (Jan 2019)

Affiliations

Dianwei Wang: Center for Image and Information Processing, School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China
Yanhui He: Center for Image and Information Processing, School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China
Ying Liu: Center for Image and Information Processing, School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China
Daxiang Li: Center for Image and Information Processing, School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China
Shiqian Wu: School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, China
Yongrui Qin: School of Computing and Engineering, University of Huddersfield, Huddersfield, U.K.
Zhijie Xu: School of Computing and Engineering, University of Huddersfield, Huddersfield, U.K.

DOI: https://doi.org/10.1109/ACCESS.2019.2955995
Journal volume & issue: Vol. 7
pp. 171461 – 171470

Abstract

This paper addresses the challenge of 3D object detection from a single panoramic image under severe deformation. The advent of the two-stage approach has driven significant progress in 3D object detection. However, most available methods can only localize region proposals with a single-scale network architecture, which is sensitive to deformation and distortion. To address this issue, we propose a multi-scale convolutional neural network (MSCNN) to estimate the 3D pose of an object. Specifically, the proposed MSCNN consists of three components for effectively detecting distorted objects in panoramic images: a CycleGAN network that converts rectilinear images into panoramas, a fused framework that improves both the accuracy and the speed of object detection, and an adversarial spatial transformer network (ASTN) that extracts the deformation features of objects in panoramic images. Additionally, we recover the 3D pose of the object using a coordinate projection and a 3D bounding box. Extensive experiments demonstrate that the proposed method achieves a 3D detection accuracy of 38.7% on high-resolution panoramic images, which exceeds the current state-of-the-art algorithm by 5.2%. Moreover, detection takes only about 0.6 seconds per image, which is six times faster than Faster R-CNN (COCO). The code will be available at https://github.com/Yanhui-He.
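
The abstract mentions a coordinate projection for recovering 3D pose from a panorama but does not spell it out. As a rough illustration only, the sketch below shows the standard equirectangular mapping between a panorama pixel and a unit viewing ray. It assumes an equirectangular panorama with the image center looking along +z and y pointing up; the function names and axis conventions are hypothetical and are not taken from the paper.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to a unit viewing ray.

    Assumed convention (not from the paper): the image center looks along +z,
    x points right, y points up.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi]
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: project a 3D direction back to panorama pixel coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return (u, v)


# Usage: the center pixel of a 2048 x 1024 panorama looks straight ahead.
center_ray = equirect_pixel_to_ray(1024, 512, 2048, 1024)
```

Under these assumptions, the corners of a 3D bounding box could be projected with `ray_to_equirect_pixel` to draw the box on the panorama; the paper's own projection may differ.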

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
