Multiple-Clothing Detection and Fashion Landmark Estimation Using a Single-Stage Detector

Hyo Jin Kim; Doo Hee Lee; Asim Niaz; Chan Yong Kim; Asif Aziz Memon; Kwang Nam Choi

doi:10.1109/ACCESS.2021.3051424

IEEE Access (Jan 2021)

Affiliations

Hyo Jin Kim: Department of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea
Doo Hee Lee: Department of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea
Asim Niaz: STARS Team, INRIA Sophia Antipolis, Sophia Antipolis, France
Chan Yong Kim: Department of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea
Asif Aziz Memon: Department of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea
Kwang Nam Choi: Department of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea

DOI: https://doi.org/10.1109/ACCESS.2021.3051424
Journal volume & issue: Vol. 9, pp. 11694–11704

Abstract

Fashion image analysis has attracted significant research attention owing to the availability of large-scale fashion datasets with rich annotations. However, existing deep learning models for fashion datasets often have high computational requirements. In this study, we propose a new model suitable for low-power devices. The proposed network is a one-stage detector that rapidly detects multiple clothing items and their landmarks in fashion images. The network is a modification of EfficientDet, originally proposed by Google Brain. It jointly trains on core input features at different resolutions and applies compound scaling to the backbone feature network. The bounding box, class, and landmark prediction networks balance speed and accuracy, and the low parameter count and low computational cost make the network efficient. Without image preprocessing, we achieved a mean average precision (mAP) of 0.686 in bounding box detection and 0.450 in landmark estimation on the DeepFashion2 validation dataset, with an inference time of 42 ms. Extensive experiments with loss functions and optimizers identified the best-performing configuration. Furthermore, the proposed method has the advantage of operating on low-power devices.
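
The compound scaling mentioned in the abstract can be illustrated with a minimal sketch. It assumes the scaling heuristic published for the original EfficientDet (a single coefficient phi that grows input resolution, feature-network width and depth, and prediction-head depth together); the modified network described in this article may use different constants. The names `ScaledConfig` and `compound_scale` are hypothetical, and applying the same head depth to the landmark branch is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledConfig:
    """Sizes derived from one compound coefficient (hypothetical container)."""
    input_resolution: int  # square input side length in pixels
    bifpn_width: int       # channels in the feature network
    bifpn_depth: int       # number of feature-network layers
    head_depth: int        # layers in each prediction head


def compound_scale(phi: int) -> ScaledConfig:
    """Scale all network dimensions jointly from a single coefficient phi.

    Constants follow the heuristic of the original EfficientDet paper;
    they are an assumption here, not values confirmed by this article.
    """
    if phi < 0:
        raise ValueError("phi must be non-negative")
    return ScaledConfig(
        input_resolution=512 + 128 * phi,     # resolution grows linearly
        bifpn_width=round(64 * 1.35 ** phi),  # width grows exponentially
        bifpn_depth=3 + phi,                  # depth grows linearly
        head_depth=3 + phi // 3,              # heads deepen every third step
    )


if __name__ == "__main__":
    for phi in range(4):
        print(phi, compound_scale(phi))
```

Increasing phi trades inference speed for accuracy, which is the balance the abstract refers to when targeting low-power devices. Published EfficientDet variants round the computed widths to hardware-friendly values, so released configurations can differ slightly from the raw formula.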

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
