DUAL PYRAMIDS ENCODER-DECODER NETWORK FOR SEMANTIC SEGMENTATION IN GROUND AND AERIAL VIEW IMAGES

S. L. Jiang; G. Li; W. Yao; Z. H. Hong; T. Y. Kuc

doi:10.5194/isprs-archives-XLIII-B2-2020-605-2020

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Aug 2020)

Affiliations

S. L. Jiang: Department of Land Surveying and Geo-informatics, The Hong Kong Polytechnic University, Hong Kong
S. L. Jiang: College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
G. Li: College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea
W. Yao: Department of Land Surveying and Geo-informatics, The Hong Kong Polytechnic University, Hong Kong
W. Yao: Research Institute for Sustainable Urban Development, The Hong Kong Polytechnic University, Hong Kong
Z. H. Hong: College of Information Technology, Shanghai Ocean University, Shanghai, China
T. Y. Kuc: College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea

Journal volume & issue: Vol. XLIII-B2-2020
pp. 605 – 610

Abstract

Semantic segmentation is a fundamental task in computer vision that assigns a category label to every pixel. Most existing methods use only the deepest feature map for decoding, although high-level features are inevitably lost during down-sampling. In the decoder, transposed convolution or bilinear interpolation is widely used to restore the size of the encoded feature map; however, few optimizations are applied during the up-sampling process, which is detrimental to grouping and classification performance. In this work, we propose a dual pyramids encoder-decoder deep neural network (DPEDNet) to tackle these issues. The first pyramid integrates and encodes multi-resolution features through sequentially stacked merging, and the second pyramid decodes the features through dense atrous convolution with chained up-sampling. Without post-processing or multi-scale testing, the proposed network achieves state-of-the-art performance on two challenging benchmark image datasets covering both ground and aerial view scenes.
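
The two decoder ingredients named in the abstract, atrous (dilated) convolution and chained up-sampling, can be illustrated with a toy 1-D sketch in plain Python. This is not the authors' DPEDNet code: the function names, the averaging of parallel multi-rate branches, and the 2x linear interpolation (a 1-D stand-in for bilinear interpolation) are all assumptions made for illustration, and the dense connectivity and learned weights of the actual network are not reproduced.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding; output keeps the input length.

    `rate` is the spacing between kernel taps; rate=1 is an ordinary convolution.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * rate  # tap position, spread out by the rate
            if 0 <= j < len(signal):     # taps outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def upsample2x_linear(signal):
    """Double the length by linear interpolation (hypothetical 1-D analogue of bilinear)."""
    out = []
    for i, v in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else v  # repeat the last sample
        out.append(v)
        out.append((v + nxt) / 2)
    return out


def chained_decode(feature, kernel, rates, steps):
    """Assumed decoder loop: up-sample 2x, then refine with averaged multi-rate atrous branches."""
    for _ in range(steps):
        feature = upsample2x_linear(feature)
        branches = [atrous_conv1d(feature, kernel, r) for r in rates]
        feature = [sum(vals) / len(rates) for vals in zip(*branches)]
    return feature


if __name__ == "__main__":
    coarse = [1.0, 3.0]
    # Two chained 2x steps restore a length-2 feature to length 8.
    print(chained_decode(coarse, kernel=[0.25, 0.5, 0.25], rates=[1, 2], steps=2))
```

Restoring resolution in several small steps with a refinement after each one, rather than in a single large interpolation, is the general idea behind chained up-sampling; how DPEDNet arranges and trains these stages is described in the full paper.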

Published in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISSN: 1682-1750 (Print); 2194-9034 (Online)
Publisher: Copernicus Publications
Country of publisher: Germany
LCC subjects: Technology: Engineering (General). Civil engineering (General): Applied optics. Photonics
Website: http://www.isprs.org/publications/archives.aspx
