Evaluation and Comparison of Semantic Segmentation Networks for Rice Identification Based on Sentinel-2 Imagery

Huiyao Xu; Jia Song; Yunqiang Zhu

doi:10.3390/rs15061499

Remote Sensing (Mar 2023)

Evaluation and Comparison of Semantic Segmentation Networks for Rice Identification Based on Sentinel-2 Imagery

Huiyao Xu,
Jia Song,
Yunqiang Zhu

Affiliations

Huiyao Xu: State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources, Chinese Academy of Sciences, Beijing 100101, China
Jia Song: State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources, Chinese Academy of Sciences, Beijing 100101, China
Yunqiang Zhu: State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources, Chinese Academy of Sciences, Beijing 100101, China

DOI: https://doi.org/10.3390/rs15061499
Journal volume & issue: Vol. 15, no. 6
p. 1499

Abstract

Read online

Efficient and accurate rice identification based on high spatial and temporal resolution remote sensing imagery is essential for achieving precision agriculture and ensuring food security. Semantic segmentation networks in deep learning are an effective solution for crop identification, and they are mainly based on two architectures: the commonly used convolutional neural network (CNN) architecture and the novel Vision Transformer architecture. Research on crop identification from remote sensing imagery using Vision Transformer has only emerged in recent times, mostly in sub-meter resolution or even higher resolution imagery. Sub-meter resolution images are not suitable for large scale crop identification as they are difficult to obtain. Therefore, studying and analyzing the differences between Vision Transformer and CNN in crop identification in the meter resolution images can validate the generalizability of Vision Transformer and provide new ideas for model selection in crop identification research at large scale. This paper compares the performance of two representative CNN networks (U-Net and DeepLab v3) and a novel Vision Transformer network (Swin Transformer) on rice identification in Sentinel-2 of 10 m resolution. The results show that the three networks have different characteristics: (1) Swin Transformer has the highest rice identification accuracy and good farmland boundary segmentation ability. Although Swin Transformer has the largest number of model parameters, the training time is shorter than DeepLab v3, indicating that Swin Transformer has good computational efficiency. (2) DeepLab v3 also has good accuracy in rice identification. However, the boundaries of the rice fields identified by DeepLab v3 tend to shift towards the upper left corner. (3) U-Net takes the shortest time for both training and prediction and is able to segment the farmland boundaries accurately for correctly identified rice fields. However, U-Net’s accuracy of rice identification is lowest, and rice is easily confused with soybean, corn, sweet potato and cotton in the prediction. The results reveal that the Vision Transformer network has great potential for identifying crops at the country or even global scale.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/

About the journal

Abstract

Keywords