A Scalable Machine Learning Pipeline for Paddy Rice Classification Using Multi-Temporal Sentinel Data

Vasileios Sitokonstantinou; Alkiviadis Koukos; Thanassis Drivas; Charalampos Kontoes; Ioannis Papoutsis; Vassilia Karathanassi

doi:10.3390/rs13091769

Remote Sensing (May 2021)

Affiliations

Vasileios Sitokonstantinou: Institute for Space Applications and Remote Sensing, National Observatory of Athens, I. Metaxa and Vas. Pavlou St, Penteli, 15236 Athens, Greece
Alkiviadis Koukos: Institute for Space Applications and Remote Sensing, National Observatory of Athens, I. Metaxa and Vas. Pavlou St, Penteli, 15236 Athens, Greece
Thanassis Drivas: Institute for Space Applications and Remote Sensing, National Observatory of Athens, I. Metaxa and Vas. Pavlou St, Penteli, 15236 Athens, Greece
Charalampos Kontoes: Institute for Space Applications and Remote Sensing, National Observatory of Athens, I. Metaxa and Vas. Pavlou St, Penteli, 15236 Athens, Greece
Ioannis Papoutsis: Institute for Space Applications and Remote Sensing, National Observatory of Athens, I. Metaxa and Vas. Pavlou St, Penteli, 15236 Athens, Greece
Vassilia Karathanassi: Laboratory of Remote Sensing, National Technical University of Athens, 9 Heroon Polytechniou Str., Zographos, 15790 Athens, Greece

DOI: https://doi.org/10.3390/rs13091769
Journal volume & issue: Vol. 13, no. 9
p. 1769

Abstract

The demand for rice production in Asia is expected to increase by 70% in the next 30 years, which makes evident the need for balanced productivity and effective food security management at the national and continental levels. Consequently, the timely and accurate mapping of paddy rice extent and the assessment of its productivity are of utmost significance. In turn, this requires continuous area monitoring and large-scale mapping at the parcel level, through the processing of big satellite data of high spatial resolution. This work designs and implements a paddy rice mapping pipeline for South Korea that is based on a time series of Sentinel-1 and Sentinel-2 data for the year 2018. We address two challenges: the first is the ability of our model to manage big satellite data and scale to a nationwide application; the second is the algorithm’s capacity to cope with the scarcity of labeled data for training supervised machine learning algorithms. Specifically, we implement an approach that combines unsupervised and supervised learning. First, we generate pseudo-labels for rice classification from a single site (Seosan-Dangjin) by using a dynamic k-means clustering approach. The pseudo-labels are then used to train a Random Forest (RF) classifier that is fine-tuned to generalize to two other sites (Haenam and Cheorwon). The optimized model was then tested against 40 labeled plots, evenly distributed across the country. The paddy rice mapping pipeline is scalable, as it has been deployed in a High Performance Data Analytics (HPDA) environment using distributed implementations of both k-means and the RF classifier. When tested across the country, our model provided an overall accuracy of 96.69% and a kappa coefficient of 0.87. Moreover, the accurate paddy rice area map was available early in the year (late July), which is key for timely decision-making. Finally, the performance of the generalized paddy rice classification model, when applied to the sites of Haenam and Cheorwon, was compared to the performance of two equivalent models that were trained with locally sampled labels. The results were comparable, highlighting the success of the model’s generalization and its applicability to other regions.
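
The overall accuracy and kappa coefficient quoted in the abstract are both summary statistics of a confusion matrix. As a minimal sketch, assuming the standard definition of Cohen's kappa (the abstract does not spell out the formula) and using a made-up 2x2 rice / non-rice matrix that is not taken from the paper's results, the two metrics could be computed as follows. The function name `overall_accuracy_and_kappa` and the example counts are hypothetical.

```python
import math


def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose reference class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [
        sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)
    ]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts (illustrative only, NOT the paper's data):
# rows = reference class, columns = predicted class, order = [rice, non-rice].
example_matrix = [
    [40, 10],
    [5, 45],
]
accuracy, kappa = overall_accuracy_and_kappa(example_matrix)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is typically lower than overall accuracy when one class dominates the reference data.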

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
