IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2020)
A CNN-Transformer Hybrid Approach for Crop Classification Using Multitemporal Multisensor Images
Abstract
Multitemporal Earth observation plays an increasingly important role in crop monitoring. As satellites acquire remote sensing images more frequently, fully exploiting the phenological patterns implicit in dense multitemporal data becomes increasingly important. In this article, we propose a CNN-transformer approach to crop classification that borrows the transformer architecture from natural language processing (NLP) to model the patterns of multitemporal sequences. First, after unifying the spatial-spectral scale of the multiband data acquired from different sensors, we obtain scale-consistent features and position features of the multitemporal sequence. Second, by adopting multilayer encoder modules derived from the transformer, we mine deep correlation patterns in the multitemporal sequence. Finally, a feed-forward layer and a softmax layer serve as the output layers of the model to predict crop categories. The proposed CNN-transformer approach is demonstrated in a crop-rich agricultural region in central California, where 65 multitemporal profiles from the Sentinel-2 A/B and Landsat-8 sensors were obtained in 2018. Through multiband multiresolution fusion, sequence correlation extraction from multitemporal data, and category feature extraction, the proposed method achieves a significant performance improvement over the traditional methods it is compared with.
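As an illustration of two ingredients named in the abstract, the position feature of a 65-step multitemporal sequence and the softmax output layer, the following is a minimal sketch rather than the authors' implementation. It assumes the sinusoidal position encoding commonly used with transformers, which the abstract does not specify; the names `positional_encoding`, `softmax`, and `FEATURE_DIM`, the feature width of 8, and the example logits are all hypothetical.

```python
import math


def positional_encoding(num_steps, dim):
    """Sinusoidal position features, one row per time step.

    Assumption: the standard transformer encoding (sine on even
    indices, cosine on odd indices); the abstract only says that a
    position feature is used.
    """
    encoding = []
    for pos in range(num_steps):
        row = []
        for i in range(dim):
            # Each pair of dimensions (2k, 2k + 1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        encoding.append(row)
    return encoding


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


NUM_STEPS = 65   # number of multitemporal profiles, from the abstract
FEATURE_DIM = 8  # hypothetical feature width, chosen for illustration

pe = positional_encoding(NUM_STEPS, FEATURE_DIM)

# Hypothetical logits for three crop classes, as a feed-forward
# output layer might produce them.
probs = softmax([2.0, 0.5, -1.0])
```

In a full model of this kind, each row of `pe` would typically be added to the scale-consistent feature vector of the corresponding acquisition date before the encoder layers, and `probs` would be read as per-class probabilities for one sample.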
Keywords