Guangdong nongye kexue (Sep 2023)

Research on SSR Molecular Markers for Traceability of Tea Origins Based on Deep Neural Network

  • Hao GONG,
  • Lili ZHANG,
  • Furong CHEN,
  • Lixia LIN,
  • Yijun CHEN,
  • Le ZHANG,
  • Chunlian SUN,
  • Jian SUN

DOI
https://doi.org/10.16768/j.issn.1004-874X.2023.09.011
Journal volume & issue
Vol. 50, no. 9
pp. 108 – 116

Abstract

【Objective】This study aimed to differentiate tea varieties and trace their origins, and to provide a reference for the classification of other plants.【Method】The origins of 313 tea samples from Hunan, Yunnan, Fujian and Zhejiang Provinces, together with 10 outgroup samples, were investigated using SSR markers and bioinformatics methods. First, 54 high-quality SSR loci were screened, and the degree of variation among tea samples from different provinces was analyzed by Principal Component Analysis (PCA) and by constructing an evolutionary tree. Second, the classification accuracy of three models, the Linear Regression Model, the Random Forest Model and the Deep Neural Network (DNN) Model, was compared, and the DNN Model, which had the highest accuracy, was selected for constructing and optimizing the traceability model.【Result】Samples clustered largely by province. Samples from Yunnan Province differed significantly from those of the other provinces. Samples from Fujian, Zhejiang and Hunan formed separate clusters, indicating significant differences in tea among these three provinces, but the clusters overlapped slightly, indicating some shared genetic structure and a closer relationship among individuals from these three provinces. When the three models were fitted to the molecular marker matrix of 54 SSR markers, the initial accuracy was 81% for the Linear Regression Model, 77% for the Random Forest Model and 86% for the DNN Model, the highest of the three. The DNN Model was therefore inferred to be the best of the three for classifying tea trees. Subsequently, a prediction model was constructed with the 54 SSR markers and 323 samples.
The batch size, step size, number of hidden layers and number of nodes in each layer were optimized during training. The highest accuracy, approximately 95% on the validation and test sets, was achieved with a batch size of 150, a step size of 20 000 and 2 hidden layers. Therefore, a neural network with 2 hidden layers was optimal for the analysis of tea.【Conclusion】DNN-based SSR molecular markers provide a strong foundation for research on tea classification, origin traceability and tea breeding. The constructed classification model can also be used to identify the origin of other species from resequencing data.
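
The kind of classifier the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation. The dimensions (54 markers, 323 samples, batch size 150, 2 hidden layers) come from the abstract. Everything else is an assumption: the synthetic 0/1 marker matrix, the four-class labelling, the hidden-layer width of 8, the ReLU and softmax activations, the learning rate, the 80/20 split, and the reading of "20 000" as a number of training steps (cut to 40 here so the sketch runs quickly).

```python
import math
import random

N_MARKERS = 54        # SSR loci retained after screening (from the abstract)
N_SAMPLES = 323       # 313 tea samples + 10 outgroups (from the abstract)
N_CLASSES = 4         # four provinces; how outgroups were labelled is not stated
BATCH_SIZE = 150      # batch size reported as optimal
HIDDEN = (8, 8)       # two hidden layers; the width of 8 is an assumption
N_STEPS = 40          # demo value, far below the reported 20 000
LEARNING_RATE = 0.01  # assumed; not given in the abstract


def make_synthetic(rng):
    """Hypothetical stand-in for the marker matrix: 0/1 codes drawn from
    class-specific allele frequencies. Real SSR genotypes would replace this."""
    freqs = [[rng.random() for _ in range(N_MARKERS)] for _ in range(N_CLASSES)]
    labels = [i % N_CLASSES for i in range(N_SAMPLES)]
    rows = [[1.0 if rng.random() < p else 0.0 for p in freqs[c]] for c in labels]
    return rows, labels


def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # He initialisation for ReLU layers
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, x):
    """Return every layer's activations; the last entry holds softmax probabilities."""
    acts = [x]
    for k, (weights, bias) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + b for row, b in zip(weights, bias)]
        if k < len(layers) - 1:
            z = [max(0.0, v) for v in z]  # ReLU on hidden layers
        else:
            top = max(z)
            exps = [math.exp(v - top) for v in z]
            total = sum(exps)
            z = [e / total for e in exps]  # softmax on the output layer
        acts.append(z)
    return acts


def train_step(layers, batch, lr):
    """One mini-batch gradient step on cross-entropy loss; returns the mean batch loss."""
    grads = [([[0.0] * len(w[0]) for _ in w], [0.0] * len(b)) for w, b in layers]
    loss = 0.0
    for x, target in batch:
        acts = forward(layers, x)
        probs = acts[-1]
        loss -= math.log(max(probs[target], 1e-12))
        delta = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
        for k in range(len(layers) - 1, -1, -1):
            weights, (grad_w, grad_b), a_in = layers[k][0], grads[k], acts[k]
            for j, d in enumerate(delta):
                grad_b[j] += d
                for i, a in enumerate(a_in):
                    grad_w[j][i] += d * a
            if k > 0:  # propagate through the weights and the previous ReLU
                delta = [
                    sum(weights[j][i] * d for j, d in enumerate(delta)) if a > 0.0 else 0.0
                    for i, a in enumerate(a_in)
                ]
    for (weights, bias), (grad_w, grad_b) in zip(layers, grads):
        for j, row in enumerate(weights):
            bias[j] -= lr * grad_b[j] / len(batch)
            for i in range(len(row)):
                row[i] -= lr * grad_w[j][i] / len(batch)
    return loss / len(batch)


def predict(layers, x):
    probs = forward(layers, x)[-1]
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(layers, rows, labels):
    return sum(predict(layers, x) == t for x, t in zip(rows, labels)) / len(rows)


rng = random.Random(0)
rows, labels = make_synthetic(rng)
order = list(range(N_SAMPLES))
rng.shuffle(order)
n_train = int(0.8 * N_SAMPLES)  # assumed 80/20 train/test split
train_idx, test_idx = order[:n_train], order[n_train:]

sizes = (N_MARKERS,) + HIDDEN + (N_CLASSES,)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

losses = []
for _ in range(N_STEPS):
    batch = [(rows[i], labels[i]) for i in rng.sample(train_idx, BATCH_SIZE)]
    losses.append(train_step(layers, batch, LEARNING_RATE))

test_acc = accuracy(layers, [rows[i] for i in test_idx], [labels[i] for i in test_idx])
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, test accuracy {test_acc:.2f}")
```

Because the data are synthetic, the printed accuracy says nothing about the 86% and 95% figures reported above. The sketch only shows how a 54-column marker matrix, a batch size of 150 and two hidden layers fit together in one training loop.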

Keywords