Development of Machine Learning Methods for Accurate Prediction of Plant Disease Resistance
Qi Liu, Shi-min Zuo, Shasha Peng, Hao Zhang, Ye Peng, Wei Li, Yehui Xiong, Runmao Lin, Zhiming Feng, Huihui Li, Jun Yang, Guo-Liang Wang, Houxiang Kang
Affiliations
Qi Liu
State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Shi-min Zuo
Zhongshan Biological Breeding Laboratory & Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Agricultural College of Yangzhou University, Yangzhou 225009, China
Shasha Peng
State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization and College of Agronomy, Hunan Agricultural University, Changsha 410128, China
Hao Zhang
State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
Ye Peng
State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Wei Li
College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
Yehui Xiong
State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
Runmao Lin
Key Laboratory of Green Prevention and Control of Tropical Plant Diseases and Pests Ministry of Education, College of Plant Protection, Hainan University, Haikou 570228, China
Zhiming Feng
Zhongshan Biological Breeding Laboratory & Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Agricultural College of Yangzhou University, Yangzhou 225009, China
Huihui Li
State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
Jun Yang
MARA Key Laboratory of Surveillance and Management for Plant Quarantine Pests, Department of Plant Biosecurity, College of Plant Protection, China Agricultural University, Beijing 100193, China
Guo-Liang Wang
Department of Plant Pathology, Ohio State University, Columbus, OH 43210, USA
Houxiang Kang
State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China (corresponding author)
Traditional phenotypic screening of plants for disease resistance is both time-consuming and costly. Genomic selection offers a way to improve efficiency, but accurately predicting plant disease resistance remains challenging. In this study, we evaluated eight machine learning (ML) methods for predicting plant disease resistance: random forest classification (RFC), support vector classification (SVC), light gradient boosting machine (lightGBM), random forest classification plus kinship (RFC_K), support vector classification plus kinship (SVC_K), light gradient boosting machine plus kinship (lightGBM_K), deep neural network genomic prediction (DNNGP), and densely connected convolutional networks (DenseNet). Our results show that the three plus kinship (K) methods developed in this study achieved high prediction accuracy. When trained and tested on rice diversity panel I (RDPI), these methods reached accuracies of up to 95% for rice blast (RB), 85% for rice black-streaked dwarf virus (RBSDV), and 85% for rice sheath blight (RSB). The plus K models also performed well in predicting wheat blast (WB) and wheat stripe rust (WSR), with mean accuracies of up to 90% and 93%, respectively. To assess the generalizability of our models, we applied the trained plus K methods to predict RB resistance in an independent population, rice diversity panel II (RDPII), and in parallel evaluated the RB resistance of RDPII cultivars by spray inoculation. Compared against the spray inoculation results, the predictions of the plus K methods reached an accuracy of 91%. These findings show that the plus K methods (RFC_K, SVC_K, and lightGBM_K) accurately predict resistance to RB, RBSDV, RSB, WB, and WSR.
The methods developed in this study not only provide valuable strategies for predicting disease resistance, but also pave the way for using machine learning to streamline genome-based crop breeding.
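The abstract does not specify how kinship is combined with the classifiers. As a minimal sketch, assuming that "plus kinship" means appending each line's row of a VanRaden-style genomic relationship matrix to its 0/1/2-coded SNP markers before fitting RFC, SVC, or lightGBM, the feature construction and the accuracy metric could look like the code below. All function names are hypothetical, and computing kinship for an independent panel such as RDPII against the training lines, using training-set allele frequencies, is likewise an assumption rather than the authors' documented procedure.

```python
# Hypothetical sketch of "plus kinship" (K) features; NOT the authors' code.
# Assumption: K features = genomic relationships (VanRaden method 1) of each
# line to every training line, appended to that line's 0/1/2 SNP codes.
from typing import List, Sequence

Matrix = List[List[float]]


def allele_freqs(geno: Sequence[Sequence[int]]) -> List[float]:
    """Frequency of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n = len(geno)
    return [sum(row[j] for row in geno) / (2 * n) for j in range(len(geno[0]))]


def centered(geno: Sequence[Sequence[int]], p: Sequence[float]) -> Matrix:
    """Subtract the expected dosage 2p from every genotype (the Z matrix)."""
    return [[row[j] - 2 * p[j] for j in range(len(p))] for row in geno]


def kinship_to_reference(query, reference, p) -> Matrix:
    """Relationship of each query line to each reference line:
    K = Z_q Z_r^T / (2 * sum_j p_j (1 - p_j))."""
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic in the reference set")
    zq, zr = centered(query, p), centered(reference, p)
    return [[sum(a * b for a, b in zip(qi, rj)) / denom for rj in zr] for qi in zq]


def plus_kinship_features(query, reference, p) -> Matrix:
    """Append each query line's kinship row to its raw marker codes."""
    k = kinship_to_reference(query, reference, p)
    return [list(map(float, g)) + krow for g, krow in zip(query, k)]


def accuracy(predicted: Sequence, observed: Sequence) -> float:
    """Fraction of lines whose predicted class matches the observed phenotype."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Toy training panel (stand-in for RDPI): 4 lines x 3 SNPs.
    train = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 2, 2]]
    p = allele_freqs(train)  # frequencies taken from the training lines only
    X_train = plus_kinship_features(train, train, p)  # 4 rows x (3 SNPs + 4 K)
    # Toy independent line (stand-in for an RDPII cultivar).
    X_new = plus_kinship_features([[1, 2, 2]], train, p)
    print(len(X_train[0]), len(X_new[0]))
```

The augmented rows would then be passed to an off-the-shelf classifier, for example scikit-learn's `RandomForestClassifier` or `SVC`, or LightGBM's `LGBMClassifier`, and `accuracy` would compare its predictions with inoculation-based phenotypes. Using training-set allele frequencies for new lines keeps their kinship values on the same scale as the features the model was fitted on.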