An Effective Integrated Machine Learning Framework for Identifying Severity of Tomato Yellow Leaf Curl Virus and Their Experimental Validation
Nattanong Bupi,
Vinoth Kumar Sangaraju,
Le Thi Phan,
Aamir Lal,
Thuy Thi Bich Vo,
Phuong Thi Ho,
Muhammad Amir Qureshi,
Marjia Tabassum,
Sukchan Lee,
Balachandran Manavalan
Affiliations
Nattanong Bupi
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Vinoth Kumar Sangaraju
Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Le Thi Phan
Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Aamir Lal
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Thuy Thi Bich Vo
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Phuong Thi Ho
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Muhammad Amir Qureshi
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Marjia Tabassum
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Sukchan Lee
Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Balachandran Manavalan
Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
Tomato yellow leaf curl virus (TYLCV) has spread across different countries, particularly to subtropical regions, and has become associated with more severe symptoms. Since TYLCV was first isolated in 1931, it has remained a threat to industrial tomato production worldwide. In 2022, three new groups of TYLCV isolates were obtained from TYLCV-resistant tomatoes; however, their functions remain unknown. Developing machine learning (ML)-based models from characterized sequences and evaluating their blind predictions remains a major challenge in interdisciplinary research. The purpose of this study was to develop an integrated computational framework that accurately identifies symptom severity (mild or severe) from TYLCV sequences isolated in Korea. To build the framework, we first extracted 11 different feature encodings plus hybrid features from the training data (12 feature sets in total), then explored 8 different classifiers and developed a prediction model for each combination using randomized 10-fold cross-validation. We then systematically evaluated the resulting 96 models (12 feature sets × 8 classifiers), selected the top 90, and combined their predicted class labels into a reduced feature set. A multilayer perceptron trained on these reduced features yielded the final prediction model, IML-TYLCVs. Blind prediction on the 3 newly isolated groups with IML-TYLCVs classified 2 groups as severe and 1 group as mild. We further confirmed these predictions experimentally by challenging tomato plants with infectious clones from the 3 groups and examining the resulting phenotypes. Plant virologists and plant breeders can access the user-friendly IML-TYLCVs web server at https://balalab-skku.org/IML-TYLCVs, which can guide them in developing new protection strategies against newly emerging viruses.
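The stacking pipeline described in the abstract (base models built from several feature encodings, randomized 10-fold cross-validation, selection of the best models, and a meta-classifier trained on their predicted labels) can be sketched in standard-library Python. This is a minimal toy illustration under stated assumptions, not the IML-TYLCVs implementation: the k-mer frequency encodings, the nearest-centroid base classifiers, the helper names (`kmer_freq`, `fit_stack`, `predict_new`, `toy_sequence`), and the synthetic GC-rich/AT-rich sequences are all hypothetical, and a single-layer perceptron stands in for the multilayer perceptron. It also assumes, as is common in stacking, that the meta-model is trained on out-of-fold base-model predictions and that the selected base models are refit on the full training set before blind prediction.

```python
import random
from itertools import product

def kmer_freq(seq, k):
    """Normalized k-mer frequencies over ACGT (a hypothetical stand-in for one feature encoding)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    n = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows with ambiguous bases (e.g. N) are skipped
            vec[j] += 1.0 / n
    return vec

# Hypothetical feature sets: two single encodings plus a concatenated "hybrid".
ENCODINGS = {
    "k1": lambda s: kmer_freq(s, 1),
    "k2": lambda s: kmer_freq(s, 2),
    "hybrid": lambda s: kmer_freq(s, 1) + kmer_freq(s, 2),
}
# Hypothetical base classifiers: nearest centroid under two distance metrics.
CLASSIFIERS = {
    "centroid_l2": lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5,
    "centroid_l1": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def fit_centroids(X, y):
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_centroid(cents, dist, x):
    return min(cents, key=lambda label: dist(cents[label], x))

def out_of_fold_labels(X, y, dist, folds=10, seed=0):
    """Randomized k-fold CV: each sample is labeled by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    oof = [None] * len(X)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            oof[i] = predict_centroid(cents, dist, X[i])
    return oof

def train_perceptron(Z, y, epochs=50, lr=0.1):
    """Single-layer perceptron, used here in place of a multilayer perceptron."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            err = t - (1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0)
            w = [wi + lr * err * zi for wi, zi in zip(w, z)]
            b += lr * err
    return w, b

def fit_stack(seqs, y, top_n, folds=10):
    models = []
    for enc_name, enc in ENCODINGS.items():
        X = [enc(s) for s in seqs]
        for clf_name, dist in CLASSIFIERS.items():
            oof = out_of_fold_labels(X, y, dist, folds)
            acc = sum(p == t for p, t in zip(oof, y)) / len(y)
            models.append((acc, enc_name, clf_name, oof))
    top = sorted(models, key=lambda m: m[0], reverse=True)[:top_n]  # best CV accuracy first
    Z = [[m[3][j] for m in top] for j in range(len(y))]  # predicted labels = reduced features
    w, b = train_perceptron(Z, y)
    return top, w, b

def predict_new(seqs, y, top, w, b, new_seq):
    """Refit each selected base model on all training data, then apply the meta-model."""
    z = []
    for _, enc_name, clf_name, _ in top:
        enc = ENCODINGS[enc_name]
        cents = fit_centroids([enc(s) for s in seqs], y)
        z.append(predict_centroid(cents, CLASSIFIERS[clf_name], enc(new_seq)))
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0  # 1 = severe, 0 = mild

def toy_sequence(rng, gc, length=200):
    """Synthetic DNA with a target G+C fraction; NOT real TYLCV data."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT") for _ in range(length))

if __name__ == "__main__":
    rng = random.Random(1)
    seqs = [toy_sequence(rng, 0.3) for _ in range(20)] + [toy_sequence(rng, 0.7) for _ in range(20)]
    y = [0] * 20 + [1] * 20  # toy labels: 0 = mild, 1 = severe
    top, w, b = fit_stack(seqs, y, top_n=4)  # 3 feature sets x 2 classifiers = 6 models, keep 4
    print(predict_new(seqs, y, top, w, b, toy_sequence(rng, 0.7)))
```

With 3 feature sets and 2 classifiers the sketch builds 6 base models and keeps the top 4, a miniature analogue of the 96-model, top-90 selection described above. Training the meta-model on out-of-fold labels keeps it from learning from base-model predictions on samples those models were fitted to.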