Journal of Cheminformatics (Nov 2022)
Development of machine learning classifiers to predict compound activity on prostate cancer cell lines
Abstract
Prostate cancer is the most common type of cancer in men. The disease presents good survival rates if treated at the early stages. However, the most aggressive variant of the disease still lacks effective therapeutic options. Therefore, the identification of novel effective therapeutics is urgently needed. On these premises, we developed a series of machine learning models that predict the activity of ligands towards the PC-3 and DU-145 prostate cancer cell lines, trained on compounds with highly homogeneous reported cell-based antiproliferative assay data. Model development was tuned across a series of activity thresholds for classifying compounds as active or inactive, the number of features used, and 10 different machine learning algorithms. Model evaluation allowed us to identify the best combination of activity thresholds and ML algorithms for the classification of active compounds, achieving Matthews correlation coefficient (MCC) values above 0.60 for both PC-3 and DU-145 cells. Moreover, in silico models based on the combination of PC-3 and DU-145 data were also developed and showed excellent precision. Finally, an analysis of the activity annotations reported for the ligands in the curated datasets was conducted, suggesting associations between cellular activity and biological targets that might be explored in the future for the design of more effective prostate cancer antiproliferative agents.
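To make the two quantities in the abstract concrete, the following minimal sketch shows how an activity threshold turns assay values into active/inactive labels and how MCC is computed from the resulting confusion matrix. It is illustrative only and is not the authors' pipeline: the function names, the example potency values, their unit (assumed here to be nM, with lower meaning more potent), and the 1000 nM cut-off are all assumptions, not values taken from the paper.

```python
import math


def label_by_threshold(activities_nm, threshold_nm):
    """Label a compound active (1) if its value is at or below the threshold.

    Assumption: values are potencies in nM, so lower means more active.
    """
    return [1 if value <= threshold_nm else 0 for value in activities_nm]


def matthews_corrcoef(y_true, y_pred):
    """Compute MCC from binary labels using the standard confusion-matrix formula."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical potency values (nM) for six compounds; not data from the paper.
activities = [50, 200, 900, 5000, 12000, 30000]
labels = label_by_threshold(activities, threshold_nm=1000)

# Hypothetical classifier output for the same six compounds.
predictions = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(labels, predictions)
print(labels, round(mcc, 3))
```

Changing `threshold_nm` changes which compounds count as active and therefore the class balance. That sensitivity is one reason to score classifiers with MCC, which uses all four confusion-matrix cells, rather than with accuracy alone.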
Keywords