Big Data and Cognitive Computing (Apr 2025)
Development of a Predictive Model for the Biological Activity of Food and Microbial Metabolites Toward Estrogen Receptor Alpha (ERα) Using Machine Learning
Abstract
The interaction of estrogen receptor alpha (ERα) with metabolites, both endogenous and exogenous (such as those present in food products), as well as gut microbiota-derived metabolites, plays a critical role in modulating hormonal balance in the human body. In this study, we evaluated 27 machine learning models and, after systematic optimization and performance comparison, identified linear discriminant analysis (LDA) as the most effective predictive approach. We assembled a curated dataset of 75 molecular descriptors computed for compounds with known ERα activity; on this dataset the model achieved an accuracy of 89.4% and an F1 score of 0.93. Feature importance analysis showed that both topological and physicochemical descriptors, most notably FractionCSP3 and AromaticProportion, contribute strongly to the prediction of potential ERα binding. The model was then applied to chemicals commonly encountered in food products, such as indole and various phenolic compounds, and predicted that approximately 70% of these substances are active toward ERα. Our findings also suggest that food processing conditions, including fermentation, thermal treatment, and storage parameters, can significantly influence the formation of these active metabolites. These results point to the potential of integrating predictive modeling into food technology and highlight the need for further experimental validation and model refinement to support strategies for developing healthier and more sustainable food products.
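For readers comparing the two reported metrics, the following minimal sketch shows how accuracy and F1 are computed from confusion-matrix counts. It assumes the F1 score refers to the active (positive) class, which the abstract does not state; the function name `accuracy_and_f1` and the counts used are hypothetical and are not taken from the study.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Return (accuracy, F1) for a binary classifier.

    tp/fp/tn/fn are confusion-matrix counts, with "active toward ERα"
    assumed to be the positive class (an assumption, not from the paper).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Precision: fraction of predicted actives that are truly active.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true actives that the model recovers.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts for illustration only (not the study's results).
acc, f1 = accuracy_and_f1(tp=70, fp=6, tn=15, fn=9)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

As a general property of these definitions, F1 exceeds accuracy whenever true positives outnumber true negatives and at least one error occurs, so the two numbers are best interpreted together with the class balance of the evaluation set.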
Keywords