Pesquisa Agropecuária Brasileira (Sep 2024)
Performances of several machine learning algorithms and of logistic regression to predict Fasciola hepatica in cattle
Abstract
The objective of this work was to compare the performances of logistic regression and machine learning algorithms in predicting infection caused by Fasciola hepatica in cattle. A dataset on 30,151 bovines from Uruguay was used. Logistic regression (LR) was compared with the algorithms k-nearest neighbor (KNN), classification and regression trees (CART), and random forest (RF). The interquartile range (IQR) and the z-score were used to improve the classification and were compared with each other. Sex, age, carcass conformation score, fat score, productive purpose, and carcass weight were used as independent variables for all algorithms. Infection by F. hepatica was used as a binary dependent variable. The accuracies of LR, KNN, CART, and RF were 0.61, 0.57, 0.57, and 0.58, respectively. The variable importance of LR showed that adult cattle tended to be infected by F. hepatica. All models showed low accuracy, but LR successfully distinguished variables related to F. hepatica. The IQR and the z-score gave similar results in improving the classification metrics for the dataset used. Adding data related to climate, or factors such as body weight, to the dataset may improve the reliability of the model in future studies.
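As a rough illustration of the IQR and z-score steps mentioned in the abstract, the sketch below applies both as outlier filters to a single numeric variable. This is an assumption about how they were used: the abstract does not describe the procedure or state any thresholds, so the 1.5 × IQR fence and the |z| ≤ 3 cutoff are common defaults chosen here, and the carcass weights are invented for the example, not taken from the study.

```python
import statistics


def iqr_mask(values, k=1.5):
    """Flag values inside the fences [Q1 - k*IQR, Q3 + k*IQR] as True."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def zscore_mask(values, threshold=3.0):
    """Flag values whose absolute z-score is at most `threshold` as True."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return [True] * len(values)  # no spread, so nothing to filter
    return [abs((v - mean) / sd) <= threshold for v in values]


# Hypothetical carcass weights in kg (illustrative only, not study data).
carcass_weight = [210, 225, 230, 238, 240, 245, 250, 255, 262, 270, 520]

kept_iqr = [w for w, ok in zip(carcass_weight, iqr_mask(carcass_weight)) if ok]
kept_z = [w for w, ok in zip(carcass_weight, zscore_mask(carcass_weight)) if ok]

print("kept by IQR rule:    ", kept_iqr)
print("kept by z-score rule:", kept_z)
```

On this toy sample both rules discard only the extreme 520 kg record, which mirrors the abstract's remark that the two approaches behaved similarly. On skewed or small samples the two can disagree, because the z-score depends on a mean and standard deviation that the outliers themselves inflate.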
Keywords