PLoS ONE (Jan 2019)
Age grading An. gambiae and An. arabiensis using near infrared spectra and artificial neural networks.
Abstract
Background
Near infrared spectroscopy (NIRS) currently complements other techniques for age-grading mosquitoes. NIRS classifies lab-reared and semi-field raised mosquitoes into [...]
Methods and findings
We explore whether using an artificial neural network (ANN) analysis instead of partial least squares (PLS) regression improves the current accuracy of NIRS models for age-grading malaria-transmitting mosquitoes. We also explore whether directly training a binary classifier, instead of training a regression model and interpreting it as a binary classifier, improves the accuracy. A total of 786 and 870 NIR spectra collected from laboratory-reared An. gambiae and An. arabiensis, respectively, were used and pre-processed according to previously published protocols. The ANN regression model achieved a root mean squared error (RMSE) of 1.6 ± 0.2 for An. gambiae and 2.8 ± 0.2 for An. arabiensis, whereas the PLS regression model achieved an RMSE of 3.7 ± 0.2 for An. gambiae and 4.5 ± 0.1 for An. arabiensis. When we interpreted the regression models as binary classifiers, the accuracy of the ANN regression model was 93.7 ± 1.0% for An. gambiae and 90.2 ± 1.7% for An. arabiensis, while the PLS regression model scored 83.9 ± 2.3% for An. gambiae and 80.3 ± 2.1% for An. arabiensis. We also find that a directly trained binary classifier yields higher age estimation accuracy than a regression model interpreted as a binary classifier. A directly trained ANN binary classifier scored an accuracy of 99.4 ± 1.0% for An. gambiae and 99.0 ± 0.6% for An. arabiensis, while a directly trained PLS binary classifier scored 93.6 ± 1.2% for An. gambiae and 88.7 ± 1.1% for An. arabiensis. We further tested the reproducibility of these results on different independent mosquito datasets. In these tests, ANN models scored higher estimation accuracies than the same age models trained using PLS. Regardless of the model architecture, directly trained binary classifiers scored higher accuracies in classifying the age of mosquitoes than regression models interpreted as binary classifiers.
Conclusion
We recommend training models to estimate the age of An. arabiensis and An. gambiae using ANN model architectures (especially for datasets with at least 70 mosquitoes per age group), and directly training a binary classifier instead of training a regression model and interpreting it as a binary classifier.
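To make the two evaluation metrics concrete, the following is a minimal sketch (not the authors' code) of how a regression model's age predictions could be scored by RMSE and then interpreted as a binary classifier by thresholding. The 7-day cut-off, the function names, and the example ages are illustrative assumptions and do not come from the abstract.

```python
import numpy as np

# Hypothetical cut-off (in days) separating "young" from "old" mosquitoes.
# This value is an assumption for illustration only.
AGE_THRESHOLD_DAYS = 7.0


def rmse(true_ages, predicted_ages):
    """Root mean squared error between true and predicted ages."""
    true_ages = np.asarray(true_ages, dtype=float)
    predicted_ages = np.asarray(predicted_ages, dtype=float)
    return float(np.sqrt(np.mean((true_ages - predicted_ages) ** 2)))


def as_binary_labels(ages, threshold=AGE_THRESHOLD_DAYS):
    """Map ages to class labels: 1 if age >= threshold, otherwise 0."""
    return (np.asarray(ages, dtype=float) >= threshold).astype(int)


def binary_accuracy(true_ages, predicted_ages, threshold=AGE_THRESHOLD_DAYS):
    """Accuracy of a regression model interpreted as a binary classifier."""
    true_labels = as_binary_labels(true_ages, threshold)
    predicted_labels = as_binary_labels(predicted_ages, threshold)
    return float(np.mean(true_labels == predicted_labels))


# Made-up example data: true ages and the ages a regression model predicted.
true_ages = np.array([1, 3, 5, 9, 11, 15])
predicted_ages = np.array([2.0, 4.5, 7.5, 8.0, 10.0, 13.0])

example_rmse = rmse(true_ages, predicted_ages)
example_accuracy = binary_accuracy(true_ages, predicted_ages)
print(f"RMSE: {example_rmse:.2f}")
print(f"Accuracy as binary classifier: {example_accuracy:.1%}")
```

A directly trained binary classifier would instead be fitted on the thresholded labels themselves, so its training objective matches the classification task rather than the age-regression task.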