IEEE Access (Jan 2020)
Collaborative Deep Learning Models to Handle Class Imbalance in FlowCam Plankton Imagery
Abstract
Using automated imaging technologies, it is now possible to generate unprecedented volumes of plankton image data, which can be used to study the composition of plankton assemblages. However, the current need to manually classify individual images introduces a bottleneck into processing chains. Although Machine Learning techniques have been used to try to address this issue, past efforts have suffered from accuracy limitations, especially in minority classes. Here we use state-of-the-art methods in Deep Learning to investigate suitable architectures for training an automated plankton classification system that achieves high efficacy for both abundant and rare taxa. We collected live plankton from Station L4 in the Western English Channel and imaged 11,371 particles covering 104 taxonomic groups using the automated plankton imaging system FlowCam. The image set contained a severe class imbalance, with some taxa represented by more than 600 images while other, rarer taxa were represented by just 14. We demonstrate that by allowing multiple Deep Learning models to collaborate in a single classification system, classification accuracy improves for minority classes when compared with the best individual model. The top collaborative model achieved a 6 % improvement in F1 score over the best individual model, while overall accuracy improved by 3.2 %. This resulted in a 97.4 % overall accuracy score and a 96.2 % F1 macro score on a separate holdout test set containing 104 taxonomic groups. Based on a survey of similar studies in the literature, we believe collaborative deep learning models can significantly improve the accuracy of existing automated plankton classification systems.
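To illustrate why both overall accuracy and the F1 macro score are reported for an imbalanced image set, and how combining models can help rare classes, the following is a minimal sketch on toy data. It assumes that collaboration takes the form of averaging per-class probabilities across models (soft voting); the abstract does not specify the combination mechanism, so this is an illustrative stand-in rather than the method used in the study. The function names, the two hypothetical models, and all numbers are invented for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as abundant ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(model_probs):
    """Average class probabilities across models and return the argmax per sample.

    model_probs[m][i][k] is the probability model m assigns to class k for sample i.
    Soft voting is an assumption here, not the combination rule from the paper.
    """
    n_models = len(model_probs)
    preds = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        mean = [sum(m[i][k] for m in model_probs) / n_models for k in range(n_classes)]
        preds.append(max(range(n_classes), key=mean.__getitem__))
    return preds


# Toy imbalanced set: class 0 is abundant (8 samples), class 1 is rare (2 samples).
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the abundant class looks good on accuracy only.
majority = [0] * 10

# Two hypothetical models that each make different mistakes.
model_a = [[0.9, 0.1]] * 8 + [[0.6, 0.4], [0.3, 0.7]]
model_b = [[0.7, 0.3]] * 7 + [[0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]

pred_a = soft_vote([model_a])           # model A alone misses one rare sample
ensemble = soft_vote([model_a, model_b])

print("majority  acc=%.3f  macroF1=%.3f" % (accuracy(y_true, majority), macro_f1(y_true, majority)))
print("model A   acc=%.3f  macroF1=%.3f" % (accuracy(y_true, pred_a), macro_f1(y_true, pred_a)))
print("ensemble  acc=%.3f  macroF1=%.3f" % (accuracy(y_true, ensemble), macro_f1(y_true, ensemble)))
```

In this toy case the always-majority classifier reaches high accuracy while its macro F1 collapses, because the rare class is never recovered. The averaged prediction corrects errors that each hypothetical model makes on its own, which is the general intuition behind combining models, not a reproduction of the reported results.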
Keywords