Plant Methods (Feb 2018)
Citizen crowds and experts: observer variability in image-based plant phenotyping
Abstract
Background Image-based plant phenotyping has become a powerful tool in unravelling genotype–environment interactions. Image analysis and machine learning have become paramount in extracting data from phenotyping experiments. Yet we still rely on input from an observer (a human expert) to perform the phenotyping process. We assume such input to be a ‘gold standard’ and use it both to evaluate software and algorithms and to train learning-based algorithms. However, we should consider whether any variability exists among experienced and non-experienced observers (including plain citizens). Here we design a study that measures such variability in an annotation task of an integer-quantifiable phenotype: the leaf count.
Results We compare several experienced and non-experienced observers annotating leaf counts in images of Arabidopsis thaliana to measure intra- and inter-observer variability. Observers took part in a controlled study using specially designed annotation tools, and citizens contributed annotations through a distributed, citizen-powered web-based platform. In the controlled study, observers counted leaves by looking at top-view images taken with low- and high-resolution optics. We assessed whether tools specifically designed for this task can help to reduce such variability. We found that the presence of tools helps to reduce intra-observer variability, and that although intra- and inter-observer variability is present, it does not have any effect on statistical assessments of longitudinal leaf count trends. We assessed the variability of citizen-provided annotations (from the web-based platform) and found that plain citizens can provide statistically accurate leaf counts. We also evaluated a recent machine-learning-based leaf counting algorithm and found that, while close in performance, it is still not within inter-observer variability.
Conclusions While the expertise of the observer plays a role, if sufficient statistical power is present, a collection of non-experienced users and even citizens can be included in image-based phenotyping annotation tasks, as long as the tasks are suitably designed. We hope that these findings prompt a re-evaluation of the expectations we have of automated algorithms: as long as they perform within observer variability, they can be considered a suitable alternative. In addition, we hope to invigorate interest in introducing suitably designed tasks on citizen-powered platforms, not only to obtain useful information for research but also to help engage the public in this societally important problem.
Keywords