BMC Medical Research Methodology (Nov 2018)
Retinal vascular tortuosity assessment: inter-intra expert analysis and correlation with computational measurements
Abstract
Background: Retinal vascular tortuosity is a potential indicator of relevant vascular and non-vascular diseases. However, the lack of a precise, standard guide for evaluating tortuosity hinders its use for diagnosis and treatment. This work aims to advance the standardization of retinal vascular tortuosity as a clinical biomarker with diagnostic potential, thereby allowing objective computational measurements to be validated against the full spectrum of expert knowledge.

Methods: This paper describes a multi-expert validation process for reference computational measurements of vascular tortuosity. The analysis considers a group of five experts, covering the different clinical profiles of an ophthalmological service, and three rating schemes: a four-grade scale from non-tortuous to severe tortuosity, a non-tortuous / tortuous binary classification, and an asymptomatic / symptomatic binary classification. The rating process comprises two rounds involving all the experts and a joint round to establish consensus ratings. Expert agreement is analyzed throughout the rating procedure, and the consensus ratings are then used as the reference to validate the prognostic performance of four reference computational tortuosity metrics.

Results: The Kappa indexes for intra-rater agreement ranged from 0.35 to 0.83, whereas those for inter-rater agreement in the asymptomatic / symptomatic classification ranged from 0.22 to 0.76. The Area Under the Curve (AUC) for each expert against the consensus ratings ranged from 0.61 to 0.83, whereas the prognostic performance of the best objective tortuosity metric was 0.80.

Conclusions: Inter-rater and intra-rater variability is high, especially for the four-grade scale. The prognostic performance of the tortuosity measurements is close to that of the experts, especially for the Grisan measurement. However, there is a gap between the effectiveness of the automatic measurements and expert perception, given the lack of clinical criteria in the computational measurements.
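
The agreement and performance figures reported above rest on two standard statistics. As a minimal sketch, assuming the Kappa index is Cohen's kappa for a pair of ratings (the abstract does not state which variant was used) and that the AUC is taken in its rank-based form, they could be computed as follows. The function names and all ratings and scores below are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings.

    Assumption: unweighted kappa; a weighted variant may suit ordinal grades better.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same grade.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal grade frequencies, summed over grades.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical grade
    return (observed - expected) / (1.0 - expected)


def auc(scores, labels):
    """AUC as the probability that a positive case scores above a negative one.

    Ties count as 0.5. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: one expert grading the same 8 images in two rounds
# on a four-grade scale (0 = non-tortuous ... 3 = severe).
first_round = [0, 1, 1, 2, 3, 0, 2, 1]
second_round = [0, 1, 2, 2, 3, 0, 1, 1]
intra_rater_kappa = cohen_kappa(first_round, second_round)

# Hypothetical example: a tortuosity metric scored against binary consensus labels.
metric_scores = [0.10, 0.40, 0.35, 0.80]
consensus_labels = [0, 0, 1, 1]
metric_auc = auc(metric_scores, consensus_labels)
```

Intra-rater agreement compares one expert's two rounds, while inter-rater agreement applies the same function to two different experts' ratings of the same images.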
Keywords