Foot & Ankle Orthopaedics (Oct 2020)

Evaluation of a Weightbearing CT Artificial Intelligence-based Automatic Measurement for Hallux Valgus: A Case-Control Study

  • Jonathan Day MS,
  • Francois Lintz MD,
  • Martinus Richter MD, PhD,
  • Céline Fernando,
  • Scott J. Ellis MD,
  • Jonathan T. Deland MD,
  • Cesar de Cesar Netto MD, PhD

DOI
https://doi.org/10.1177/2473011420S00033
Journal volume & issue
Vol. 5

Abstract

Category: Bunion; Other

Introduction/Purpose: Cone Beam Weight Bearing CT (WBCT) is gaining traction, particularly in the foot and ankle, because it allows 3D scans to be acquired in natural weight-bearing stance. However, the resulting wealth of 3D data makes daily clinical use time-consuming. Reliable automatic measurements are therefore indispensable for making the best use of the technology. The aim of this study was to evaluate a beta-version WBCT artificial intelligence (AI) automatic measurement system for the M1-M2 intermetatarsal angle (IMA), which is applicable in the absence of metallic hardware in the foot and ankle. We hypothesized that automatic measurements would correlate well with human measurements, and that software reproducibility would be near perfect and better than that of manual measurements.

Methods: In this retrospective case-control study, 90 feet were included from patients who underwent WBCT scans during routine follow-up: 44 feet (90.9% female, mean age 54 years) with symptomatic hallux valgus (HV) and 46 controls (76.1% female, mean age 49 years). Patients were excluded if they had a history of surgery or trauma involving the first or second metatarsals, hallux rigidus, or metal in the foot or ankle. IMA was measured manually on Digitally Reconstructed Radiographs (DRR IMA) and automatically with AI software producing auto 2D (ground plane projection) and auto 3D (multiplanar) measurements. Each DRR IMA was measured twice by two independent raters to calculate intraobserver and interobserver intraclass correlation coefficients (ICCs). To assess intra-software reliability, AI software measurements were made twice on each dataset. Manual and automatic measurements were compared between the HV and control groups. Failures of the AI software to produce a measurement were recorded.
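The distinction between the auto 2D (ground plane projection) and auto 3D (multiplanar) IMA can be illustrated with a minimal sketch. The abstract does not describe how the AI software derives the metatarsal axes or computes the angle, so everything below is an assumption made for illustration: the axis vectors, the function names, and the convention that the z component is vertical (perpendicular to the ground plane) are all hypothetical.

```python
import math


def angle_between(u, v):
    """Return the angle in degrees between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def ima_3d(m1_axis, m2_axis):
    """Angle between the two metatarsal axes in 3D space (assumed definition)."""
    return angle_between(m1_axis, m2_axis)


def ima_2d(m1_axis, m2_axis):
    """Angle after projecting both axes onto the ground plane.

    Assumes z is the vertical direction, so the projection simply
    drops the z component of each axis.
    """
    return angle_between(m1_axis[:2], m2_axis[:2])


# Hypothetical axes: M2 points straight ahead, M1 deviates 15 degrees
# medially in the ground plane; the two axes have different inclinations.
m2 = (0.0, 1.0, 0.2)
m1 = (math.sin(math.radians(15)), math.cos(math.radians(15)), 0.4)

angle_2d = ima_2d(m1, m2)
angle_3d = ima_3d(m1, m2)
print(f"2D IMA: {angle_2d:.1f} deg, 3D IMA: {angle_3d:.1f} deg")
```

When the two axes have different inclinations relative to the ground, the projected and spatial angles diverge, which is one reason a 2D and a 3D value can differ for the same foot.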
Results: Mean values for controls were 8.6° ± 1.8° (range, 5°-14°) for the manually measured DRR IMA, 9.3° ± 2.8° (range, 3°-17°) for auto 2D, and 9.2° ± 2.6° (range, 3°-16°) for auto 3D IMA measurements. Compared to controls, HV patients demonstrated significantly increased IMA (p<0.0001): 14.2° ± 2.7° (range, 8°-21°) for the manually measured DRR IMA, 15.4° ± 4.4° (range, 8°-26°) for auto 2D, and 15.1° ± 4.1° (range, 8°-28°) for auto 3D IMA measurements. There were strong correlations (r=0.75 and r=0.80) between manual and auto 2D and 3D measurements. Intraobserver and interobserver ICCs for DRR IMA were 0.96 and 0.90, respectively, and the intra-software ICCs for the AI were near 1.0 for both auto 2D and auto 3D IMA. The AI software failed in 32.3% of cases.

Conclusion: Our results demonstrated strong correlation between a WBCT artificial intelligence-based automatic measurement of IMA and human measurements, with the ability to distinguish HV from controls with close to 100% repeatability. However, the number of failures was still high, owing to the early-stage beta version of the algorithm tested. While these early results are promising, further development is warranted to improve the usability of this tool in daily practice, especially in the presence of metal hardware.

Figure 1.