Journal of Translational Medicine (Jun 2019)

Discriminant analysis and machine learning approach for evaluating and improving the performance of immunohistochemical algorithms for COO classification of DLBCL

  • Yocanxóchitl Perfecto-Avalos,
  • Alejandro Garcia-Gonzalez,
  • Ana Hernandez-Reynoso,
  • Gildardo Sánchez-Ante,
  • Carlos Ortiz-Hidalgo,
  • Sean-Patrick Scott,
  • Rita Q. Fuentes-Aguilar,
  • Ricardo Diaz-Dominguez,
  • Grettel León-Martínez,
  • Verónica Velasco-Vales,
  • Mara A. Cárdenas-Escudero,
  • José A. Hernández-Hernández,
  • Arturo Santos,
  • José R. Borbolla-Escoboza,
  • Luis Villela

DOI
https://doi.org/10.1186/s12967-019-1951-y
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 12

Abstract

Background: Diffuse large B-cell lymphoma (DLBCL) is classified into germinal center-like (GCB) and non-germinal center-like (non-GCB) cell-of-origin (COO) groups, entities driven by different oncogenic pathways and associated with different clinical outcomes. Classification of DLBCL by immunohistochemistry (IHC)-based decision tree algorithms is reported to be simpler than gene expression profiling (GEP). However, IHC-decision tree algorithms show significant discrepancies when compared with GEP.

Methods: To address these inconsistencies, we applied a machine learning approach using the same combinations of antibodies as the IHC-decision tree algorithms. Immunohistochemistry data from a public DLBCL database were used to compare the IHC-decision tree algorithms with machine learning structures based on Bayesian, Bayesian simple, Naïve Bayesian, artificial neural network, and support vector machine classifiers, in order to identify the best diagnostic model. We applied linear discriminant analysis to the complete database and detected a higher influence of the BCL6 antibody for GCB classification and of MUM1 for non-GCB classification.

Results: The classifier with the highest metrics was the four antibody-based Perfecto–Villela (PV) algorithm, with 0.94 accuracy, 0.93 specificity, and 0.95 sensitivity, and almost perfect agreement with GEP (κ = 0.88, P < 0.001). After training, data from a sample of 49 Mexican-mestizo DLBCL patients were classified by COO for the first time in a testing trial.

Conclusions: By harnessing all the available immunohistochemical data without relying on the order of examination or on cut-off values, our PV machine learning algorithm outperforms Hans and other IHC-decision tree algorithms currently in use and represents an affordable and time-saving alternative for DLBCL cell-of-origin identification.
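The metrics quoted in the Results (accuracy, specificity, sensitivity, and Cohen's κ) can all be derived from a 2×2 table that cross-tabulates a classifier's calls against the GEP reference. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `binary_metrics`, the choice of GCB as the "positive" class, and the counts in the usage example are assumptions made for illustration; the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute agreement metrics from a 2x2 confusion matrix.

    Assumed convention (not from the paper): GCB is the positive class,
    and the reference labels come from GEP.
      tp: reference GCB,     predicted GCB
      fn: reference GCB,     predicted non-GCB
      fp: reference non-GCB, predicted GCB
      tn: reference non-GCB, predicted non-GCB
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n          # observed agreement
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate

    # Agreement expected by chance, from the marginal frequencies.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_pos + p_neg

    # Cohen's kappa: agreement beyond chance, scaled to its maximum.
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=45, fn=5, fp=10, tn=40).items():
        print(f"{name}: {value:.2f}")
```

Because κ discounts the agreement expected by chance, it is lower than raw accuracy whenever chance agreement is nonzero, which is why an accuracy of 0.94 and a κ of 0.88 can describe the same classifier.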

Keywords