Heliyon (Jan 2024)
Developing a robust two-step machine learning multiclassification pipeline to predict primary site in head and neck carcinoma from lymph nodes
Abstract
This study aimed to develop a robust multiclassification pipeline to determine the primary tumor location in patients with head and neck carcinoma of unknown primary using radiomics and machine learning techniques. The dataset included 400 head and neck cancer patients with a primary tumor in the oropharynx, OPC (n = 162), nasopharynx, NPC (n = 137), oral cavity, OC (n = 63), or larynx and hypopharynx, HL (n = 38). Two radiomics-based multiclassification pipelines (P1 and P2) were developed. P1 identified the primary site directly, whereas P2 used a two-step approach: in the first step, the number of classes was reduced by merging the two minority classes, and in the second step the samples assigned to the merged class were reclassified into their original classes. Several correlation thresholds (0.75, 0.80, 0.85), feature selection methods (sequential forward/backward selection, sequential floating forward selection, neighborhood component analysis, and minimum redundancy maximum relevance), and classification models (neural network, decision tree, naïve Bayes, bagged trees, and support vector machine) were assessed. P2 outperformed P1, with the best results obtained with the support vector machine classifier including radiomic and clinical features (accuracies of 75.3 % (HL), 75.4 % (OC), 71.3 % (OPC), 92.9 % (NPC)). These results indicate that a two-step multiclassification pipeline integrating radiomics and clinical information is a promising approach to predicting the primary tumor site in carcinoma of unknown primary.
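The two-step idea behind P2 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the merged minority classes are OC and HL (the two smallest groups in the abstract), uses a simple nearest-centroid classifier as a stand-in for the support vector machine, and runs on made-up two-dimensional toy features rather than radiomic data. The names TwoStepClassifier, NearestCentroid, and the "OC+HL" label are hypothetical.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Stand-in classifier (not the study's SVM): picks the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One mean vector per class.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda c: dist(row, self.centroids_[c]))
            for row in X
        ]


class TwoStepClassifier:
    """Step 1 predicts with minority classes merged; step 2 splits the merged class."""

    def __init__(self, make_model, minority=("OC", "HL"), merged_label="OC+HL"):
        self.make_model = make_model      # factory returning a fresh fit/predict model
        self.minority = set(minority)     # assumption: the two smallest classes
        self.merged_label = merged_label  # hypothetical name for the merged class

    def fit(self, X, y):
        # Step 1: train on all samples, with the minority classes relabeled as one.
        y_merged = [self.merged_label if lab in self.minority else lab for lab in y]
        self.step1_ = self.make_model().fit(X, y_merged)
        # Step 2: train only on minority samples, with their original labels.
        X_min = [row for row, lab in zip(X, y) if lab in self.minority]
        y_min = [lab for lab in y if lab in self.minority]
        self.step2_ = self.make_model().fit(X_min, y_min)
        return self

    def predict(self, X):
        out = list(self.step1_.predict(X))
        # Only samples assigned to the merged class are passed to step 2.
        idx = [i for i, lab in enumerate(out) if lab == self.merged_label]
        if idx:
            refined = self.step2_.predict([X[i] for i in idx])
            for i, lab in zip(idx, refined):
                out[i] = lab
        return out


# Toy, made-up features: two samples per class.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9],
     [10.0, 0.0], [10.2, 0.1], [10.0, 2.0], [10.1, 2.2]]
y = ["OPC", "OPC", "NPC", "NPC", "OC", "OC", "HL", "HL"]

model = TwoStepClassifier(NearestCentroid).fit(X, y)
pred = model.predict([[0.1, 0.0], [5.0, 5.1], [10.1, 0.0], [10.0, 2.1]])
print(pred)  # ['OPC', 'NPC', 'OC', 'HL']
```

Because the wrapper only requires fit and predict, any classifier with that interface could be substituted for the stand-in in both steps, and the two steps could in principle use different feature subsets or models.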