Automatika (Jan 2021)

Classification and feature analysis of the Human Connectome Project dataset for differentiating between males and females

  • Jelena Bozek,
  • Ivan Kesedzic,
  • Leonard Novosel,
  • Tomislav Bozek

DOI
https://doi.org/10.1080/00051144.2021.1885890
Journal volume & issue
Vol. 62, no. 1
pp. 109 – 117

Abstract

We analysed features relevant for differentiating between males and females based on the data available from the Human Connectome Project (HCP) S1200 dataset. We used 354 features comprising cognitive and emotional measures as well as measures derived from task functional magnetic resonance imaging (MRI) and structural brain MRI. The paper presents a thorough analysis of this extensive feature set using a machine learning approach, with the goal of identifying features that can differentiate between males and females. We used two state-of-the-art classification algorithms with different properties: support vector machine (SVM) and random forest classifier (RFC). For each classifier, hyperparameters were selected by grid search and performance was estimated using nested cross-validation. This resulted in classification accuracies of 91% and 89% for SVM and RFC, respectively. Using the SHAP (SHapley Additive exPlanations) method, we obtained the relevance of features as indicators of sex differences and identified features with high discriminative power for sex classification. The majority of the top features were brain morphological measures, and only a small proportion were features related to cognitive performance. Our results demonstrate the importance and advantages of using a machine learning approach when analysing sex differences.
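
The nested cross-validation with grid search described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data, the fold counts, the parameter grids, and the helper name `nested_cv_accuracy` are all assumptions chosen for brevity, and the HCP features themselves are not used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a subjects-by-features table with a binary label
# (assumption: the real study used 354 HCP-derived features).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Inner folds choose hyperparameters; outer folds estimate accuracy on
# data that played no part in that choice.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Illustrative grids only; the grids used in the paper are not given here.
models = {
    "SVM": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "RFC": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}


def nested_cv_accuracy(estimator, param_grid):
    """Return one accuracy score per outer fold (hypothetical helper)."""
    # GridSearchCV reruns the inner search on each outer training split,
    # so the outer test split never influences hyperparameter selection.
    search = GridSearchCV(estimator, param_grid, cv=inner_cv, scoring="accuracy")
    return cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")


results = {name: nested_cv_accuracy(est, grid) for name, (est, grid) in models.items()}
for name, scores in results.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Wrapping the scaler and the SVM in one pipeline keeps the scaling statistics from being computed on held-out folds, which would otherwise leak information into the accuracy estimate.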

Keywords