PeerJ (Oct 2018)

Clustering of fMRI data: the elusive optimal number of clusters

  • Mohamed L. Seghier

DOI
https://doi.org/10.7717/peerj.5416
Journal volume & issue
Vol. 6
p. e5416

Abstract


Model-free methods are widely used for the processing of brain fMRI data collected under natural stimulation, sleep, or rest. Among them is the popular fuzzy c-means algorithm, commonly combined with cluster validity (CV) indices to identify the ‘true’ number of clusters (components) in an unsupervised way. CV indices may, however, yield different optimal c-partitions for the same fMRI data, and their effectiveness can be hindered by the high data dimensionality, the limited signal-to-noise ratio, the small proportion of relevant voxels, and the presence of artefacts or outliers. Here, the author investigated the behaviour of seven robust CV indices. A new CV index that incorporates both compactness and separation measures is also introduced. Using both artificial and real fMRI data, the findings highlight the importance of looking at the behaviour of different compactness and separation measures, defined here as building blocks of CV indices, to obtain a full description of the data structure, in particular when no agreement is found between CV indices. Overall, for fMRI, it makes sense to relax the assumption that only one unique c-partition exists, and to appreciate that different c-partitions (with different optimal numbers of clusters) can be useful explanations of the data, given the hierarchical organization of many brain networks.
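To make the idea of compactness and separation as building blocks of a CV index concrete, the following is a minimal sketch in Python, not the author's implementation. It runs a textbook fuzzy c-means on toy two-dimensional data and reports a compactness term, a separation term, and their ratio for several values of c. The ratio corresponds to the well-known Xie-Beni index, used here only as a familiar example; the abstract does not name the seven indices or define the new one. The function names, the fuzziness exponent m = 2, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative; not the paper's code).

    X: (n_samples, n_features) array. Returns memberships U with shape
    (c, n_samples) and centroids V with shape (c, n_features).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # weighted centroids
        # Squared Euclidean distance from every centroid to every sample.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def compactness_separation(X, U, V, m=2.0):
    """Return (compactness, separation, ratio); the ratio is the Xie-Beni index."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum() / X.shape[0]
    vd2 = ((V[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    off_diagonal = ~np.eye(V.shape[0], dtype=bool)
    separation = vd2[off_diagonal].min()  # closest pair of centroids
    return compactness, separation, compactness / separation


# Toy data (assumed for illustration): three Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([ctr + 0.5 * rng.standard_normal((50, 2)) for ctr in centres])

for c in range(2, 6):
    U, V = fuzzy_c_means(X, c)
    comp, sep, ratio = compactness_separation(X, U, V)
    print(f"c={c}  compactness={comp:.3f}  separation={sep:.3f}  ratio={ratio:.4f}")
```

Printing the two terms alongside their ratio, rather than the ratio alone, reflects the abstract's suggestion to inspect the building blocks when indices disagree. Lower ratio values indicate more compact, better separated partitions under this particular index.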

Keywords