Semantic Feature Extraction Using SBERT for Dementia Detection

Yamanki Santander-Cruz; Sebastián Salazar-Colores; Wilfrido Jacobo Paredes-García; Humberto Guendulain-Arenas; Saúl Tovar-Arriaga

doi:10.3390/brainsci12020270

Brain Sciences (Feb 2022)

Affiliations

Yamanki Santander-Cruz: Facultad de Ingeniería, Universidad Autónoma de Querétaro, Queretaro C.P. 76010, Mexico
Sebastián Salazar-Colores: Centro de Investigaciones en Óptica, Leon C.P. 37150, Mexico
Wilfrido Jacobo Paredes-García: Facultad de Ingeniería, Universidad Autónoma de Querétaro, Queretaro C.P. 76010, Mexico
Humberto Guendulain-Arenas: Departamento de Geriatría, Instituto Mexicano del Seguro Social, San Juan del Rio C.P. 76800, Mexico
Saúl Tovar-Arriaga: Facultad de Ingeniería, Universidad Autónoma de Querétaro, Queretaro C.P. 76010, Mexico

Journal volume & issue: Vol. 12, no. 2, p. 270

Abstract

Dementia is a neurodegenerative disease that leads to the development of cognitive deficits, such as aphasia, apraxia, and agnosia. It is currently considered one of the most significant medical problems worldwide, primarily affecting the elderly. The condition gradually impairs the patient’s cognition, eventually leaving the patient unable to perform everyday tasks without assistance. Since dementia is incurable, early detection plays an important role in delaying its progression. For this reason, tools and methods have been developed to help diagnose patients accurately in the early stages of the disease. State-of-the-art methods have shown that syntactic linguistic features provide a sensitive and noninvasive tool for detecting dementia in its early stages. However, these methods lack relevant semantic information. In this work, we propose a novel methodology based on semantic features, using sentence embeddings computed by Siamese BERT networks (SBERT) together with a support vector machine (SVM), K-nearest neighbors (KNN), random forest, and an artificial neural network (ANN) as classifiers. Our methodology extracted 17 features that provide demographic, lexical, syntactic, and semantic information from 550 oral production samples of elderly controls and people with Alzheimer’s disease, provided by the DementiaBank Pitt Corpus database. To quantify the relevance of the extracted features for the dementia classification task, we calculated the mutual information score, which demonstrates a dependence between our features and the MMSE score. The experimental classification metrics (accuracy, precision, recall, and F1 score of 77%, 80%, 80%, and 80%, respectively) show that our methodology performs better than syntax-based methods and the BERT approach when only the linguistic features are used.
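The mutual information score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which estimator the authors used, so the code below assumes the simple plug-in estimate for two discrete variables; the function name `mutual_information` and the toy values are hypothetical and are not taken from the DementiaBank Pitt Corpus.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Assumption: both variables are already discretized (e.g. a binned
    feature and an MMSE band); the paper's own estimator may differ.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    score = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        score += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return score


# Toy, made-up data: a binned feature that tracks the MMSE band exactly.
feature_bin = ["low", "low", "high", "high", "low", "high"]
mmse_band = ["impaired", "impaired", "normal", "normal", "impaired", "normal"]
dependent_score = mutual_information(feature_bin, mmse_band)

# Toy, made-up data: two variables that are independent by construction.
independent_score = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy case the dependent pair scores log 2 (about 0.693 nats) and the independent pair scores 0, which is the sense in which a higher score points to a stronger dependence between a feature and the MMSE score.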

Published in Brain Sciences

ISSN: 2076-3425 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.mdpi.com/journal/brainsci/
