Semi-supervised random forest regression model based on co-training and grouping with information entropy for evaluation of depression symptoms severity

Shengfu Lu; Xin Shi; Mi Li; Jinan Jiao; Lei Feng; Gang Wang

doi:10.3934/mbe.2021233

Mathematical Biosciences and Engineering (May 2021)

Affiliations

Shengfu Lu: 1. Department of Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China 2. The Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China 3. Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing 100124, China
Xin Shi: 1. Department of Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China 2. The Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China 3. Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing 100124, China
Mi Li: 1. Department of Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China 2. The Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China 3. Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing 100124, China 4. Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China
Jinan Jiao: 1. Department of Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China 2. The Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing 100124, China 3. Engineering Research Center of Intelligent Perception and Autonomous Control, Ministry of Education, Beijing 100124, China
Lei Feng: 5. The National Clinical Research Center for Mental Disorders & Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing 100088, China 6. The Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100088, China
Gang Wang: 5. The National Clinical Research Center for Mental Disorders & Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing 100088, China 6. The Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing 100088, China

DOI: https://doi.org/10.3934/mbe.2021233
Journal volume & issue: Vol. 18, no. 4
pp. 4586 – 4602

Abstract

Semi-supervised learning has long been an active topic in machine learning: it uses large amounts of unlabeled data to improve model performance. This paper combines the co-training strategy with random forest to propose a novel semi-supervised regression algorithm, the semi-supervised random forest regression model based on co-training and grouping with information entropy (E-CoGRF), and applies it to the evaluation of depression symptom severity. The algorithm inherits the ensemble characteristics of random forest and combines naturally with co-training. To balance accuracy and diversity among the co-trained random forests, the algorithm introduces a strategy for grouping the decision trees. In addition, information entropy is used to measure prediction confidence, which avoids unnecessary repeated training and improves the efficiency of the model. For the practical application of evaluating depression symptom severity, we collect cognitive behavioral data on emotional conflict, based on the characteristics of depressive affective disorder, and then construct and normalize features from these data. Finally, the model is tested on 35 labeled and 80 unlabeled patients with depression. The results show that the proposed algorithm achieves an MAE (Mean Absolute Error) of 3.63 and an RMSE (Root Mean Squared Error) of 4.50, outperforming the other semi-supervised regression algorithms compared. The proposed method effectively addresses the modeling difficulties caused by insufficient labeled samples and offers a useful reference for diagnosing the severity of depression symptoms.
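The abstract names two quantitative ingredients: an entropy-based confidence measure used to decide which unlabeled samples are worth pseudo-labeling, and the MAE/RMSE metrics used for evaluation. The sketch below illustrates both in plain Python. It is not the authors' E-CoGRF implementation; the abstract does not say how tree outputs are turned into an entropy, so discretizing each tree's regression output into bins of width `bin_width`, using the mean tree prediction as the pseudo-label, and the helper names `prediction_entropy` and `pick_pseudo_labels` are all hypothetical choices made for illustration. In a co-training loop, one tree group would score the unlabeled samples this way and pass its most confident pseudo-labeled samples to the other group.

```python
import math
from collections import Counter


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean of (y - y_hat)^2."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def prediction_entropy(tree_predictions, bin_width=2.0):
    """Shannon entropy (bits) of one sample's per-tree predictions.

    ASSUMPTION (hypothetical): regression outputs are discretized into bins
    of width `bin_width` so that an entropy can be computed. Low entropy
    means the trees agree, which this sketch treats as high confidence.
    """
    n = len(tree_predictions)
    counts = Counter(math.floor(p / bin_width) for p in tree_predictions)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def pick_pseudo_labels(unlabeled_preds, k, bin_width=2.0):
    """Return (index, pseudo_label) for the k most confident unlabeled samples.

    `unlabeled_preds[i]` holds the predictions one tree group made for
    unlabeled sample i. ASSUMPTION (hypothetical): the pseudo-label is the
    mean of those predictions.
    """
    ranked = sorted(
        range(len(unlabeled_preds)),
        key=lambda i: prediction_entropy(unlabeled_preds[i], bin_width),
    )
    return [(i, sum(unlabeled_preds[i]) / len(unlabeled_preds[i])) for i in ranked[:k]]
```

The bin width sets how strict "agreement" is: with wider bins, more spread among the tree outputs still counts as consensus, so more samples pass as confident.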

Published in Mathematical Biosciences and Engineering

ISSN: 1551-0018 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Mathematics
Website: https://www.aimspress.com/journal/MBE
