Mathematical Biosciences and Engineering (May 2021)

Semi-supervised random forest regression model based on co-training and grouping with information entropy for evaluation of depression symptoms severity

  • Shengfu Lu,
  • Xin Shi,
  • Mi Li,
  • Jinan Jiao,
  • Lei Feng,
  • Gang Wang

DOI
https://doi.org/10.3934/mbe.2021233
Journal volume & issue
Vol. 18, no. 4
pp. 4586 – 4602

Abstract


Semi-supervised learning has long been an active topic in machine learning. It uses a large amount of unlabeled data to improve model performance. This paper combines the co-training strategy with random forests to propose a novel semi-supervised regression algorithm, the semi-supervised random forest regression model based on co-training and grouping with information entropy (E-CoGRF), and applies it to evaluating the severity of depression symptoms. The algorithm inherits the ensemble characteristics of random forests, which lets it combine naturally with co-training. To balance accuracy and diversity among the co-trained random forests, the algorithm introduces a grouping strategy for the decision trees. Moreover, information entropy is used to measure confidence, which avoids unnecessary repeated training and improves the efficiency of the model. For the practical application of evaluating depression symptom severity, we collect cognitive behavioral data on emotional conflict based on depressive affective disorder, and then perform feature construction and normalization preprocessing on these data. Finally, the model is tested on data from 35 labeled and 80 unlabeled depression patients. The proposed algorithm obtains an MAE (Mean Absolute Error) of 3.63 and an RMSE (Root Mean Squared Error) of 4.50, outperforming the other semi-supervised regression algorithms compared. The proposed method effectively addresses the modeling difficulties caused by insufficient labeled samples and offers useful reference value for diagnosing depression symptom severity.
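The abstract reports accuracy as MAE and RMSE. As a minimal sketch, the standard definitions of these two metrics can be computed in plain Python as follows. The function names `mae` and `rmse` are our own, and the score lists are hypothetical values for illustration, not data from the study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both metrics compare paired true/predicted values, so lengths must match.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: the average of |y_i - yhat_i|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: the square root of the average of (y_i - yhat_i)^2."""
    _check(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical severity scores, not taken from the paper.
true_scores = [20.0, 14.0, 31.0, 9.0]
predicted = [23.0, 12.0, 26.0, 9.0]

print(mae(true_scores, predicted))   # → 2.5 (absolute errors 3, 2, 5, 0)
print(rmse(true_scores, predicted))  # square root of 9.5, about 3.08
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and weights large errors more heavily, which is consistent with the reported RMSE (4.50) exceeding the reported MAE (3.63).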

Keywords