IEEE Access (Jan 2024)
Children’s Sentiment Analysis From Texts by Using Weight Updated Tuned With Random Forest Classification
Abstract
Sentimental Analysis is considered a computational strategy that helps in identifying and assessing the emotions of people via text documents. Tools and different methods have been adopted for determining both positive and negative emotions in the form of text data analytics by using Machine and Deep Learning techniques. Experimentally, it has been shown that the accuracy of existing text classification models such as Bi-LSTM, Decision Tree, and Ensemble Classifiers is limited by poor quality data, inappropriate hyperparameter tuning, and model-specific bias levels. Additionally, these models are prone to overfitting, high computational overhead, and longer training time. To overcome these limitations, we proposed a hybrid binary classification framework by combining Deep sequential features with the Random Forest (RF) technique. The approach is implemented in four phases: Initially, data preprocessing is performed by employing a Vader sentiment package. In the second step, the deep Long Short Term Memory (LSTM) model was employed to extract deep sequential features corresponding to sad and happy emotions. In the third phase, a bi-orthogonalization algorithm with principal component Analysis (PCA) and Singular Value Decomposition (SVD) was employed to minimize the redundancy and maximize the relevance of extracted features. Finally, a five-fold cross-validation technique was implemented to discriminate sad and happy emotions using the Random Forest (RF) algorithm. Eventually, a grid search approach was implemented for hyperparameter tuning and results were compared with five baseline algorithms (Vanilla LSTM (VLSTM), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Naïve Bayes (NB), Ada Boost Algorithm (ABA). The experimental outcomes revealed that the proposed model achieved an accuracy rate of 99.631% on the 4000 stories dataset which was superior to all five state-of-the-art methods with a margin of 4.63%, 10.7%, 19.44%, 21%, and 56.5%, respectively. Interestingly, the proposed model realized improved results in terms of other conventional performance metrics also such as precision, recall, specificity, and time complexity. Overall, the proposed model has great potential in educational institutions, child psychology research, and child-friendly content moderation, generally helping in the understanding of the emotions and experiences of children in the digital realm.
Keywords