Journal of Mathematics (Jan 2022)
Linguistic Analysis of Hindi-English Mixed Tweets for Depression Detection
Abstract
According to recent studies, young adults in India have faced mental health issues, including low self-esteem and distress, owing to university closures and loss of income, and 43% reported symptoms of anxiety and/or depressive disorder. This makes a solution urgent. A new classifier is proposed to identify individuals who may be experiencing depression, based on their tweets on the social media platform Twitter. The proposed model relies on linguistic analysis and text classification, with class probabilities computed from TF-IDF (term frequency-inverse document frequency) features. Indians tend to tweet predominantly in English, Hindi, or a mix of the two languages (colloquially known as Hinglish). In the proposed approach, tweets are collected from Twitter and screened by a classifier built with the multinomial Naive Bayes algorithm, with grid search used for hyperparameter optimization. Each tweet is classified as depressed or not depressed. The entire architecture works over both English and Hindi, which should ease deployment across multiple platforms and support a methodical, automated response to rising depression rates. The proposed pipeline combines these techniques and attains an accuracy of 96.15% and an F1 score of 0.914.
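The pipeline named in the abstract (TF-IDF features, a multinomial Naive Bayes classifier, and grid search over hyperparameters) can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name. The toy tweets, the labels, and the hyperparameter grid are invented for illustration and are not the authors' data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, invented for illustration only
# (1 = depressed, 0 = not depressed). Not taken from the paper.
tweets = [
    "i feel so hopeless and alone",
    "kuch bhi accha nahi lagta, feeling empty",
    "cannot sleep, everything feels pointless",
    "bahut akela feel ho raha hai",
    "zindagi se thak gaya hoon, no hope",
    "i am tired of everything, feeling worthless",
    "great match today, loved it",
    "aaj ka din bahut accha tha, so happy",
    "excited for the weekend trip with friends",
    "naya phone mil gaya, super khush",
    "lovely weather, going for a walk",
    "exam clear ho gaya, party tonight",
]
labels = [1] * 6 + [0] * 6

# TF-IDF turns each tweet into a weighted term vector; multinomial
# Naive Bayes then estimates class probabilities from those weights.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("clf", MultinomialNB()),
])

# Assumed search space: n-gram range and the NB smoothing parameter alpha.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 0.5, 1.0],
}

# Grid search with 3-fold stratified cross-validation, scored by F1.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1")
search.fit(tweets, labels)

new_tweets = ["feeling hopeless, kuch accha nahi", "so happy with friends today"]
predictions = search.predict(new_tweets)          # array of 0/1 labels
probabilities = search.predict_proba(new_tweets)  # one row per tweet

print(search.best_params_)
print(predictions)
```

On a real corpus, the grid would typically be wider, and accuracy and F1 would be reported on a held-out test set rather than on the cross-validation folds used for tuning.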