IEEE Access (Jan 2019)

A Novel Co-Training-Based Approach for the Classification of Mental Illnesses Using Social Media Posts

  • Subhan Tariq,
  • Nadeem Akhtar,
  • Humaira Afzal,
  • Shahzad Khalid,
  • Muhammad Rafiq Mufti,
  • Shahid Hussain,
  • Asad Habib,
  • Ghufran Ahmad

DOI
https://doi.org/10.1109/ACCESS.2019.2953087
Journal volume & issue
Vol. 7
pp. 166165 – 166172

Abstract

Context: Researchers in many domains are increasingly turning to social media networks to gain constructive knowledge for decision making and automation, for example to support software development activities, cryptocurrency usage, and network community detection and recommendation. Within eHealth, the use of social media and big data analytics has recently become a hot topic for predicting whether a patient suffers from a mental illness such as depression, schizophrenia, eating disorders, anxiety, or addictive behaviors. Problem: Traditional methods need either sufficient historical data or regular monitoring of patient activities to identify a patient with a mental illness. Method: To address this issue, we propose a methodology to classify patients associated with chronic mental illnesses (i.e., Anxiety, Depression, Bipolar, and ADHD (Attention Deficit Hyperactivity Disorder)) based on data extracted from Reddit, a well-known network community platform. The proposed method uses Co-training (a type of semi-supervised learning) and incorporates the discriminative power of widely used classifiers, namely Random Forest (RF), Support Vector Machine (SVM), and Naïve Bayes (NB). We used the Reddit API to download posts and their top five associated comments to construct the feature space. Results: The experimental results indicate that Co-training-based classification is more effective than the state-of-the-art classifiers, by a margin of 3% on average relative to each state-of-the-art technique compared. In the future, the proposed method could be applied to classification problems in other domains by extracting data from social media.
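To make the Co-training idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes two views per example (the post text and the text of its comments), a split the abstract does not specify. For self-containment it substitutes a small hand-written Naïve Bayes for the RF, SVM, and NB classifiers named above. The names `TinyNB`, `co_train`, and `predict`, and all of the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        n = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / n)
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.counts[c][w] + 1) / total)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def fit_views(labeled):
    """Train one classifier per view (0 = post text, 1 = comment text)."""
    labels = [y for _, y in labeled]
    return [TinyNB().fit([x[v] for x, _ in labeled], labels) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Grow the labeled set with each view's most confident pseudo-labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picks = {}
        for v, model in enumerate(fit_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                proba = model.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            for _conf, i, label in sorted(scored, reverse=True)[:per_round]:
                picks.setdefault(i, label)  # first view to pick an item wins
        labeled += [(pool[i], label) for i, label in picks.items()]
        pool = [x for i, x in enumerate(pool) if i not in picks]
    return fit_views(labeled), labeled


def predict(models, example):
    """Combine both views by multiplying their class probabilities."""
    p0 = models[0].predict_proba(example[0])
    p1 = models[1].predict_proba(example[1])
    return max(p0, key=lambda c: p0[c] * p1[c])


# Invented toy data: each example is (post text, comment text).
labeled = [
    (("i worry and panic before every exam",
      "panic attacks are hard, breathe slowly"), "anxiety"),
    (("i feel empty and hopeless all day",
      "hopeless days pass, please seek support"), "depression"),
]
unlabeled = [
    ("constant worry and panic keep me awake", "that panic sounds exhausting"),
    ("everything feels empty lately", "feeling hopeless is common, seek support"),
]

models, grown = co_train(labeled, unlabeled)
print(predict(models, ("i panic and worry about work", "breathe slowly")))
```

In each round, every view's classifier labels the unlabeled examples it is most confident about, and those pseudo-labeled examples are added to the training data for both views. A fuller version would swap `TinyNB` for stronger classifiers and use a real feature extraction step.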

Keywords