Cogent Engineering (Dec 2023)

A comprehensive comparison and analysis of machine learning algorithms including evaluation optimized for geographic location prediction based on Twitter tweets datasets

  • Hasti Samadi,
  • Mohammed Ahsan Kollathodi

DOI
https://doi.org/10.1080/23311916.2023.2232602
Journal volume & issue
Vol. 10, no. 1

Abstract

Given a tweet, a machine learning model that has gone through training, development, and testing phases can predict where the author of the tweet is located. Our premise is that a user's tweets may contain location-specific content, such as names or phrases related to the user's geographic location. The primary purpose of this research was to identify a suitable algorithm for accurate geographic location prediction based on a Twitter tweets dataset. Geolocation prediction for Twitter users can be helpful for demographic analysis, targeted advertising, location-based recommendation that improves the user experience, location prediction during a crisis or disaster, and more. Knowing the location of users can also help in finding employment by indicating how accessible potential employers and employees are to one another. Moreover, such a model would help personalize a user's newsfeed and tweets with greater accuracy, helping users find what they really need. Larger amounts of data tend to yield more precise location estimates, which supports confidence in the soundness and continued refinement of location prediction from user tweet data. By augmenting the human-powered sensing capabilities of Twitter and related microblogging services with content-derived location data, such algorithms can overcome the sparse, dispersed use of geo-enabled features in these services and bring greater scope and breadth to location-based personalized information services. With these objectives in mind, we propose and evaluate several machine learning algorithms and models for predicting a tweeter's geographical location.
In addition, this paper analyzes the efficacy of these algorithms on the problem of determining a tweeter's location through a comparative analysis based on performance and evaluation metrics such as accuracy, precision, recall, and F1-score, which indicate which algorithm is most suitable for a specific dataset. After analyzing the results, we found that, for the particular dataset chosen, the random forest classifier produced the best performance metrics and was the most suitable for predicting user geographic location.
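
As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy and macro-averaged precision, recall, and F1-score could be computed for a multi-class location classifier. It is not the authors' evaluation code: the function name `classification_metrics`, the choice of macro averaging, and the toy city labels are all assumptions made for the example.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    since the abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (made up for this example, not taken from the paper's dataset).
y_true = ["london", "london", "paris", "paris", "tokyo", "tokyo"]
y_pred = ["london", "paris", "paris", "paris", "tokyo", "london"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running the same function on the predictions of each candidate model over a shared test set gives one row of a comparison table per model, which is the kind of side-by-side reading the comparative analysis relies on.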

Keywords