Journal of Innovation Information Technology and Application (Dec 2024)
Programming Languages Prediction from Stack Overflow Questions Using Deep Learning
Abstract
Understanding programming languages is vital in the ever-evolving world of software development. With constant updates and the emergence of new languages, staying informed is essential for any programmer. Additionally, using a tagging system to organize stored data is a widely accepted practice. In our study, questions were selected from a Stack Overflow dataset by random sampling. The tags were then cleaned, and the data were separated into three input variants: title, title + body, and body. After preprocessing, tokenizing, and padding, the data were randomly split into training and testing sets. Several deep learning models were then applied to the dataset to identify the programming languages from the tags: Long Short-Term Memory, Bidirectional Long Short-Term Memory, Multilayer Perceptron, Convolutional Neural Network, Feedforward Neural Network, Gated Recurrent Unit, Recurrent Neural Network, and Artificial Neural Network. This study aims to assist in identifying the programming language from the question tags, which can help programmers better understand a problem or make it easier to understand other programming languages.
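The data preparation steps named in the abstract (building the title, title + body, and body variants, then tokenizing, padding, and randomly splitting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenization rule, the sequence length, the 80/20 split ratio, and the three sample questions are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
import re


def build_input(title, body, variant):
    """Return the text for one of the three input variants (assumed names)."""
    if variant == "title":
        return title
    if variant == "body":
        return body
    if variant == "title+body":
        return title + " " + body
    raise ValueError("unknown variant: " + variant)


def tokenize(text):
    """Lowercase and split into tokens, keeping '+' and '#' so c++ and c# survive."""
    return re.findall(r"[a-z0-9_+#]+", text.lower())


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and split; the 80/20 ratio is an assumption."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Hypothetical sample rows: (title, body, language tag). Not from the paper's dataset.
questions = [
    ("Reverse a list", "How do I reverse a list in place?", "python"),
    ("Null pointer error", "My object reference is null at runtime.", "java"),
    ("Vector push_back", "Why does push_back invalidate iterators in C++?", "c++"),
    ("Async and await", "How does await work with Task in C#?", "c#"),
    ("Borrow checker", "Why can't I borrow this value as mutable?", "rust"),
]

texts = [build_input(t, b, "title+body") for t, b, _ in questions]
token_lists = [tokenize(text) for text in texts]
vocab = build_vocab(token_lists)
encoded = [encode_and_pad(toks, vocab, max_len=12) for toks in token_lists]
labels = [tag for _, _, tag in questions]

train, test = train_test_split(list(zip(encoded, labels)))
print(len(train), "training rows,", len(test), "testing rows")
```

The padded integer sequences produced this way are the kind of fixed-length input that recurrent or convolutional models typically consume through an embedding layer; a real pipeline would more likely rely on a deep learning library's own tokenizer and padding utilities.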
Keywords