IEEE Access (Jan 2024)

Big Coding Data: Analysis, Insights, and Applications

  • Md. Mostafizer Rahman,
  • Atsushi Shirafuji,
  • Yutaka Watanobe

DOI
https://doi.org/10.1109/ACCESS.2024.3521383
Journal volume & issue
Vol. 12
pp. 196010 – 196026

Abstract

In recent years, there has been a notable surge in the generation of coding data on various platforms, including programming competitions and educational institutions. These platforms serve as repositories for substantial volumes of real-world code, problem descriptions, test cases, and activity logs. Despite this wealth of coding data, its potential for advancing software engineering, programming, and research remains largely unexplored. To the best of our knowledge, previous research projects such as CodeNet and AlphaCode have explored and utilized coding data only partially. There is a compelling need to examine coding data in more depth to uncover its potential for programming and research endeavors. Recognizing this gap, our study undertakes a comprehensive analysis of extensive coding data obtained from a programming learning platform. The Aizu Online Judge (AOJ) serves as our chosen programming platform, providing access to coding data and its associated features. We collected approximately 9 million code evaluation logs, code files, and a substantial number of problem descriptions and input/output test cases for thorough analysis and experimentation. The goal of this study is to explore the full potential of coding data for latent knowledge extraction, programming, and research. We conducted experiments with code evaluation logs, code files, problem descriptions, and test cases to demonstrate the suitability of coding data for a variety of research tasks and applications. Additionally, this study introduces a comprehensive array of features and application programming interfaces (APIs) associated with the AOJ platform. These resources facilitate seamless access to and use of coding data, making them valuable for professional and educational initiatives as well as research endeavors.

Keywords