IEEE Open Journal of the Computer Society (Jan 2024)
A Taxonomy for Python Vulnerabilities
Abstract
Python is one of the most widely adopted programming languages, with applications ranging from web development to data science and machine learning. Despite its popularity, Python is susceptible to vulnerabilities that can compromise the systems relying on it. To address these challenges effectively, developers, researchers, and security teams need to identify, analyze, and mitigate risks in Python code, which is difficult because existing vulnerability data is scattered, incomplete, and non-actionable. This article introduces a comprehensive dataset of 1026 publicly disclosed Python vulnerabilities sourced from various repositories. These vulnerabilities are classified using widely recognized frameworks, such as Orthogonal Defect Classification (ODC), Common Weakness Enumeration (CWE), and the Open Web Application Security Project (OWASP) Top 10. The dataset is accompanied by patched and vulnerable code samples (some crafted with the help of AI), enhancing its utility for developers, researchers, and security teams. In addition, a user-friendly website was developed to allow interactive exploration of the dataset and to facilitate new contributions from the community. Access to this dataset will foster the development and testing of safer Python applications. We also analyze the resulting dataset for trends and patterns in the occurrence of Python vulnerabilities, with the aim of raising awareness of Python security and providing practical, actionable guidance to help developers, researchers, and security teams strengthen their practices. This includes insights into the types of vulnerabilities they should focus on, the most exploited categories, and the common coding errors that lead to vulnerabilities.
Keywords