SoftwareX (May 2024)
Toward efficient data science: A comprehensive MLOps template for collaborative code development and automation
Abstract
In the era of big data analytics and AI applications, data provenance is as important as ever, particularly as applications emerge in vital industries like healthcare. Additionally, as the number of available tools and packages grows rapidly, code transparency and experiment record keeping are essential to ensuring full traceability of AI and ML models. This manuscript presents an open-source Machine Learning Operations (MLOps) Template that provides a consistent framework to support collaborative development and improve efficiency. The template provides a robust and reliable software structure that incorporates essential development tools, including automated code documentation, built-in package management, experiment tracking, and configuration and logging infrastructure, among others. It is built on a collection of best practices gleaned from industry and academia alike, providing a practical starting point for any ML/AI project.