AI (Jun 2024)

Inside Production Data Science: Exploring the Main Tasks of Data Scientists in Production Environments

  • Arno Schmetz
  • Achim Kampker

DOI
https://doi.org/10.3390/ai5020043
Journal volume & issue
Vol. 5, no. 2
pp. 873 – 886

Abstract


Modern production relies on data-based analytics to predict and optimize production processes. At companies and research institutions, specialized data scientists carry out this work on real data from actual production environments. Data preprocessing and data quality play a crucial role in data science, and an active research field develops methodologies and technologies to address them. Anecdotes and generalized surveys indicate that preprocessing is the main operational task of data scientists. However, a detailed view of its subtasks, and of the production data domain in particular, is missing. In this paper, we present a multi-stage survey of data science tasks as practiced in the field of production. Drawing on expert knowledge and insights, we found that data preprocessing makes up the major part of data scientists' tasks. In detail, handling missing values, determining the meaning of data points, and synchronizing multiple time series were often the most time-consuming preprocessing tasks.

Keywords