MethodsX (Jun 2025)
A new data science trajectory for analysing multiple studies: a case study in physical activity research
Abstract
Combining datasets can strengthen the analysis of complex mechanisms within population data, and within sub-populations, for example to gain a better understanding of the change processes of health-related behaviours. Because this kind of research is complex, it benefits from more specific guidelines than standard data science methodologies provide. We therefore propose a generic procedure for applied data science research that includes data from multiple studies, and we describe its steps and associated considerations in detail to guide other researchers. We illustrate the application of these steps (presented in the graphical abstract) by means of a case study, i.e., a physical activity (PA) intervention study in which we gained new insights into PA change processes by analysing an integrated dataset using Bayesian networks. We then illustrate the strengths of the proposed methodology by comparing this data science trajectories protocol with the classic CRISP-DM procedure. Finally, we discuss some possibilities for extending the methodology.
– A detailed process description for multidisciplinary data science research on multiple studies.
– Examples from a case study illustrate methodological key points.