Jisuanji kexue (Aug 2021)
Data Science Platform: Features, Technologies and Trends
Abstract
The concept and types of data science platforms are proposed on the basis of in-depth studies of more than 35 data science platforms covered in the annual Magic Quadrant for Data Science Platforms reports since 2015. The main scientific issues in academic research on data science platforms are platform design, scalability, research and development of platforms based on data lakes, support for team collaboration, open strategy, and engineering methodology. The main features of data science platforms include modular development and integration capability, DevOps, and an emphasis on scalability, user experience, citizen data scientists, and human-machine collaboration scenarios. The key technologies for realizing data science platforms are machine learning, stream processing, tidy data, containerization, and data visualization. The future development trends of data science platforms are mainly reflected in integration with artificial intelligence, support for open source technology, emphasis on citizen data scientists, integration of data governance, introduction of data lakes, exploration of advanced analysis and applications, transformation to cover the whole data science pipeline, and diversification of application fields. The research and development of data science platforms should follow these design principles: centering on activating data value, human-in-the-loop, DevOps, balance of usability and explainability, cultivation of a data science product ecosystem, emphasis on user experience and ease of use, and integration with other business systems. At present, the research and development of data science platforms needs theoretical breakthroughs in data bias and fairness, robustness and stability, privacy protection, causal analysis, and trusted/responsible data science platforms.
Keywords