Computers (Jul 2025)
A Machine-Learning-Based Data Science Framework for Effectively and Efficiently Processing, Managing, and Visualizing Big Sequential Data
Abstract
In recent years, the open data initiative has led to the willingness of many governments, researchers, and organizations to share their data and make it publicly available. Healthcare, disease, and epidemiological data, such as privacy statistics on patients who have suffered from epidemic diseases such as the Coronavirus disease 2019 (COVID-19), are examples of open big data. Therefore, huge volumes of valuable data have been generated and collected at high speed from a wide variety of rich data sources. Analyzing these open big data can be of social benefit. For example, people gain a better understanding of disease by analyzing and mining disease statistics, which can inspire them to participate in disease prevention, detection, control, and combat. Visual representation further improves data understanding and corresponding results for analysis and mining, as a picture is worth a thousand words. In this paper, we present a visual data science solution for the visualization and visual analysis of large sequence data. These ideas are illustrated by the visualization and visual analysis of sequences of real epidemiological data of COVID-19. Through our solution, we enable users to visualize the epidemiological data of COVID-19 over time. It also allows people to visually analyze data and discover relationships between popular features associated with COVID-19 cases. The effectiveness of our visual data science solution in improving the user experience of visualization and visual analysis of large sequence data is demonstrated by the real-life evaluation of these sequenced epidemiological data of COVID-19.
Keywords