A Machine-Learning-Based Data Science Framework for Effectively and Efficiently Processing, Managing, and Visualizing Big Sequential Data

Alfredo Cuzzocrea; Islam Belmerabet; Abderraouf Hafsaoui; Carson K. Leung

doi:10.3390/computers14070276

Computers (Jul 2025)

Affiliations

Alfredo Cuzzocrea: iDEA Lab, University of Calabria, 87036 Rende, Italy
Islam Belmerabet: iDEA Lab, University of Calabria, 87036 Rende, Italy
Abderraouf Hafsaoui: iDEA Lab, University of Calabria, 87036 Rende, Italy
Carson K. Leung: Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada

DOI: https://doi.org/10.3390/computers14070276
Journal volume & issue: Vol. 14, no. 7, p. 276

Abstract

In recent years, the open data initiative has led many governments, researchers, and organizations to share their data and make it publicly available. Examples of open big data include healthcare, disease, and epidemiological data, such as privacy-preserving statistics on patients who have suffered from epidemic diseases like coronavirus disease 2019 (COVID-19). As a result, huge volumes of valuable data have been generated and collected at high speed from a wide variety of rich data sources. Analyzing these open big data can be of social benefit. For example, by analyzing and mining disease statistics, people gain a better understanding of a disease, which can inspire them to take part in its prevention, detection, control, and combat. Visual representation further improves the understanding of the data and of the corresponding analysis and mining results, as a picture is worth a thousand words. In this paper, we present a visual data science solution for the visualization and visual analysis of big sequential data. These ideas are illustrated by the visualization and visual analysis of sequences of real-life COVID-19 epidemiological data. Our solution enables users to visualize COVID-19 epidemiological data over time. It also allows them to visually analyze the data and discover relationships among popular features associated with COVID-19 cases. The effectiveness of our visual data science solution in improving the user experience of visualization and visual analysis of big sequential data is demonstrated by an evaluation on these real-life sequential COVID-19 epidemiological data.

Published in Computers

ISSN: 2073-431X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computers
