Data mining and knowledge discovery in chemical processes: Effect of alternative processing techniques

Luis A. Briceno-Mena; Miriam Nnadili; Michael G. Benton; Jose A. Romagnoli

doi:10.1017/dce.2022.21

Data-Centric Engineering (Jan 2022)

Affiliations

Luis A. Briceno-Mena: Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, USA
Miriam Nnadili: Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, USA
Michael G. Benton: Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, USA
Jose A. Romagnoli: Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, USA

DOI: https://doi.org/10.1017/dce.2022.21
Journal volume & issue: Vol. 3

Abstract

Data mining and knowledge discovery (DMKD) focuses on extracting useful information from data. In the chemical process industry, tasks such as process monitoring, fault detection, process control, and optimization can be achieved using DMKD. However, selecting the appropriate method for each step in the DMKD process (data cleaning, sampling, scaling, dimensionality reduction (DR), clustering, clustering analysis, and data visualization) to obtain meaningful insights is far from trivial. In this contribution, a computational environment (FastMan) is introduced and used to illustrate how method selection affects DMKD in chemical process data. Two case studies, one using data from a simulated natural gas liquid plant and one using real data from an industrial pyrolysis unit, were conducted to demonstrate the applicability of these methodologies in real-life scenarios. Sampling and normalization methods were found to have a great impact on the quality of the DMKD results. In addition, t-distributed stochastic neighbor embedding, a neighbor-graph method for DR, outperformed principal component analysis, a matrix factorization method frequently used in the chemical process industry, at identifying both local and global changes.
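
The sensitivity to normalization noted in the abstract can be illustrated with a minimal sketch. It is not taken from FastMan or from the paper's case studies: the four samples, the variable choices (temperature in K, mole fraction), and the helper names `zscore` and `nearest` are all hypothetical. It shows only that a distance-based step, such as clustering or a neighbor-graph DR method, can group the same points differently before and after z-score scaling.

```python
import math
import statistics

# Hypothetical process samples (NOT from the paper): (temperature in K, mole fraction).
# Rows 0-1 are meant to represent one operating mode, rows 2-3 another.
samples = [
    (350.0, 0.10),
    (352.0, 0.12),
    (351.0, 0.90),
    (353.0, 0.88),
]


def zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


def nearest(rows, i):
    """Index of the row closest to row i in Euclidean distance."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Nearest neighbor of every sample, before and after scaling.
raw_nn = [nearest(samples, i) for i in range(len(samples))]
scaled = zscore(samples)
scaled_nn = [nearest(scaled, i) for i in range(len(samples))]

print("raw:   ", raw_nn)     # temperature (large numeric range) dominates the distance
print("scaled:", scaled_nn)  # both variables contribute comparably
```

On the raw values, the temperature differences of a few kelvin outweigh the mole-fraction differences, so each sample's nearest neighbor belongs to the other assumed operating mode. After z-score scaling, every sample pairs with the other member of its own mode. Which scaling is appropriate for real process data depends on the variables involved, which is the method-selection question the abstract raises.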

Published in Data-Centric Engineering

ISSN: 2632-6736 (Online)
Publisher: Cambridge University Press
Country of publisher: United Kingdom
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: https://www.cambridge.org/core/journals/data-centric-engineering
