Principal Component Analysis-Based Data Clustering for Labeling of Level Damage Sector in Post-Natural Disasters

Agung Teguh Wibowo Almais; Adi Susilo; Agus Naba; Moechammad Sarosa; Cahyo Crysdian; Imam Tazi; Mokhamad Amin Hariyadi; Muhammad Aziz Muslim; Puspa Miladin Nuraida Safitri Abdul Basid; Yunifa Miftachul Arif; Mohammad Singgih Purwanto; Diyan Parwatiningtyas; Supriyono; Hendro Wicaksono

doi:10.1109/ACCESS.2023.3275852

IEEE Access (Jan 2023)

Affiliations

Agung Teguh Wibowo Almais: Department of Physics, Universitas Brawijaya Malang, Malang, Indonesia
Adi Susilo: Department of Physics, Universitas Brawijaya Malang, Malang, Indonesia
Agus Naba: Department of Physics, Universitas Brawijaya Malang, Malang, Indonesia
Moechammad Sarosa: Department of Electrical Engineering, State Polytechnic of Malang, Malang, Indonesia
Cahyo Crysdian: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Imam Tazi: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Mokhamad Amin Hariyadi: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Muhammad Aziz Muslim: Department of Electrical Engineering, Universitas Brawijaya, Malang, Indonesia
Puspa Miladin Nuraida Safitri Abdul Basid: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Yunifa Miftachul Arif: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Mohammad Singgih Purwanto: Department of Geophysical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Diyan Parwatiningtyas: Department of Physics, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
Supriyono: Department of Informatics Engineering, Universitas Islam Negeri Maulana Malik Ibrahim Malang, Malang, Indonesia
Hendro Wicaksono: School of Business, Social and Decision Science, Constructor University, Bremen, Germany

DOI: https://doi.org/10.1109/ACCESS.2023.3275852
Journal volume & issue: Vol. 11
pp. 74590 – 74601

Abstract

Post-disaster sector damage data describe, for each case, the level of damage to a sector after a natural disaster in terms of a set of criteria: building condition, building structure, building physical state, building function, and other supporting conditions. This study uses 216 records of post-natural-disaster sector damage, each with these 5 criteria. Principal component analysis (PCA) is used to derive a label for each record, and the labels are then used to cluster the data according to a value scale on the normalized PCA output. In the normalization step of PCA, the data are reduced to 2 components, PC1 and PC2, each with its own variance ratio and eigenvalue: PC1 has a variance ratio of 85.17% and an eigenvalue of 4.28, while PC2 has a variance ratio of 9.36% and an eigenvalue of 0.47. The normalized results are plotted in a 2-dimensional graph to visualize each principal component (PC). From this chart, 3 data clusters are defined on a value scale of the coordinate value n: cluster 1 ($\text{n} < 0$), cluster 2 ($0\le \text{n} < 2$), and cluster 3 ($\text{n}\ge 2$). The 3 clusters are tested against the original target data in two experiments: one compares the PC1 results with the original target data, and the other compares the PC2 results with the original target data. There are two findings. First, PC1 distributes the data into groups much better than PC2, because the eigenvalue of PC1 is greater than that of PC2. Second, testing the PC1 results against the original target data produces a good grouping: records whose original target value is 1 (slightly damaged) fall in cluster 1 ($\text{n} < 0$), records with value 2 (moderately damaged) fall in cluster 2 ($0\le \text{n} < 2$), and records with value 3 (heavily damaged) fall in cluster 3 ($\text{n}\ge 2$). In conclusion, whereas many studies have used PCA for feature reduction, this study uses PCA to label unsupervised data so that the data carry appropriate labels for further processing.
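The labeling idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_label`, the synthetic 216 x 5 matrix, and the choice to standardize each criterion before PCA are all hypothetical, and only the thresholds 0 and 2 on the PC1 coordinate come from the abstract. The sign of a principal component is arbitrary, so the sketch also assumes that PC1 is oriented to increase with the criteria scores.

```python
import numpy as np


def pca_label(X, thresholds=(0.0, 2.0), n_components=2):
    """Project X onto its leading principal components and label rows by PC1.

    Hypothetical helper: the abstract gives the thresholds but not the code.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each criterion (assumed preprocessing step).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the sample covariance matrix of the standardized data.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # sort components by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()              # explained variance ratio
    # Fix the arbitrary sign so that PC1 grows with the criteria (assumption).
    if eigvecs[:, 0].sum() < 0:
        eigvecs[:, 0] *= -1
    scores = Z @ eigvecs[:, :n_components]
    # n < 0 -> cluster 1, 0 <= n < 2 -> cluster 2, n >= 2 -> cluster 3
    labels = np.digitize(scores[:, 0], thresholds) + 1
    return scores, labels, eigvals[:n_components], ratio[:n_components]


# Synthetic stand-in for the 216 records with 5 damage criteria each.
rng = np.random.default_rng(0)
severity = rng.uniform(0.0, 1.0, size=216)
X = 1.0 + 4.0 * severity[:, None] + rng.normal(0.0, 0.4, size=(216, 5))

scores, labels, eigvals, ratio = pca_label(X)
print("eigenvalues:", eigvals)
print("variance ratios:", ratio)
print("records per cluster:", np.bincount(labels, minlength=4)[1:])
```

On real data, the resulting labels would be compared with the original target values (1, 2, 3), as the abstract describes for PC1 and PC2.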

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
