Clustering analysis for classifying fake real estate listings

Maifuza Mohd Amin; Nor Samsiah Sani; Mohammad Faidzul Nasrudin; Salwani Abdullah; Amit Chhabra; Faizal Abd Kadir

doi:10.7717/peerj-cs.2019

PeerJ Computer Science (Jun 2024)

Affiliations

Maifuza Mohd Amin: Center for Artificial Intelligence Technology, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Nor Samsiah Sani: Center for Artificial Intelligence Technology, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Mohammad Faidzul Nasrudin: Center for Artificial Intelligence Technology, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Salwani Abdullah: Center for Artificial Intelligence Technology, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Amit Chhabra: Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, Amritsar, India
Faizal Abd Kadir: My Crib Resources, Shah Alam, Selangor, Malaysia

DOI: https://doi.org/10.7717/peerj-cs.2019
Journal volume & issue: Vol. 10, p. e2019

Abstract

With the rapid growth of online property rental and sale platforms, fake real estate listings have become a significant concern. These deceptive listings waste the time and effort of buyers and sellers and expose them to potential risks, so effective methods for distinguishing genuine from fake listings are needed. Clustering analysis can substantially improve this identification. Although clustering has been widely used to detect fraud in various fields, its application in the real estate domain has been limited and has focused primarily on auctions and property appraisals. This study aims to fill this gap by using clustering to classify properties into fake and genuine listings based on datasets curated by industry experts. A K-means model was developed to group properties into clusters that distinguish fake from genuine listings. To assure the quality of the training data, pre-processing was performed on the raw dataset. Several techniques were used to determine the optimal value for each parameter of the K-means model. The number of clusters was determined using the Silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index. Two clusters were found to be optimal, and the Canberra distance performed best when compared with overlap similarity and Jaccard distance. The clustering results were assessed using two machine learning algorithms: Random Forest and Decision Tree. The results show that the optimized K-means significantly improves the accuracy of the Random Forest classification model, boosting it by 96%. Furthermore, this research demonstrates that clustering helps create a balanced dataset containing fake and genuine clusters. This balanced dataset holds promise for future investigations, particularly for deep learning models that require balanced data to perform optimally. The study presents a practical and effective way to identify fake real estate listings through clustering analysis, contributing to a more trustworthy and secure real estate market.
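
The cluster-count selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic two-dimensional data in place of the expert-curated listings, and relies on scikit-learn's Euclidean K-means, so the Canberra distance reported in the abstract is not reproduced here. The helper name `score_cluster_counts` and all parameter values are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_cluster_counts(X, k_values, seed=0):
    """Fit K-means for each candidate k and collect three validity indices.

    Hypothetical helper for illustration only.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = {
            # Higher is better for silhouette and Calinski-Harabasz.
            "silhouette": silhouette_score(X, labels),
            "calinski_harabasz": calinski_harabasz_score(X, labels),
            # Lower is better for Davies-Bouldin.
            "davies_bouldin": davies_bouldin_score(X, labels),
        }
    return scores


# Synthetic stand-in data (an assumption): two well-separated groups,
# loosely mimicking a "fake" versus "genuine" split.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5.0, -5.0], [5.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

scores = score_cluster_counts(X, range(2, 7))
best_k = max(scores, key=lambda k: scores[k]["silhouette"])

for k, s in scores.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
print("k with the highest silhouette coefficient:", best_k)
```

The three indices are computed side by side because they reward different properties of a partition; agreement among them gives more confidence in the chosen number of clusters than any single index alone.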

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
