Soft Computing Letters (Dec 2021)

A fuzzy proximity relation approach for outlier detection in the mixed dataset by using rough entropy-based weighted density method

  • T. Sangeetha
  • Geetha Mary A

Journal volume & issue
Vol. 3
p. 100027

Abstract

Data mining is an emerging technology in which researchers explore innovative ideas across different domains, particularly anomaly detection. Instances in a dataset that deviate considerably from the common patterns of the others are known as anomalies. Real-world data is often ambiguous and uncertain. Rough Set Theory is a proven methodology for dealing with ambiguity and uncertainty in data. Research to date has focused on numeric or categorical data and fails when the attributes are of mixed type. In this work, numerical data is converted to categorical data by using fuzzy proximity and ordering relations. This article presents an approach for detecting outliers in mixed data in which the weighted density values of attributes and objects are calculated. The proposed approach is compared with existing outlier detection methods using a hiring dataset as an example, and benchmarked on Harvard Dataverse datasets to demonstrate its efficiency and performance.

Keywords