Feature‐based augmentation and classification for tabular data

Balachander Sathianarayanan; Yogesh Chandra Singh Samant; Prahalad S. Conjeepuram Guruprasad; Varshin B. Hariharan; Nirmala Devi Manickam

doi:10.1049/cit2.12123

CAAI Transactions on Intelligence Technology (Sep 2022)

Affiliations

Balachander Sathianarayanan: Amrita School of Engineering, Amrita Vishwa Vidyapeetham Coimbatore India
Yogesh Chandra Singh Samant: Amrita School of Engineering, Amrita Vishwa Vidyapeetham Coimbatore India
Prahalad S. Conjeepuram Guruprasad: Amrita School of Engineering, Amrita Vishwa Vidyapeetham Coimbatore India
Varshin B. Hariharan: Amrita School of Engineering, Amrita Vishwa Vidyapeetham Coimbatore India
Nirmala Devi Manickam: Amrita School of Engineering, Amrita Vishwa Vidyapeetham Coimbatore India

Journal volume & issue: Vol. 7, no. 3, pp. 481–491

Abstract

Generating synthetic samples for tabular data is a strenuous task. The columns (features) of a dataset often do not follow an ideal distribution function. The objective of the proposed algorithm, the Histogram Augmentation Technique (HAT), is to generate a dataset whose distribution is similar to that of the original dataset. The augmentation is performed column by column, with separate algorithms designed for continuous and discrete columns. Humans also use the features of an object for interpretation: when making a judgement, they notice prominent features and use them to characterise the perceived object. Conventional Machine Learning (ML) classifiers, however, are designed and trained on the basis of samples. Taking features as the basis for classification, a Feature Importance Classifier (FIC) is proposed in this work. FIC treats each feature independently of the others and ranks the features by their dependence on the class label. FIC achieved the highest accuracy, improving accuracy by 5.54% on average compared with the other classifiers. The proposed algorithms were evaluated on five datasets and compared with two augmentation algorithms and four state‐of‐the‐art ML classification algorithms.
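The abstract does not spell out how HAT works internally, so the following is only a minimal sketch of generic per-column histogram resampling, not the authors' algorithm. It assumes that a continuous column can be resampled by picking a histogram bin in proportion to its count and drawing uniformly inside that bin, and that a discrete column can be resampled from its empirical value frequencies. The function names (`augment_continuous`, `augment_discrete`, `augment_table`) and the `bins` parameter are hypothetical.

```python
import numpy as np


def augment_continuous(col, n_new, bins=10, rng=None):
    """Draw n_new values whose histogram approximates that of col.

    Assumption (not from the paper): choose a bin with probability
    proportional to its count, then sample uniformly within the bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts, edges = np.histogram(np.asarray(col, dtype=float), bins=bins)
    probs = counts / counts.sum()
    idx = rng.choice(len(counts), size=n_new, p=probs)
    # edges[idx] and edges[idx + 1] are the lower and upper bin bounds.
    return rng.uniform(edges[idx], edges[idx + 1])


def augment_discrete(col, n_new, rng=None):
    """Draw n_new values from the empirical frequencies of col."""
    rng = np.random.default_rng() if rng is None else rng
    values, counts = np.unique(np.asarray(col), return_counts=True)
    return rng.choice(values, size=n_new, p=counts / counts.sum())


def augment_table(columns, discrete, n_new, bins=10, seed=0):
    """Resample every column independently.

    columns:  dict mapping column name to a 1-D array-like
    discrete: set of column names to treat as discrete
    """
    rng = np.random.default_rng(seed)
    out = {}
    for name, col in columns.items():
        if name in discrete:
            out[name] = augment_discrete(col, n_new, rng=rng)
        else:
            out[name] = augment_continuous(col, n_new, bins=bins, rng=rng)
    return out
```

Because each column is resampled independently, this sketch preserves only the marginal distributions; it does not preserve correlations between columns or between features and the class label, which a practical augmentation method would have to address (for example, by resampling within each class).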

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
