A Hybrid Support Vector Machine Algorithm for Big Data Heterogeneity Using Machine Learning

Shafqat Ul Ahsaan; Harleen Kaur; Ashish Kumar Mourya; Sameena Naaz

doi:10.3390/sym14112344

Symmetry (Nov 2022)

Affiliations

Shafqat Ul Ahsaan: Department of Computer Science, Jamia Hamdard University, New Delhi 110062, India
Harleen Kaur: Department of Computer Science, Jamia Hamdard University, New Delhi 110062, India
Ashish Kumar Mourya: Department of Computer Science, Jamia Hamdard University, New Delhi 110062, India
Sameena Naaz: Department of Computer Science, Jamia Hamdard University, New Delhi 110062, India

DOI: https://doi.org/10.3390/sym14112344
Journal volume & issue: Vol. 14, no. 11, p. 2344

Abstract

Big data technology has gained attention in many fields, particularly in research and financial institutions. Researchers and data scientists are currently working on its applicability in domains such as health care, medicine, and the stock market. Data generated at an unprecedented pace from multiple sources, such as social media, health care, and the Internet of Things, have given rise to big data. Managing and processing big data is a challenge for researchers and data scientists because the data are heterogeneous and ambiguous. Heterogeneity is considered an important characteristic of big data. Analyzing heterogeneous data is a complex task, as it involves compiling, storing, and processing varied data based on diverse patterns and rules. This research focuses on the heterogeneity problem in big data. It introduces the hybrid support vector machine (H-SVM) classifier, which uses the support vector machine as a base. In the proposed algorithm, the heterogeneous Euclidean overlap metric (HEOM) and Euclidean distance are used to form clusters and classify the data on the basis of ordinal and nominal values. The performance of the proposed classifier is compared with linear SVM, random forest, and k-nearest neighbor. The proposed algorithm attained the highest accuracy among the compared classifiers.
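
To make the distance measure named in the abstract concrete, the following is a minimal Python sketch of the heterogeneous Euclidean overlap metric as it is commonly defined: an overlap distance (0 or 1) for nominal attributes, a range-normalized absolute difference for numeric attributes, and a distance of 1 when a value is missing. It is not the authors' H-SVM implementation. The function name `heom`, the `is_nominal` and `ranges` parameters, and the sample records are assumptions made only for illustration.

```python
import math


def heom(x, y, is_nominal, ranges):
    """Heterogeneous Euclidean overlap metric between two records.

    x, y       -- sequences of attribute values (None marks a missing value)
    is_nominal -- sequence of booleans, True where the attribute is nominal
    ranges     -- sequence of (max - min) per numeric attribute; ignored
                  for nominal attributes (illustrative parameter)
    """
    total = 0.0
    for a, b, nominal, rng in zip(x, y, is_nominal, ranges):
        if a is None or b is None:
            d = 1.0                      # missing value: maximal distance
        elif nominal:
            d = 0.0 if a == b else 1.0   # overlap metric for nominal values
        else:
            # range-normalized difference; a zero range means a constant attribute
            d = abs(a - b) / rng if rng else 0.0
        total += d * d
    return math.sqrt(total)


# Hypothetical records: one nominal attribute and two numeric attributes.
a = ("red", 2.0, 10.0)
b = ("blue", 4.0, 10.0)
is_nominal = (True, False, False)
ranges = (None, 8.0, 20.0)

dist = heom(a, b, is_nominal, ranges)
print(dist)
```

In this sketch the nominal mismatch contributes 1 and the first numeric attribute contributes (2/8)^2, so mixed attribute types are combined on a comparable scale before the square root is taken.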

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
