On classification in the case of a medical data set with a complicated distribution

Martti Juhola; Henry Joutsijoki; Heikki Aalto; Timo P. Hirvonen

doi:10.1016/j.aci.2014.03.001

Applied Computing and Informatics (Jan 2014)

Affiliations

Martti Juhola: Computer Science, School of Information Sciences, University of Tampere, Tampere 33014, Finland
Henry Joutsijoki: Computer Science, School of Information Sciences, University of Tampere, Tampere 33014, Finland
Heikki Aalto: Department of Otorhinolaryngology & Head and Neck Surgery, University of Helsinki and Helsinki University Central Hospital, Helsinki 00029 HUS, Finland
Timo P. Hirvonen: Department of Otorhinolaryngology & Head and Neck Surgery, University of Helsinki and Helsinki University Central Hospital, Helsinki 00029 HUS, Finland

Journal volume & issue: Vol. 10, no. 1, pp. 52–67

Abstract

In an earlier study we noticed that straightforward cleaning of our medical data set considerably impaired its classification results with some, but not all, machine learning methods. This was unexpected and counterintuitive compared with the original situation without any data cleaning. A closer exploration of the data showed that the cause was the complicated variable distribution of the data, even though it contained only two classes. In addition to the straightforward cleaning method, we used an efficient technique called neighbourhood cleaning, which solved the problem and improved our classification accuracies by 5–10%, reaching at best up to 95% of all test cases. This shows how important it is first to study the distributions of the data sets to be classified very carefully and to try different cleaning techniques in order to obtain the best classification results.
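
The abstract does not describe the cleaning procedure itself, so the following is only a minimal sketch of the general idea behind neighbourhood-based cleaning, assuming a simplified edited-nearest-neighbour filter: a sample is dropped when its label disagrees with the majority label of its k nearest neighbours. The function name `neighbourhood_clean`, the parameter `k`, and the toy data are hypothetical and are not taken from the paper; the authors' actual method may differ.

```python
import math
from collections import Counter


def neighbourhood_clean(points, labels, k=3):
    """Return indices of samples whose label agrees with the majority
    label of their k nearest neighbours (Euclidean distance).

    Simplified illustration only; not the procedure from the paper.
    """
    kept = []
    for i, p in enumerate(points):
        # Rank all other samples by their distance to sample i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbour_labels = [labels[j] for j in others[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        # Keep the sample only if its neighbourhood agrees with its label.
        if majority_label == labels[i]:
            kept.append(i)
    return kept


# Toy data (assumed): two well-separated clusters, plus one sample
# labelled 1 that lies inside the cluster of class 0.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # class 0
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # class 1
    (0.05, 0.05),                                    # labelled 1, inside class 0
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = neighbourhood_clean(points, labels, k=3)
print(kept)
```

With two classes and an odd k there are no ties in the neighbourhood vote. In this toy example only the last sample is removed, because all of its nearest neighbours belong to the other class.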

Published in Applied Computing and Informatics

ISSN: 2634-1964 (Print); 2210-8327 (Online)
Publisher: Emerald Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.emeraldgrouppublishing.com/journal/aci