Franklin Open (Sep 2024)

Optimization of machine learning models through quantization and data bit reduction in healthcare datasets

  • Mitul Goswami,
  • Suneeta Mohanty,
  • Prasant Kumar Pattnaik

Journal volume & issue
Vol. 8
p. 100136

Abstract

This study investigates how quantization and data bit reduction can speed up complex machine learning models. The primary goal is to reduce processing time while maintaining model performance, which matters most for intricate models with long execution times. The research uses two medical datasets, Heart Disease Prediction and Breast Cancer Detection, and applies the optimization techniques to K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) models. The techniques are implemented with the QuantileTransformer, numpy.round, and KBinsDiscretizer functions, which are used to convert input data from float64 to float32 and int32 and so yield a more compact data representation. The study explores the trade-off between processing time and model accuracy, acknowledging that optimization may cost some accuracy. The experiments show a noticeable reduction in processing time after optimization, with only a marginal impact on model accuracy. The study concludes that the outcome and efficiency of an optimization technique depend not only on the technique itself but also on the dataset and the machine learning model it is applied to. Taken together, the experiments on medical datasets with KNN and SVM models demonstrate the applicability of quantization and data bit reduction to complex machine learning models and underscore the delicate balance between processing time and model accuracy.
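The kind of pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's bundled breast cancer dataset as a stand-in for the paper's data, and the train/test split, `n_neighbors=5`, `n_bins=8`, and `n_quantiles=100` are arbitrary choices made here. The helper `knn_accuracy_and_time` is hypothetical. Each variant converts the float64 features to float32 or int32 before fitting a KNN classifier.

```python
import time

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import KBinsDiscretizer, QuantileTransformer

# Assumption: scikit-learn's bundled dataset stands in for the paper's data.
X, y = load_breast_cancer(return_X_y=True)  # features are float64
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def knn_accuracy_and_time(train, test):
    """Hypothetical helper: fit KNN, return (test accuracy, elapsed seconds)."""
    start = time.perf_counter()
    model = KNeighborsClassifier(n_neighbors=5).fit(train, y_train)
    accuracy = model.score(test, y_test)
    return accuracy, time.perf_counter() - start


variants = {"float64 baseline": (X_train, X_test)}

# Bit reduction only: halve the storage per value.
X_train_f32 = X_train.astype(np.float32)
X_test_f32 = X_test.astype(np.float32)
variants["float32 cast"] = (X_train_f32, X_test_f32)

# numpy.round then int32: coarse, small-valued features collapse to 0.
X_train_i32 = np.round(X_train).astype(np.int32)
X_test_i32 = np.round(X_test).astype(np.int32)
variants["round -> int32"] = (X_train_i32, X_test_i32)

# KBinsDiscretizer: replace each value by its quantile-bin index (int32).
binner = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="quantile")
X_train_bin = binner.fit_transform(X_train).astype(np.int32)
X_test_bin = binner.transform(X_test).astype(np.int32)
variants["KBinsDiscretizer -> int32"] = (X_train_bin, X_test_bin)

# QuantileTransformer: map each feature to [0, 1], then store as float32.
qt = QuantileTransformer(
    n_quantiles=100, output_distribution="uniform", random_state=0
)
X_train_qt = qt.fit_transform(X_train).astype(np.float32)
X_test_qt = qt.transform(X_test).astype(np.float32)
variants["QuantileTransformer -> float32"] = (X_train_qt, X_test_qt)

results = {}
for name, (train, test) in variants.items():
    results[name] = knn_accuracy_and_time(train, test)
    accuracy, seconds = results[name]
    print(f"{name:32s} dtype={train.dtype} acc={accuracy:.3f} time={seconds:.4f}s")
```

Timings on a dataset this small are noisy, so any speed comparison from this sketch should be averaged over repeated runs. The accuracy column is the more reliable signal of how much each transformation costs.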

Keywords