IEEE Access (Jan 2020)

Synthetic Blood Smears Generation Using Locality Sensitive Hashing and Deep Neural Networks

  • Rabiah Al-Qudah,
  • Ching Y. Suen

DOI
https://doi.org/10.1109/access.2020.2999349
Journal volume & issue
Vol. 8
pp. 102530 – 102539

Abstract


Peripheral Blood Smear (PBS) analysis is a vital routine test carried out by hematologists to assess aspects of a person's health status. PBS analysis is prone to human error, and computer-based analysis can greatly improve this process in terms of accuracy and cost. Recent learning algorithms, such as deep learning, are data hungry, and because labeled medical images are scarce, researchers have had to find viable alternatives for increasing the size of available datasets. Synthetic datasets offer a promising solution to data scarcity; however, the complex natural structure of blood smears adds an extra layer of difficulty to their synthesis. In this work, we propose a methodology that uses Locality Sensitive Hashing (LSH) to create a novel balanced dataset of 2500 synthetic blood smears. The dataset, which was annotated automatically during the generation phase, covers 17 essential categories of blood cells and will be made public for research purposes. We demonstrated the effectiveness of the proposed dataset by using it to train a deep neural network, which achieved an accuracy of 98.72% when tested on the well-known ALL-IDB dataset. In addition, 5 experienced hematologists approved the dataset as meeting the general standards for thin blood smears.
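The abstract does not describe how LSH is applied during generation. For orientation only, here is a minimal sketch of one common LSH family, random-hyperplane hashing (SimHash-style) for cosine similarity. It assumes each cell is represented by a fixed-length numeric feature vector; the function names, the feature-vector representation and the bucketing step are hypothetical illustrations, not the authors' method.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (Gaussian entries) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, planes):
    """Return an integer signature: one bit per hyperplane, 1 if vec is on its positive side."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def bucketize(vectors, planes):
    """Group vector indices by signature; vectors with small angles tend to share a bucket."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[lsh_signature(vec, planes)].append(idx)
    return dict(buckets)


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=8)
    # Hypothetical feature vectors: the first two point in the same direction.
    feats = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [-4.0, 1.0, -2.0, 0.5]]
    print(bucketize(feats, planes))
```

For a single random hyperplane, two vectors separated by angle θ receive the same bit with probability 1 − θ/π. Vectors pointing in the same direction therefore always share a signature, while dissimilar vectors are likely to land in different buckets. This property makes LSH a cheap way to retrieve near-duplicate or similar items without comparing every pair.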

Keywords