IET Image Processing (Dec 2024)

Metric‐based pill recognition with the help of textual and visual cues

  • Richárd Rádli,
  • Zsolt Vörösházi,
  • László Czúni

DOI
https://doi.org/10.1049/ipr2.13273
Journal volume & issue
Vol. 18, no. 14
pp. 4623 – 4638

Abstract

Pill image recognition by machine vision can reduce the risk of taking the wrong medication, a severe healthcare problem. Automated dispensing machines and home applications both need reliable image processing techniques to cope with changing viewing conditions, a large number of classes, and the similarity in pill appearance. The problem is addressed with a multi‐stream, two‐phase metric embedding neural model. To enhance the metric learning procedure, dynamic margin setting is introduced into the loss function. Moreover, it is shown that, besides the visual features of drug samples, even the free text of drug leaflets (processed with a natural language model) can be used to set the value of the margin in the triplet loss and thus increase recognition accuracy at test time. In other words, in addition to the conventional metric learning approach, discriminating features can be explicitly injected into the metric model, derived either from NLP of the free text of pill leaflets or from descriptors of images of selected pills. The performance on two datasets is analysed, and increases of 1.6% (two‐sided) and 2.89% (one‐sided) in Top‐1 accuracy on the CURE dataset are reported relative to the best existing results. The inference time on CPU and GPU makes the proposed model suitable for different kinds of applications in medical pill verification; moreover, the approach applies to other areas of object recognition where few‐shot problems arise. The proposed method of injecting high‐level features into a low‐level metric learning model can also be exploited in other cases where class features can be well described with textual or visual cues.
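
To make the idea of a dynamic margin concrete, the following is a minimal sketch of a triplet loss whose margin depends on class descriptors rather than being a fixed constant. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `base_margin` and `scale` parameters, and in particular the rule that maps descriptor distance to a margin are all hypothetical. The descriptors stand in for vectors obtained from leaflet text or from pill images.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dynamic_margin(desc_pos, desc_neg, base_margin=0.2, scale=1.0):
    """Margin computed per class pair from the two class descriptors.

    ASSUMPTION: the margin grows linearly with the cosine distance between
    the descriptors. This mapping is chosen only for illustration; the
    abstract does not state the formula used in the paper.
    """
    return base_margin + scale * cosine_distance(desc_pos, desc_neg)


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Image embeddings of an anchor, a same-class sample, and a different-class sample.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]

# Hypothetical class descriptors (e.g. derived from leaflet text).
desc_a, desc_b = [1.0, 0.0], [0.0, 1.0]

margin = dynamic_margin(desc_a, desc_b)
loss = triplet_loss(anchor, positive, negative, margin)
```

With a fixed margin every class pair is pushed apart by the same amount; computing the margin per pair lets prior knowledge about the classes shape the embedding space, which is the role the abstract assigns to the textual and visual cues.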

Keywords