EAI Endorsed Transactions on Industrial Networks and Intelligent Systems (Nov 2024)
Drug classification system based on drug composition and usage instructions
Abstract
This study presents a natural language processing (NLP) approach to classify drugs based on compositional and usage descriptions. NLP techniques including text preprocessing, word embedding, and deep learning models were applied to a Vietnamese drug dataset. Traditional machine learning models like Support Vector Machines (SVM) and deep models including Bidirectional Long Short-Term Memory (BiLSTM) and PhoBERT were evaluated. Besides, since there is a limitation in the information of our own collected data, some data augmentation techniques were applied to increase the variation of the dataset. Results show PhoBERT achieving 95% accuracy, highlighting the benefits of transferring knowledge from large language models. Errors primarily occurred between similar drug categories, suggesting taxonomy refinement could improve performance. In summary, an automated drug classification framework was developed leveraging state-of- the-art NLP, validating the feasibility of analyzing drug data at scale and aiding therapeutic understanding. This supports NLP’s potential in pharmacovigilance applications.
Keywords