Informatics in Medicine Unlocked (Jan 2023)
Use of Machine Learning to classify clinical research to identify applicable compliance requirements
Abstract
Background and objective: Different regulatory requirements apply to different types of clinical research activity, so identifying the applicable requirements is one of the cornerstones of regulatory compliance. Classification is typically a manual and time-consuming process. To make it more efficient, precise and dynamic, we proposed automating the classification process with a Machine Learning algorithm that follows a set of logical decision steps. The main objective of the project was to test the hypothesis that a Machine Learning model can be trained to classify clinical research automatically.

Methods: Multiple Machine Learning models based on Natural Language Processing classifiers, including Random Forest, Support Vector Machine and Logistic Regression, were trained to classify Interventional Clinical Trials, Non-interventional Studies and other Real-world evidence research activities from the text extracted from a Research Plan or a Study Protocol/Synopsis. After Data Exploration and Data Preparation, the text was preprocessed with a Term Frequency-Inverse Document Frequency (Tf-idf) column transformation. To assess the performance of the models and choose the most efficient one, the accuracy, precision, recall (also known as sensitivity) and F1-score were calculated for each model.

Results: All models showed sufficient performance. However, the Random Forest Classification model outperformed the other solutions tested and was therefore implemented in our Machine Learning application. The hypothesis was accepted based on a recall score of 1.00 and an accuracy of 0.94. The probability score of each result was analysed with a view to prioritising compliance needs, that is, to minimise false negatives for interventional and non-interventional studies. The model was then applied directly to classify clinical research successfully.
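The abstract does not describe the implementation, so the following is only a minimal, dependency-free sketch of three steps it names: Tf-idf weighting of protocol text, the evaluation metrics (accuracy and per-class precision, recall and F1-score), and one possible way of using class probabilities to minimise false negatives for the stricter study types. The trained classifiers themselves (Random Forest, SVM, Logistic Regression) are not reproduced. The class labels, the `PRIORITY` order and the 0.3 threshold are hypothetical assumptions for illustration, not the authors' settings.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.
    Weight = raw term count * smoothed idf, idf = ln((1 + n) / (1 + df)) + 1.
    Vectors are left unnormalised to keep the sketch short."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Precision, recall (sensitivity) and F1-score for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical class labels, ordered from most to least compliance-demanding.
PRIORITY = ["interventional", "non-interventional", "other_rwe"]

def prioritised_label(probs, threshold=0.3):
    """Pick the strictest class whose probability reaches the (hypothetical)
    threshold; fall back to the most probable class otherwise."""
    for label in PRIORITY:
        if probs.get(label, 0.0) >= threshold:
            return label
    return max(probs, key=probs.get)

if __name__ == "__main__":
    y_true = ["interventional", "interventional", "non-interventional", "other_rwe"]
    y_pred = ["interventional", "interventional", "interventional", "other_rwe"]
    print(accuracy(y_true, y_pred))                             # 0.75
    print(per_class_metrics(y_true, y_pred, "interventional"))  # recall 1.0
    print(prioritised_label({"interventional": 0.35,
                             "non-interventional": 0.15,
                             "other_rwe": 0.50}))               # interventional
```

Under these assumptions, lowering `threshold` routes more borderline documents to the stricter classes: precision for those classes may fall, but fewer interventional or non-interventional studies are missed, which is the trade-off that favours recall over precision.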
Conclusion: We have thus demonstrated that implementing the Machine Learning approach accelerates the classification process while increasing compliance with applicable regulatory requirements. This model can benefit the biopharmaceutical industry and individual researchers.