IEEE Access (Jan 2019)
Regular Expression Based Medical Text Classification Using Constructive Heuristic Approach
Abstract
Medical text classification assigns medical-related text to categories such as topics or disease types. Machine learning techniques have been widely used for such tasks despite an obvious drawback: their “black box” nature leaves no easy way to fine-tune the resulting model for better performance. We propose a novel constructive heuristic approach that generates a set of regular expressions that can be used as effective text classifiers. The main innovation of our approach is a regular-expression-based text classifier that offers both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe that the approach performs well and consistently. Experimental results show that the machine-generated regular expressions can be used effectively in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of regular expressions modified by human experts and demonstrate the potential of the proposed method for practical applications.
Keywords