Health Information Management (Apr 2016)
Design and Implementation of a Structured Electronic Form for Celiac Disease Pathology Reports: A Text Mining Approach
Abstract
Introduction: Pathology reports are generally written as unstructured text and contain a complex web of relations between medical concepts. To enable computers to understand and analyze the free text of these reports, we aimed to convert these concepts and their relations into a structured format. Methods: The training, validation, and evaluation in this implementation study were based on a corpus of 258 pathology reports with a positive diagnosis of celiac disease, randomly selected from the records of 2 pathology laboratories. Our proposed system consisted of 3 phases: standardization of celiac disease pathology reports using the Delphi technique with 3 experts; information extraction from free-text reports with text mining techniques using the Stanford Parser; and automatic classification of celiac disease stages in the Marsh system using a decision tree classifier (the J48 algorithm). Results: The system extracted information from free-text pathology reports and assigned each piece of information to its pre-defined field in the standardized template with an accuracy of 76%. After the Marsh stage was determined for each report in the third phase, the complete system showed an average overall accuracy of 62%. Evaluated as an independent system on manually corrected, gold-standard input, the third phase achieved an accuracy greater than 84%. Conclusion: Compared with narrative reports, standardized synoptic pathology reporting offers enhanced completeness and improved consistency, avoids confusion and error, and facilitates faster and safer transmission of critical pathological data.
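To make the third phase concrete, the following is a minimal sketch of how structured fields might be mapped to a Marsh stage. It is not the authors' system: the abstract reports a J48 decision tree learned from data, whereas this sketch substitutes hand-written rules that follow the commonly cited modified Marsh (Marsh-Oberhuber) criteria. The class `CeliacReportForm`, its field names, and the function `marsh_stage` are hypothetical and chosen only for illustration; the actual template fields agreed on by the expert panel are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CeliacReportForm:
    """Hypothetical structured template; field names are illustrative only."""
    iel_increased: bool                    # intraepithelial lymphocytosis reported
    crypt_hyperplasia: bool                # crypt hyperplasia reported
    villous_atrophy: Optional[str] = None  # None, "partial", "subtotal", or "total"


def marsh_stage(form: CeliacReportForm) -> str:
    """Assign a modified Marsh stage with hand-written rules.

    Assumption: these rules stand in for the learned J48 tree and follow
    the usual textbook criteria, not thresholds taken from the paper.
    """
    atrophy_to_stage = {"partial": "3a", "subtotal": "3b", "total": "3c"}
    if form.villous_atrophy is not None:
        if form.villous_atrophy not in atrophy_to_stage:
            raise ValueError(f"unknown atrophy grade: {form.villous_atrophy!r}")
        return atrophy_to_stage[form.villous_atrophy]
    if form.iel_increased and form.crypt_hyperplasia:
        return "2"
    if form.iel_increased:
        return "1"
    # Any other combination is treated as stage 0 in this sketch.
    return "0"


if __name__ == "__main__":
    example = CeliacReportForm(iel_increased=True, crypt_hyperplasia=True,
                               villous_atrophy="subtotal")
    print(marsh_stage(example))
```

A learned tree such as J48 would induce comparable splits from labeled reports rather than having them written by hand, which is why errors in the extracted fields (phase 2) carry over into the stage assignment (phase 3).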