Health Information Management (مدیریت اطلاعات سلامت) (Apr 2016)

Design and Implementation of a Structured Electronic Form for Celiac Disease Pathology Reports: A Text Mining Approach

  • Azadeh Kamel-Ghalibaf,
  • Farzaneh Khadem-Sameni,
  • Majid Jangi,
  • Mohammad Reza Mazaheri-Habibi,
  • Kobra Etminani

Journal volume & issue
Vol. 13, no. 1
pp. 19 – 27

Abstract

Introduction: Pathology reports generally use an unstructured text format and contain a complex web of relations between medical concepts. To enable computers to understand and analyze the free text of these reports, we aimed to convert the concepts and their relations into a structured format.

Methods: The training, validation, and evaluation in this implementation study were based on a corpus of 258 pathology reports with a positive diagnosis of celiac disease, randomly selected from the records of 2 pathology laboratories. Our proposed system consisted of 3 phases: standardization of celiac disease pathology reports using the Delphi technique with 3 experts; information extraction from free-text reports with text mining techniques using the Stanford Parser; and automatic classification of celiac disease stages in the Marsh system using the J48 decision tree classifier.

Results: We successfully extracted information from free-text pathology reports and assigned each piece of information to its associated predefined field in the standardized template form with an accuracy of 76%. After the Marsh stage was determined for each report in the third phase, our system showed an average overall accuracy of 62%. When the third phase was evaluated as an independent system with manually corrected, gold-standard input, it achieved an accuracy of greater than 84%.

Conclusion: Compared with narrative reports, standardized synoptic pathology reporting offers enhanced completeness and improved consistency, avoids confusion and error, and facilitates faster and safer transmission of critical pathological data.
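The third phase described above (mapping structured fields to a Marsh stage and scoring the result against a gold standard) can be illustrated with a minimal sketch. This is not the authors' J48 model: the learned tree is replaced here by hand-written rules that follow the commonly cited modified Marsh criteria, and the field names (`iel_increased`, `crypt_hyperplasia`, `villous_atrophy`) are hypothetical, since the abstract does not list the fields of the standardized template.

```python
from dataclasses import dataclass


@dataclass
class CeliacReport:
    """Hypothetical structured fields; the paper's actual template is not given in the abstract."""
    iel_increased: bool       # increased intraepithelial lymphocytes
    crypt_hyperplasia: bool
    villous_atrophy: str      # "none", "partial", "subtotal", or "total"


def marsh_stage(report: CeliacReport) -> str:
    """Hand-written stand-in for a learned decision tree (assumption, not the J48 output)."""
    atrophy_to_stage = {"partial": "3a", "subtotal": "3b", "total": "3c"}
    if report.villous_atrophy in atrophy_to_stage:
        return atrophy_to_stage[report.villous_atrophy]
    if report.villous_atrophy != "none":
        raise ValueError(f"unknown villous_atrophy value: {report.villous_atrophy!r}")
    if report.iel_increased and report.crypt_hyperplasia:
        return "2"
    if report.iel_increased:
        return "1"
    return "0"


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of reports whose predicted stage matches the gold-standard stage."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `marsh_stage(CeliacReport(True, True, "none"))` returns `"2"`, and `accuracy(["1", "2", "3a", "0"], ["1", "2", "3b", "0"])` returns `0.75`. Scoring the classifier once on automatically extracted fields and once on manually corrected fields would mirror the two evaluations reported in the Results.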

Keywords