Informatics in Medicine Unlocked (Jan 2016)

Accuracy of rule extraction using a recursive-rule extraction algorithm with continuous attributes combined with a sampling selection technique for the diagnosis of liver disease

  • Yoichi Hayashi,
  • Kazuhiro Fukunaga

Journal volume & issue
Vol. 5
pp. 26 – 38

Abstract

Liver cancer is the second most common cause of cancer death worldwide, yet the diagnosis of liver disease remains difficult because extracted classification rules have limited accuracy and interpretability. Moreover, hepatitis, an inflammation of the liver, can progress to fibrosis, cirrhosis, or even liver cancer. Numerous methods have been applied to the diagnosis of liver disease, but most current diagnostic methods are black box models that cannot adequately reveal information hidden in the data. In the medical setting, extracted rules must be not only highly accurate but also highly interpretable. The Recursive-Rule eXtraction (Re-RX) algorithm is a white box model that generates highly accurate and interpretable classification rules from both discrete and continuous attributes; however, it tends to generate more rules than other rule extraction algorithms. The objectives of this study were (1) to use a new rule extraction algorithm, Continuous Re-RX combined with sampling selection techniques (Sampling-Continuous Re-RX), to obtain highly accurate and interpretable diagnostic rules for the BUPA and Hepatitis datasets, and (2) to quantify the associations of the presence and severity of ascites and of serum biomarkers with the risk of developing hepatitis, taking Child-Pugh scores into consideration. Compared with existing techniques, Sampling-Continuous Re-RX extracted more accurate, concise, and interpretable rules for the BUPA and Hepatitis datasets. In addition, the rules extracted using the proposed method were close to the trade-off curve, which indicated that they were more accurate and interpretable, and therefore more suitable in the medical setting.

Keywords: Rule extraction, Re-RX algorithm, Sampling selection technique, BUPA liver disorder dataset, Hepatitis dataset, Child-Pugh score