Breast Cancer Detection using Decision Tree and Random Forest

Fergie Joanda Kaunang; Bhustomy Hakim; Fedelis Fraderic; Sherren Hartono; Andrew Kristanto Mulyanto

doi:10.30871/jaic.v9i2.9073

Journal of Applied Informatics and Computing (Mar 2025)

Affiliations

Fergie Joanda Kaunang: Informatics, Universitas Bunda Mulia
Bhustomy Hakim: Information System, Universitas Bunda Mulia
Fedelis Fraderic: Informatics, Universitas Bunda Mulia
Sherren Hartono: Informatics, Universitas Bunda Mulia
Andrew Kristanto Mulyanto: Informatics, Universitas Bunda Mulia

DOI: https://doi.org/10.30871/jaic.v9i2.9073
Journal volume & issue: Vol. 9, no. 2
pp. 302 – 309

Abstract

Cancer is a chronic condition that is among the most difficult diseases to cure and contributes significantly to global mortality. Advances in artificial intelligence (AI) allow AI-integrated systems to provide quick and accurate diagnoses from collected medical data. This study uses machine learning to compare two models, one built with the Decision Tree (DT) algorithm and one with the Random Forest (RF) algorithm, on routine blood test data. The research process involves data preprocessing (handling missing values, detecting outliers, and feature selection), followed by the bootstrap aggregating technique to enhance model performance. Feature selection identifies the features that contribute most to cancer detection. Using the KBest feature selection technique, the study found that age, BMI, leptin, adiponectin, and MCP-1 had the highest correlation with the target variable. The resulting models were then evaluated to compare the two algorithms. RF outperformed DT, achieving an accuracy of 89.65% on the processed dataset with the bootstrap technique, compared with 80.17% for DT. RF also showed superior values on other metrics, including a precision of 91.66% and an F1-score of 87.12%. The study concludes that RF is more effective than DT for detecting cancer in limited datasets, especially when combined with the bootstrap technique. The findings are expected to support the development of decision support systems in healthcare services for more accurate early cancer detection.
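The abstract refers to bootstrap resampling and to accuracy, precision, and F1-score without showing how they are computed. The following is a minimal, standard-library sketch of those two ideas, not the authors' implementation. The function names `bootstrap_sample` and `classification_metrics`, the seed, and the toy labels are all assumptions made for illustration; none of the numbers below come from the study.

```python
import random
from typing import Dict, List, Sequence


def bootstrap_sample(rows: Sequence, n_samples: int, seed: int = 0) -> List:
    """Draw n_samples rows uniformly at random WITH replacement.

    Hypothetical helper: this is the resampling step that bootstrap
    aggregating repeats to build a training set for each tree.
    """
    rng = random.Random(seed)
    return [rows[rng.randrange(len(rows))] for _ in range(n_samples)]


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = cancer, 0 = healthy).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)

# Resample the toy rows to a larger set, as one might for a small dataset.
toy_rows = list(zip(toy_true, toy_pred))
resampled = bootstrap_sample(toy_rows, n_samples=20, seed=42)
```

Because F1 is the harmonic mean of precision and recall, a model with high precision but a lower F1-score has comparatively lower recall. In practice a library such as scikit-learn would normally supply the classifiers and metrics; the sketch only makes the definitions explicit.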

Published in Journal of Applied Informatics and Computing

ISSN: 2548-6861 (Online)
Publisher: Politeknik Negeri Batam
Country of publisher: Indonesia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jurnal.polibatam.ac.id/index.php/JAIC
