npj Women's Health (Mar 2025)
Development and evaluation of deep learning models for cardiotocography interpretation
Abstract
Variability in the visual interpretation of cardiotocograms (CTGs) poses substantial challenges in obstetric care. Despite recent strides in automated CTG interpretation for early detection of fetal hypoxia, the comparative efficacy of objective versus subjective ground truth labels and the robustness of models to temporal distribution shifts remain underexplored. Using a published convolutional neural network (CNN), we predict fetal compromise from CTG recordings, incorporating pre-processing and hyperparameter tuning. We use an open-source dataset of CTGs from 552 patients at University Hospital Brno, Czech Republic. Models trained with objective umbilical cord blood pH measurements (abnormal: pH < 7.20) outperformed those trained with subjective clinician-assigned Apgar scores (abnormal: Apgar < 7), demonstrating greater consistency and robustness to temporal shifts. This reflects the heterogeneity of Apgar scores, which makes them a more complex classification target. Additionally, aligning training signal intervals with the timing of outcome measurement yielded superior performance, a finding of particular relevance for intermittent CTG measurement scenarios.
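The two labelling schemes compared above can be illustrated with a minimal sketch. Only the thresholds (pH < 7.20 and Apgar < 7) come from the abstract; the `Outcome` record, its field names, and the helper functions are hypothetical, and the abstract does not state which Apgar time point is used.

```python
from dataclasses import dataclass

# Thresholds as stated in the abstract.
PH_ABNORMAL_BELOW = 7.20
APGAR_ABNORMAL_BELOW = 7


@dataclass
class Outcome:
    """Hypothetical per-recording outcome record (field names are assumptions)."""
    cord_ph: float  # umbilical cord blood pH
    apgar: int      # clinician-assigned Apgar score (time point not specified)


def label_by_ph(outcome: Outcome) -> int:
    """Objective label: 1 (abnormal) if cord blood pH is below 7.20, else 0."""
    return int(outcome.cord_ph < PH_ABNORMAL_BELOW)


def label_by_apgar(outcome: Outcome) -> int:
    """Subjective label: 1 (abnormal) if the Apgar score is below 7, else 0."""
    return int(outcome.apgar < APGAR_ABNORMAL_BELOW)


# The two schemes can disagree for the same recording.
example = Outcome(cord_ph=7.15, apgar=9)
labels = (label_by_ph(example), label_by_apgar(example))
```

In this illustrative case the recording is abnormal under the pH label but normal under the Apgar label, which is the kind of disagreement that makes the choice of ground truth matter.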