Indonesian Journal of Data and Science (Mar 2025)

Comparative Analysis of Gradient-Based Optimizers in Feedforward Neural Networks for Titanic Survival Prediction

  • I Putu Adi Pratama
  • Ni Wayan Jeri Kusama Dewi

DOI
https://doi.org/10.56705/ijodas.v6i1.219
Journal volume & issue
Vol. 6, no. 1

Abstract

The Titanic survival prediction problem has served as a benchmark for testing machine learning algorithms, particularly for binary classification tasks involving tabular data. While numerous models have been applied to this dataset, Feedforward Neural Networks (FNNs), also referred to as Multilayer Perceptrons (MLPs), offer unique advantages due to their ability to approximate complex functions. This study investigates the performance of FNNs for survival prediction using the Titanic dataset, focusing on the impact of gradient-based optimisation algorithms. Eight optimisers—Batch Gradient Descent (BGD), Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, Nesterov Accelerated Gradient (NAG), Heavy Ball Method, Adam, RMSprop, and Nadam—were systematically compared across three FNN architectures: small ([64, 32, 16]), medium ([128, 64, 32]), and large ([256, 128, 64]). To enhance stability and generalisation, the models employed binary cross-entropy loss, dropout, L2 regularisation, batch normalisation, and Leaky ReLU activation. A dynamic learning rate scheduler was implemented to optimise training by adjusting the learning rate during each epoch. Models were trained using an 80-20 train-test split over 50 epochs, with performance assessed using metrics such as accuracy, precision, recall, F1 score, and cross-entropy loss. Results showed that Adam achieved the highest accuracy of 82.6% with an F1 score of 0.77 on the medium architecture, demonstrating the best balance between performance and training time. RMSprop and Nadam also delivered competitive results, particularly in terms of precision and generalisation. Smaller architectures were faster to train but showed reduced accuracy, while larger architectures marginally improved performance at the cost of longer training times. The inclusion of a learning rate scheduler further enhanced convergence and reduced overfitting, improving generalisation to unseen data. This study provides a comparative analysis of gradient-based optimisers for FNNs applied to tabular datasets, offering insights into the optimal configurations for balancing accuracy, generalisation, and computational efficiency. These findings contribute to the growing body of knowledge on leveraging neural networks for structured data tasks.
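
As an illustration of the configuration described in the abstract, the following is a minimal Keras sketch of the "medium" [128, 64, 32] architecture trained with Adam. The specific hyperparameter values (learning rate, dropout rate, L2 strength, learning-rate decay factor) and the number of input features are assumptions for illustration only, not values reported by the authors.

```python
# Hypothetical sketch of the "medium" FNN ([128, 64, 32]) described in the abstract.
# Hyperparameter values (learning rate, dropout rate, L2 strength, LR decay factor)
# and the input feature count are assumptions, not values reported by the authors.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_medium_fnn(n_features=8, l2_strength=1e-4, dropout_rate=0.3):
    model = tf.keras.Sequential([layers.Input(shape=(n_features,))])
    for units in [128, 64, 32]:
        model.add(layers.Dense(units, kernel_regularizer=regularizers.l2(l2_strength)))
        model.add(layers.BatchNormalization())        # batch normalisation
        model.add(layers.LeakyReLU(0.1))              # Leaky ReLU activation
        model.add(layers.Dropout(dropout_rate))       # dropout for regularisation
    model.add(layers.Dense(1, activation="sigmoid"))  # binary survival output
    return model

model = build_medium_fnn()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # best-performing optimiser in the study
    loss="binary_crossentropy",                              # binary cross-entropy loss
    metrics=["accuracy",
             tf.keras.metrics.Precision(),
             tf.keras.metrics.Recall()],
)

# Dynamic learning-rate schedule (a multiplicative per-epoch decay is assumed here),
# applied over the 50-epoch training run described in the abstract.
lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr * 0.95)

# With an 80-20 train-test split prepared beforehand (e.g. via scikit-learn's
# train_test_split), training would proceed as:
# model.fit(X_train, y_train, epochs=50, callbacks=[lr_schedule])
```

Swapping the optimizer argument (e.g. for tf.keras.optimizers.SGD, RMSprop, or Nadam) and the layer-width list would reproduce the other optimiser-architecture combinations compared in the study.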

Keywords