Results in Engineering (Mar 2025)

Generalized cross-entropy for learning from crowds based on correlated chained Gaussian processes

  • J. Gil-González,
  • G. Daza-Santacoloma,
  • D. Cárdenas-Peña,
  • A. Orozco-Gutiérrez,
  • A. Álvarez-Meza

Journal volume & issue
Vol. 25
p. 103863

Abstract

Machine learning applications depend heavily on labeled data provided by domain experts to train accurate models. However, the cost and time required for expert labeling often make obtaining ground-truth labels impractical. Crowdsourcing offers a cost-effective alternative for collecting annotations but introduces challenges such as varying label quality and noisy data. To address these issues, the field of learning from crowds has emerged, enabling supervised learning directly from crowdsourced data. Still, traditional approaches often fall short because they assume that annotator behavior is homogeneous across the input feature space and impose independence constraints on outputs. We introduce the correlated chained Gaussian process with an enhanced generalized cross-entropy loss, termed CCGP-GCE, which uses a Bayesian approach to model the non-stationary and interdependent behaviors of multiple annotators. In addition, our proposal achieves a suitable trade-off between the mean absolute error and the cross-entropy loss, significantly mitigating the impact of noisy labels and enhancing the robustness and accuracy of classification on crowdsourced data. In extensive simulated and real-world experiments, CCGP-GCE demonstrated superior classification performance, outperforming state-of-the-art multi-labeler models in both accuracy and expert reliability estimation. In future work, we aim to extend CCGP-GCE to handle sparse and imbalanced annotations and to apply the model to multimodal tasks.
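
The trade-off between mean absolute error and cross-entropy mentioned in the abstract can be illustrated with the generalized cross-entropy loss as it is commonly written in the noisy-label literature, L_q = (1 - p_y^q) / q, where p_y is the predicted probability of the observed label. The sketch below is only an illustration under that assumption: the abstract does not give the paper's enhanced loss or how it is coupled to the chained Gaussian process, and the function name `gce_loss` and the default `q=0.7` are hypothetical choices, not taken from the source.

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Mean generalized cross-entropy, (1 - p_y**q) / q, over a batch.

    probs  : (n_samples, n_classes) array of predicted class probabilities
    labels : (n_samples,) array of integer class labels
    q      : trade-off parameter in [0, 1]; q=0 is handled as the
             cross-entropy limit, q=1 gives 1 - p_y
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability assigned to the observed label of each sample.
    p_true = probs[np.arange(len(labels)), labels]
    # Clip to avoid log(0) in the q=0 branch.
    p_true = np.clip(p_true, 1e-12, 1.0)
    if q == 0:
        # Limit of (1 - p**q) / q as q -> 0 is -log(p), i.e. cross-entropy.
        return float(np.mean(-np.log(p_true)))
    return float(np.mean((1.0 - p_true ** q) / q))

# Small usage example with two samples and two classes.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1])
loss_ce_like = gce_loss(probs, labels, q=0)    # cross-entropy limit
loss_mae_like = gce_loss(probs, labels, q=1)   # mean of 1 - p_y
loss_mid = gce_loss(probs, labels, q=0.7)      # in between
```

With one-hot targets, 1 - p_y is half the absolute error between the target and the predicted distribution, so q = 1 behaves like the mean absolute error up to a constant factor, while q near 0 approaches the cross-entropy. Intermediate values of q down-weight the gradient contribution of samples whose observed label receives low predicted probability, which is the usual argument for robustness to noisy labels.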

Keywords