Scientific Data (Mar 2025)

A large-scale open image dataset for deep learning-enabled intelligent sorting and analyzing of raw coal

  • Ziqi Lv,
  • Yuhan Fan,
  • Te Sha,
  • Yao Cui,
  • Yuxin Wu,
  • Haimei Lv,
  • Meijie Sun,
  • Yanan Tu,
  • Zhiqiang Xu,
  • Weidong Wang

DOI
https://doi.org/10.1038/s41597-025-04719-0
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 13

Abstract

Under the strategic objectives of carbon peaking and carbon neutrality, energy transition driven by new quality productive forces has emerged as a central theme in China’s energy development. Within this transition, deep learning-based intelligent sorting and analysis of raw coal is a pivotal technical process. However, progress in intelligent coal preparation in China has been constrained by the absence of accurate, large-scale data. To address this gap, this study introduces DsCGF, a large-scale, open-source raw coal image dataset. Over the past five years, raw coal image samples were systematically collected from three representative mining regions in China and meticulously annotated, yielding a dataset of over 270,000 visible-light images. The images are annotated at multiple levels for three primary categories (coal, gangue, and foreign objects) and are designed for three core computer vision tasks: image classification, object detection, and instance segmentation. Comprehensive evaluation results indicate that DsCGF can effectively support further research into the intelligent sorting of raw coal.