Scientific Data (Mar 2025)
A large-scale open image dataset for deep learning-enabled intelligent sorting and analyzing of raw coal
Abstract
Under the strategic objectives of carbon peaking and carbon neutrality, energy transition driven by new quality productive forces has become a central theme of China’s energy development. Within this transition, the intelligent sorting and analysis of raw coal using deep learning is a pivotal technical process. However, progress in intelligent coal preparation in China has been constrained by the lack of accurate, large-scale data. To address this gap, this study introduces DsCGF, a large-scale, open-source raw coal image dataset. Over the past five years, raw coal image samples were systematically collected from three representative mining regions in China and meticulously annotated, resulting in a dataset of over 270,000 visible-light images. The images are annotated at multiple levels for three primary categories (coal, gangue, and foreign objects) and support three core computer vision tasks: image classification, object detection, and instance segmentation. Comprehensive evaluation results indicate that DsCGF can effectively support further research into the intelligent sorting of raw coal.