Universe (Sep 2022)
A Preliminary Study of Large Scale Pulsar Candidate Sifting Based on Parallel Hybrid Clustering
Abstract
Pulsar candidate sifting is an essential part of pulsar analysis pipelines for discovering new pulsars. To solve the problem of data mining of a large number of pulsar data using a Five-hundred-meter Aperture Spherical radio Telescope (FAST), a parallel pulsar candidate sifting algorithm based on semi-supervised clustering is proposed, which adopts a hybrid clustering scheme based on density hierarchy and the partition method, combined with a Spark-based parallel model and a sliding window-based partition strategy. Experiments on the two datasets, HTRU (The High Time-Resolution Universe Survey) 2 and AOD-FAST (Actual Observation Data from FAST), show that the algorithm can excellently identify the pulsars with high performance: On HTRU2, the Precision and Recall rates are 0.946 and 0.905, and those on AOD-FAST are 0.787 and 0.994, respectively; the running time on both datasets is also significantly reduced compared with its serial execution mode. It can be concluded that the proposed algorithm provides a feasible idea for astronomical data mining of FAST observation.
Keywords