The Astrophysical Journal (Jan 2024)
Participatory Science and Machine Learning Applied to Millions of Sources in the Hobby–Eberly Telescope Dark Energy Experiment
Abstract
We are merging a large participatory science effort with machine learning to enhance the Hobby–Eberly Telescope Dark Energy Experiment (HETDEX). Our overall goal is to remove false positives, allowing us to use lower signal-to-noise data and sources with low goodness-of-fit. With six million classifications through Dark Energy Explorers, we can determine that a source is not real at over a 94% confidence level when it has been classified by at least 10 individuals; this confidence level increases for higher signal-to-noise sources. To date, we have only been able to apply this direct analysis to 190,000 sources. The full sample of HETDEX will contain around 2–3 million sources, including nearby galaxies ([O II] emitters), distant galaxies (Lyα emitters, or LAEs), false positives, and contamination from instrument issues. We can accommodate this tenfold increase by using machine learning with visually vetted samples from Dark Energy Explorers. This paper expands on our previous pilot study by increasing the visually vetted sample more than tenfold, from 14,000 LAE candidates to 190,000 sources. In addition, using our current visually vetted sample, we generate a real or false-positive classification for the full candidate sample of 1.2 million LAEs. We currently have approximately 17,000 volunteers from 159 countries. Thus, we are applying participatory, or citizen science, analysis to our full HETDEX data set, creating a free educational opportunity that requires no prior technical knowledge.
Keywords