Training from Zero: Forecasting of Radio Frequency Machine Learning Data Quantity

William H. Clark; Alan J. Michaels

doi:10.3390/telecom5030032

Telecom (Jul 2024)

Affiliations

William H. Clark: Virginia Tech National Security Institute, Blacksburg, VA 24060, USA
Alan J. Michaels: Virginia Tech National Security Institute, Blacksburg, VA 24060, USA

Journal volume & issue: Vol. 5, no. 3
pp. 632–651

Abstract

The data used during training in any given application space are directly tied to the performance of the system once deployed. While the Neural Scaling Law within Machine Learning attributes high-performance models to many other factors as well, there is no doubt that the training data provide the foundation on which a system is built. One of the underlying heuristics in Machine Learning is that having more data leads to better models, but there is no easy answer to the question, “How much data is needed to achieve the desired level of performance?” This work examines a modulation classification problem in the Radio Frequency domain, attempting to answer the question of how much training data is required to achieve a desired level of performance, although the procedure readily applies to classification problems across modalities. The ultimate goal is to determine an approach that requires the least up-front data collection, so that it can inform a more thorough collection effort aimed at the desired performance metric. By forecasting the performance of the model rather than the loss value, this approach gives a more intuitive understanding of data volume requirements. While this approach requires an initial dataset, the goal is for that initial collection to be orders of magnitude smaller than what is required to deliver a system that achieves the desired performance. An additional benefit of the techniques presented here is that the quality of different datasets can be evaluated numerically and tied to the quantity of data and, ultimately, to the performance of the architecture in the problem domain.
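
The abstract does not state which forecasting model is used, so the following is only a minimal sketch under stated assumptions. It assumes classification accuracy follows a saturating inverse power law, accuracy(n) = a - b * n^(-c), where n is the number of training examples and a is the asymptotic accuracy. This functional form is a common learning-curve choice, not necessarily the authors' method, and the helper names fit_learning_curve and samples_for_target are hypothetical. The idea it illustrates matches the abstract: fit accuracy (not loss) measured on a small initial dataset, then invert the curve to forecast how many examples a target accuracy would need.

```python
from typing import List, Optional, Tuple


def fit_learning_curve(sizes: List[int], accuracies: List[float]) -> Tuple[float, float, float]:
    """Fit the ASSUMED model accuracy(n) = a - b * n**(-c) by least squares.

    For each candidate exponent c on a fixed grid, the model is linear in
    (a, b), so a and b come from ordinary least squares on x = n**(-c).
    The c with the smallest sum of squared errors is kept.
    """
    if len(sizes) != len(accuracies) or len(sizes) < 3:
        raise ValueError("need at least three (size, accuracy) pairs")
    mean_y = sum(accuracies) / len(accuracies)
    best = None
    for step in range(5, 201):  # candidate exponents c = 0.05 .. 2.00
        c = step / 100
        xs = [n ** (-c) for n in sizes]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # all sizes identical: slope is undefined
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
        slope = sxy / sxx            # regression slope equals -b
        a = mean_y - slope * mean_x  # regression intercept equals a
        b = -slope
        sse = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, accuracies))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("training set sizes must not all be equal")
    _, a, b, c = best
    return a, b, c


def samples_for_target(a: float, b: float, c: float, target: float) -> Optional[float]:
    """Invert the fitted curve: the n at which a - b * n**(-c) reaches target.

    Returns None when the target is at or above the fitted asymptote a,
    meaning the curve forecasts that more data alone will not reach it.
    """
    if b <= 0:
        raise ValueError("fitted curve does not increase with more data")
    if target >= a:
        return None
    # a - b * n**(-c) = target  =>  n = (b / (a - target)) ** (1 / c)
    return (b / (a - target)) ** (1 / c)
```

In use, accuracies measured at a few small training set sizes (for example 50, 100, 200, and 400 examples) are passed to fit_learning_curve, and samples_for_target then extrapolates to the data quantity a desired accuracy would require. Fixing c on a grid keeps the fit within the standard library, because for each fixed c the model reduces to a straight-line regression in a and b; a library optimizer could replace the grid if finer resolution on c is needed.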

Published in Telecom

ISSN: 2673-4001 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/telecom
