Digital Communications and Networks (Oct 2023)

Entropy-based redundancy analysis and information screening

  • Yang Li,
  • Jiachen Yang,
  • Jiabao Wen

Journal volume & issue
Vol. 9, no. 5
pp. 1061 – 1069

Abstract

The ongoing data explosion has introduced unprecedented challenges to the information security of communication networks. Because images are among the most commonly used carriers for information transmission, analyzing and screening their data redundancy is of great significance. However, most current research focuses on improving algorithms on commonly used image datasets. This raises an important question: is there data redundancy in the open datasets? Accounting for model structure and data distribution to ensure generalization, we conducted extensive experiments comparing the average accuracy obtained from a small random subset of the data with the baseline accuracy obtained from all the data. The results show serious data redundancy in open datasets from different domains. For instance, with a deep model, only 20% of the data is needed to achieve more than 90% of the baseline accuracy. Further, we propose a novel entropy-based information screening method, which outperforms random sampling under many experimental conditions. In particular, with 20% of the data, the improvement for the shallow model is approximately 10%, and for the deep model the ratio to the baseline accuracy rises above 95%. Moreover, this work can also serve as a new way of learning from a few valuable samples, compressing existing datasets, and guiding the construction of high-quality datasets in the future.

Keywords