Tongxin xuebao (Jan 2009)

HitIct: Chinese corpus for the evaluation of lossless compression algorithms

  • CHANG Wei-ling1,
  • YUN Xiao-chun2,
  • FANG Bin-xing1,
  • WANG Shu-peng2

Journal volume & issue
Vol. 30
pp. 42 – 47

Abstract

HitIct, a Chinese corpus for the evaluation of lossless compression algorithms based on ANSI code, is proposed. Following the principles of application representativeness, complementarity, and openness, a large number of candidate files were collected from the Internet. The average compression ratio, average correlation coefficient, compression ratio correlation coefficient, and standard deviation were then used to select the files that give the most accurate indication of the overall performance of compression algorithms. Experimental results show that the collection is representative and stable, and that it can serve as a supplementary test set to the main benchmark for comparing compression methods.
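
The selection step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the exact formulas, so the code below assumes that each candidate file is scored by the Pearson correlation between its per-compressor ratios and the average ratios over all candidates, with the standard deviation of its ratios as a tie-breaker. The compressor set (`zlib`, `bz2`, `lzma`) and the names `compression_ratios`, `pearson`, and `rank_candidates` are hypothetical stand-ins chosen for illustration.

```python
import bz2
import lzma
import zlib
from statistics import mean, pstdev

# Assumption: three stdlib compressors stand in for the algorithms under test.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compression_ratios(data):
    """Return compressed size / original size, one value per compressor.

    `data` must be non-empty bytes.
    """
    return [len(fn(data)) / len(data) for fn in COMPRESSORS.values()]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_candidates(files):
    """Rank candidate files (name -> bytes) by how closely they track the average.

    Returns (correlation, std_dev, name) tuples, highest correlation first;
    ties are broken by lower standard deviation (an assumed criterion).
    """
    ratios = {name: compression_ratios(data) for name, data in files.items()}
    n = len(COMPRESSORS)
    # Average compression ratio of each compressor over all candidate files.
    avg_profile = [mean(r[i] for r in ratios.values()) for i in range(n)]
    scored = [
        (pearson(r, avg_profile), pstdev(r), name) for name, r in ratios.items()
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1]))
```

Under these assumptions, calling `rank_candidates` on a dictionary of candidate file contents yields a ranking from which the top entries could be kept as corpus members. A real selection would need many more compressors and the paper's actual criteria.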

Keywords