BIO Web of Conferences (Jan 2024)

Exploring Text Data Compression: A Comparative Study of Adaptive Huffman and LZW Approaches

  • Kadhim Doaa J.,
  • Mosleh Mahmood F.,
  • Abed Faeza A.

DOI
https://doi.org/10.1051/bioconf/20249700035
Journal volume & issue
Vol. 97
p. 00035

Abstract


Data compression is a critical procedure in computer science that aims to minimize the size of data files while preserving their essential information. It is used extensively in numerous applications, including communication, data storage, and multimedia transmission. In this work, we investigated the results of compressing four different text files with Lempel-Ziv-Welch (LZW) compression and Adaptive Huffman coding. The experiment used four text files: an Arabic paragraph, an English paragraph, a file of repeated Arabic characters, and a file of repeated English characters. We measured bit rate, compression time, and decompression time to evaluate the algorithms' performance. For the Arabic and English paragraphs, the Adaptive Huffman algorithm compressed faster, at around 22 μsec/char, whereas LZW decompressed faster, at 23 μsec/char. Adaptive Huffman also achieved lower bit rates than LZW: about 1.25 bits/char for the Arabic text and 4.495 bits/char for the English text, compared with LZW's 3.363 and 6.824 bits/char, respectively. For the files of repeated Arabic and English characters, the LZW algorithm outperformed Adaptive Huffman in both decompression time and bit rate. The Arabic-character file had a decompression time of 6 μsec/char and a bit rate of 0.717 bits/char, lower than the English-character file's decompression time of 16 μsec/char and bit rate of 1.694 bits/char. For compression time, however, Adaptive Huffman outperformed LZW on these files, achieving 15 μsec/char and 47 μsec/char for the Arabic-character and English-character files, respectively.
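To make the LZW side of the comparison and the bits-per-character (bit rate) metric concrete, here is a minimal Python sketch. The abstract does not give the authors' dictionary size, code width, or character encoding, so the following are assumptions rather than the paper's method: LZW runs over the UTF-8 bytes of the text, starts from a 256-entry byte dictionary that is never reset or capped, and emits variable-width codes that grow with the dictionary. The bit rate here is total code bits divided by the number of Unicode characters. The function names `lzw_compress`, `lzw_decompress`, and `lzw_bit_rate` are hypothetical. Adaptive Huffman coding (for example the FGK or Vitter variants) is not sketched.

```python
# Minimal LZW sketch with a bits-per-character (bit rate) metric.
# Assumptions (not from the paper): LZW runs over the UTF-8 bytes of the
# text, starts from a 256-entry byte dictionary, never resets or caps the
# dictionary, and emits variable-width codes that grow with the dictionary.
# Function names are hypothetical.


def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes into a list of LZW codes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes: list[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # extend the current match
        else:
            codes.append(dictionary[w])  # emit the longest known phrase
            dictionary[wc] = next_code   # learn the new phrase
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])  # flush the final match
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Decode LZW codes by rebuilding the encoder's dictionary."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]  # code not yet known: the "cScSc" case
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


def lzw_bit_rate(text: str) -> float:
    """Total code bits divided by the number of Unicode characters."""
    if not text:
        return 0.0
    codes = lzw_compress(text.encode("utf-8"))
    # The i-th code is at most 255 + i, so it needs (255 + i).bit_length() bits.
    total_bits = sum((255 + i).bit_length() for i in range(len(codes)))
    return total_bits / len(text)


if __name__ == "__main__":
    for sample in ["TOBEORNOTTOBEORTOBEORNOT", "مرحبا مرحبا مرحبا"]:
        codes = lzw_compress(sample.encode("utf-8"))
        assert lzw_decompress(codes).decode("utf-8") == sample
        print(f"{sample!r}: {len(codes)} codes, {lzw_bit_rate(sample):.3f} bits/char")
```

Under these assumptions, the classic 24-character string `TOBEORNOTTOBEORTOBEORNOT` encodes to 16 codes (one 8-bit code followed by fifteen 9-bit codes, 143 bits in total), or about 5.958 bits/char. Dividing by Unicode characters rather than bytes keeps the metric per character for Arabic text, although each Arabic letter contributes two UTF-8 bytes to the LZW input. Highly repetitive input, like the repeated-character files described above, lets the dictionary learn long phrases quickly, which is why LZW tends to do well on it.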