IEEE Access (Jan 2024)

StegAbb: A Cover-Generating Text Steganographic Tool Using GPT-3 Language Modeling for Covert Communication Across SDRs

  • M. V. Namitha,
  • G. R. Manjula,
  • Manjula C. Belavagi

DOI
https://doi.org/10.1109/ACCESS.2024.3411288
Journal volume & issue
Vol. 12
pp. 82057 – 82067

Abstract

Unlike images, video, and audio, textual data suffers little distortion during transmission across a network. Steganography is the practice of hiding secret data inside innocuous data for secure communication. Software-defined radios are promising carriers because they transmit over the air. The proposed approach performs linguistic steganography and transmits the stego text from source to destination through a software-defined radio. The linguistic steganography is achieved with the Generative Pre-trained Transformer 3 (GPT-3) language model. First, each letter of the English alphabet is assigned a unique binary code, and a scrambling sequence is generated from a pre-shared key; sender and receiver must agree on the binary coding of the alphabet. The secret bits are XORed with the scrambling bits, and the resulting bit stream is reversed for additional security. The scrambled sequence is then grouped into 4-bit blocks, and each block is mapped to a letter, so that the secret bit stream becomes an abbreviation. The stego text is obtained by using GPT-3 to generate a full form for that abbreviation. Perplexity and human opinion scores are used to quantify the quality of the generated text. The method shows promising performance, including an average perplexity as low as 2.75, an average human opinion score of 4.052, an average embedding rate of approximately 0.72 seconds, and an embedding capacity of 4 bits/word. The generated text is rarely recognized as steganographic, with a detection accuracy of 0.5245, while decoding accuracy is 100%.
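
The bit-level steps described in the abstract (binary coding, key-derived scrambling, XOR, reversal, 4-bit grouping into an abbreviation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The 5-bit letter code, the SHA-256-based scrambling sequence, the 16-letter table `NIBBLE_LETTERS`, and all function names are assumptions chosen only to make the example runnable. The GPT-3 step that expands the abbreviation into a full form is omitted.

```python
import hashlib

# Assumption: a 16-letter table, one abbreviation letter per 4-bit group.
NIBBLE_LETTERS = "ABCDEFGHILMNOPRS"
# Assumption: each secret letter a-z is coded as its 5-bit alphabet index.
LETTER_BITS = 5


def text_to_bits(secret: str) -> str:
    """Code each letter a-z as a fixed-width binary string."""
    if not (secret.isascii() and secret.isalpha()):
        raise ValueError("this sketch only handles the letters a-z")
    return "".join(
        format(ord(ch) - ord("a"), f"0{LETTER_BITS}b") for ch in secret.lower()
    )


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    return "".join(
        chr(int(bits[i:i + LETTER_BITS], 2) + ord("a"))
        for i in range(0, len(bits), LETTER_BITS)
    )


def scrambling_bits(key: str, n: int) -> str:
    """Derive n scrambling bits from the pre-shared key.

    Assumption: SHA-256 in counter mode stands in for the paper's generator.
    """
    out, counter = "", 0
    while len(out) < n:
        digest = hashlib.sha256(f"{key}:{counter}".encode()).digest()
        out += "".join(format(byte, "08b") for byte in digest)
        counter += 1
    return out[:n]


def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def encode(secret: str, key: str) -> str:
    """Secret text -> abbreviation letters (4 bits per letter)."""
    bits = text_to_bits(secret)
    bits += "0" * (-len(bits) % 4)  # pad to a multiple of 4 bits
    scrambled = xor_bits(bits, scrambling_bits(key, len(bits)))[::-1]
    return "".join(
        NIBBLE_LETTERS[int(scrambled[i:i + 4], 2)]
        for i in range(0, len(scrambled), 4)
    )


def decode(abbreviation: str, key: str) -> str:
    """Abbreviation letters -> secret text, undoing each step in reverse."""
    scrambled = "".join(
        format(NIBBLE_LETTERS.index(ch), "04b") for ch in abbreviation
    )
    bits = xor_bits(scrambled[::-1], scrambling_bits(key, len(scrambled)))
    n_letters = len(bits) // LETTER_BITS  # padding is at most 3 bits
    return bits_to_text(bits[: n_letters * LETTER_BITS])
```

For example, `decode(encode("attack", "shared-key"), "shared-key")` returns `"attack"`. In the described scheme, each abbreviation letter would then be expanded by the language model into a word of the stego text, which is consistent with the reported capacity of 4 bits per word.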

Keywords