IEEE Access (Jan 2023)
Content Disarm and Reconstruction of PDF Files
Abstract
Content Disarm and Reconstruction (CDR) is a zero-trust file methodology that proactively extracts threat attack vectors from documents and media files. While extensive literature on CDR emphasizes its importance, a detailed discussion of how the CDR process works, its effectiveness, and its drawbacks is not presented. Therefore, this paper presents PdfCDR, the first PDF CDR system in which the validation, the prevention rate, and the received visual similarity effect of disarming and reconstruction are presented and measured. Furthermore, PdfCDR suggests for the first time a novel method dealing with new emerging exploits by automatically converting detection rules to disarm and reconstruction rules. As a result, PdfCDR can prevent evasive attacks without any software upgrades and utilize the cyber security community knowledge to prevent cyber attacks as soon as they are advertised. The effectiveness of the novel PdfCDR against well-known PDF datasets shows that it disarmed not only the malicious components, but the reconstructed file is also usable and functional. However, since CDR relies on understanding the file format, any CDR solution should handle each supported file type separately due to the vast difference in each file format. Hence, this paper focuses on the Portable Document Format (PDF) file type that attackers commonly exploit. The results indicate that PdfCDR successfully CDR 90% of the malicious files while the remaining 10% were encrypted or had abnormal structures compared to the standard and were quarantined.
Keywords