IEEE Access (Jan 2021)

A Deep Learning Model for Information Loss Prevention From Multi-Page Digital Documents

  • Abhijit Guha,
  • Debabrata Samanta,
  • Amit Banerjee,
  • Daksh Agarwal

DOI
https://doi.org/10.1109/ACCESS.2021.3084841
Journal volume & issue
Vol. 9
pp. 80451 – 80465

Abstract


The World Wide Web has redefined almost all business models over the past twenty-five to thirty years. IoT, Big Data, and AI are among the comparatively recent technologies that have revolutionized the digitization and management of data. Along with this revolution arose the need for data security and consumer privacy protection, especially for financial institutions. The Equifax data breach in 2017 and the leaks of personal information from Facebook in 2021 fostered general skepticism among the customers of large corporations. The Gramm-Leach-Bliley Act (GLBA) of 1999, also known as the Financial Services Modernization Act, is a US federal law that requires financial institutions to protect their customers’ private information. Building on the GLBA, the Federal Trade Commission (FTC) has laid down guidelines for all financial institutions in the United States, including TI companies. In this paper, an artificial neural network (ANN)-based content classification technique, which uses a multilayer perceptron (MLP) architecture combined with an n-gram TF-IDF feature descriptor, is proposed to detect and protect customers’ sensitive information in one of the digital image-document stores of a reputed TI company. The proposed technique is compared with other state-of-the-art strategies. Data samples from the company’s digital document store were used in the study, and the prediction accuracy metrics obtained are substantially better than those of the compared strategies and fall within the acceptable range defined by the organization’s information security monitoring team.
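To make the "n-gram TF-IDF feature descriptor" concrete, here is a minimal sketch of how page text could be turned into the kind of feature vectors an MLP classifier consumes. It is an illustration under stated assumptions, not the authors’ implementation: it assumes word unigrams and bigrams, plain whitespace tokenization, raw-count term frequency, the smoothed idf formula idf = ln((1 + N) / (1 + df)) + 1, and L2 normalization. The function names `word_ngrams` and `tfidf_vectors` and the sample `docs` are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lower-cased,
    whitespace-tokenized text (a simplifying assumption; real OCR text
    would need more careful tokenization)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Compute sparse TF-IDF vectors (dict: n-gram -> weight), one per document.

    tf  = raw count of the n-gram in the document
    idf = ln((1 + N) / (1 + df)) + 1, a smoothed variant (assumed here)
    Each resulting vector is scaled to unit L2 norm.
    """
    counts_per_doc = [Counter(word_ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}

    vectors = []
    for counts in counts_per_doc:
        vec = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Empty documents keep an empty vector instead of dividing by zero.
        vectors.append({g: w / norm for g, w in vec.items()} if norm else vec)
    return vectors, idf


# Hypothetical OCR text from three document pages (illustrative only).
docs = [
    "borrower social security number 123-45-6789",
    "property address and legal description",
    "borrower signature and date",
]
vectors, idf = tfidf_vectors(docs)
```

An n-gram such as "social security", which appears on only one page, receives a higher idf than "borrower", which appears on two. This is why phrase-level n-grams help a downstream classifier separate pages that carry sensitive identifiers. In practice, scikit-learn’s `TfidfVectorizer(ngram_range=(1, 2))` computes the same smoothed idf with L2 normalization by default (with a different tokenizer), and its `MLPClassifier` is one ready-made option for the MLP stage. Whether the paper used these exact settings is not stated here.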

Keywords