Can censoring of research outputs be automated to ensure robust data protection?

Michael Nicholas; Chris Davies; Kelly Nock; Kerry Bailey; Craig Barker; Luke Player; Helen Thomas

doi:10.23889/ijpds.v1i1.282

International Journal of Population Data Science (Apr 2017)

Affiliations

Michael Nicholas: We Predict
Chris Davies: We Predict
Kelly Nock: We Predict
Kerry Bailey
Craig Barker: Abertawe Bro Morgannwg University Health Board
Luke Player: Swansea University
Helen Thomas: Abertawe Bro Morgannwg University Health Board

DOI: https://doi.org/10.23889/ijpds.v1i1.282
Journal volume & issue: Vol. 1, no. 1

Abstract

Background

Guidance on research outputs recommends censoring so that, even when anonymised linked data are aggregated, no cell contains fewer than 5 to 10 units. The aim is to reduce the likelihood of re-identification. Leaving those cells empty is not adequate if other cells can be used to derive the numerical value of a censored cell. Some outputs require a large number of tables to be exported. This was the case here, where the outputs of a research study comprised several large tables that drove a front-end interactive visualisation. The issue will become more common as linked data outputs are used to make operational decisions, which requires the timely release of large amounts of aggregated data. Human scanning of every table may not be time- or cost-effective and is subject to human error.

Approach

Several methods of censoring were considered, including Barnardisation (randomly adding 1 to or subtracting 1 from small numbers), suppression, and a combination of methods. The methods then had to be coded to ensure that censoring was applied to all cells in the output and that the output remained meaningful. The outputs also had to be checked for quality, and an ‘audit’ system introduced so that quality was maintained without affecting the findings reported in the outputs.

Discussion

Software engineers were able to develop an algorithm that performed safe censoring using a threshold of ‘10 or under’ while keeping the statistical tables functional. The presentation will describe how this was done and demonstrate some examples of the impact on the output. Some stakeholders felt that the censoring of the anonymised aggregated data went beyond what is needed to protect against the ‘reasonable effort’ required to re-identify individuals. Some considered the resulting lack of detail and missing data excessive, with detail sacrificed for the sake of minimal risk. Some stakeholders felt the risks had been allowed to outweigh the societal benefits. The team were assured that, although some may consider the censoring excessive, it did ensure safe censoring and offered as low a risk as possible of re-identification. However, routine implementation of this method has not been agreed.
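
The abstract does not publish the algorithm itself, so the following is only a minimal Python sketch of the kind of automated censoring it describes, not the authors' implementation. It assumes a two-way table of counts whose row and column totals are also released, treats every count of ‘10 or under’ (including zero) as censored, and uses a simple heuristic for complementary suppression: whenever a row or column has exactly one hidden cell, the smallest visible cell in that line is hidden too, so the value cannot be recovered by subtraction. The names `censor_table`, `audit_table` and `barnardise` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Table = List[List[Optional[int]]]  # None marks a censored cell


def _lines(table: Table) -> List[List[Tuple[int, int]]]:
    """Return the cell coordinates of every row and every column."""
    n_rows, n_cols = len(table), len(table[0])
    rows = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    cols = [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    return rows + cols


def censor_table(counts: List[List[int]], threshold: int = 10) -> Table:
    """Primary plus complementary suppression (illustrative heuristic)."""
    # Primary suppression: hide every count at or below the threshold.
    out: Table = [[None if c <= threshold else c for c in row] for row in counts]
    changed = True
    while changed:
        changed = False
        for line in _lines(out):
            hidden = [p for p in line if out[p[0]][p[1]] is None]
            visible = [p for p in line if out[p[0]][p[1]] is not None]
            # A single hidden cell could be derived from the line total,
            # so hide the smallest visible cell in the same line as well.
            if len(hidden) == 1 and visible:
                r, c = min(visible, key=lambda p: out[p[0]][p[1]])
                out[r][c] = None
                changed = True
    return out


def audit_table(table: Table, threshold: int = 10) -> bool:
    """Check that no small count is visible and no line has a lone hidden cell."""
    for line in _lines(table):
        values = [table[r][c] for r, c in line]
        if any(v is not None and v <= threshold for v in values):
            return False
        n_hidden = sum(v is None for v in values)
        if n_hidden == 1 and len(values) > 1:
            return False
    return True


def barnardise(count: int, rng: random.Random, threshold: int = 10) -> int:
    """Randomly add or subtract 1 for small counts; never go below zero."""
    if count > threshold:
        return count
    return max(0, count + rng.choice((-1, 1)))


if __name__ == "__main__":
    raw = [[3, 40, 50], [20, 30, 60], [25, 35, 70]]
    censored = censor_table(raw)
    print(censored)
    print(audit_table(censored))
```

In this example the single small cell (3) forces three further cells to be hidden, which illustrates the loss of detail that some stakeholders regarded as excessive. The heuristic does not guarantee that bounds on hidden values cannot be inferred; a production system would need a more rigorous disclosure-control method.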

Published in International Journal of Population Data Science

ISSN: 2399-4908 (Online)
Publisher: Swansea University
Country of publisher: United Kingdom
LCC subjects: Social Sciences: Economic theory. Demography: Demography. Population. Vital events
Website: https://ijpds.org
