The effectiveness of Ohalo Data X-Ray anonymisation has been evaluated against manual anonymisation. The evaluation used the RIDDOR data behind the HSE Construction Division RIDDOR dashboard [1], which was manually anonymised in 2017 and made public. A significant data breach was defined as one where sharing HSE data externally would be likely to result in a risk to people’s rights and freedoms. This is the same standard as that used by the UK Information Commissioner’s Office (ICO) [2].
Of the 1998 RIDDOR reports analysed, 743 contained sensitive text, including some personally identifiable information (PII). After anonymisation using Ohalo’s Data X-Ray, 94 records retained some PII, of which 19 would be considered sufficient for a significant breach. On these figures, Data X-Ray reduced the number of sensitive records by 97% (724 of 743 were adequately anonymised; 19 of 743 remained sensitive). The 3% of records that remained sensitive generally still contained information on a named individual together with details of a specific injury or other event.
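The percentages above follow directly from the record counts. The short check below reproduces them; the variable names are illustrative only and are not taken from the evaluation itself.

```python
# Record counts quoted in the summary above.
sensitive_before = 743    # records containing sensitive text before anonymisation
retained_some_pii = 94    # records retaining some PII after anonymisation
significant_after = 19    # of those, records sufficient for a significant breach

# Records that no longer meet the significant-breach standard.
adequately_anonymised = sensitive_before - significant_after

# Records with residual PII that falls below the significant-breach standard.
minor_residual = retained_some_pii - significant_after

reduction = adequately_anonymised / sensitive_before
residual = significant_after / sensitive_before

print(f"Adequately anonymised: {adequately_anonymised}/{sensitive_before} = {reduction:.1%}")
print(f"Remained sensitive:    {significant_after}/{sensitive_before} = {residual:.1%}")
print(f"Residual PII below the breach standard: {minor_residual} records")
```

To one decimal place the figures are 97.4% and 2.6%, which round to the 97% and 3% quoted.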
Aims and objectives
Ohalo Data X-Ray is a server-based, customisable tool for anonymising data. However, it has not previously been used in a health and safety context, and some types of data that we would wish to anonymise were not originally recognised. Additionally, the association between anonymised text and its entity type (e.g. ‘Date’, ‘Person’, ‘Organisation’) was not retained as part of the anonymisation process. This project aimed to evaluate and improve the anonymisation of health and safety data, and to add context-specific anonymisation and entity association so that the tool can be used as part of an integrated research desensitisation pipeline.
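As an illustration of what retaining entity association can mean in practice, the sketch below replaces each detected span with a placeholder naming its entity type, so the anonymised text still shows that a person, organisation or date was present. It is a minimal sketch only: the `Entity` type, the `span` and `redact_with_labels` helpers, and the example sentence are all invented for illustration and do not represent the Data X-Ray interface or how it detects entities.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """A detected span of sensitive text (hypothetical structure)."""
    start: int   # index of the first character of the span
    end: int     # index one past the last character of the span
    label: str   # entity type, e.g. 'Person', 'Organisation', 'Date'


def span(text: str, phrase: str, label: str) -> Entity:
    """Locate the first occurrence of `phrase` in `text` and label it (illustrative helper)."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


def redact_with_labels(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with a placeholder that keeps its entity type."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # skip spans that overlap one already redacted
        pieces.append(text[cursor:ent.start])
        pieces.append(f"[{ent.label}]")
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented example sentence; not taken from any RIDDOR report.
report = "John Smith fell from a ladder at Acme Ltd on 3 May 2016."
entities = [
    span(report, "John Smith", "Person"),
    span(report, "Acme Ltd", "Organisation"),
    span(report, "3 May 2016", "Date"),
]

print(redact_with_labels(report, entities))
# prints: [Person] fell from a ladder at [Organisation] on [Date].
```

Keeping the entity type in the placeholder preserves the structure of the sentence for later analysis while removing the identifying values themselves.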
Because the methods that can be used to identify individuals from their data are complex and evolving, complete anonymisation is not expected to be possible in all cases. However, anonymisation reduces the risk, and this evaluation methodology allows the remaining risk to be properly understood so that suitable controls can be put in place to manage it.