Ohalo – automatic anonymisation of safety data

Discovering Safety’s collaboration with data governance specialists Ohalo is making it possible to anonymise vast quantities of safety data in days rather than years, allowing organisations to learn where things have gone wrong and use that information to avoid accidents and ill health in the future. We’re using Ohalo’s Data X-Ray to automatically remove personal and sensitive details from safety documents, so they can be shared without compromising confidentiality.

Ohalo are experts in data governance. The relationship started when Ohalo won our ‘Safety Accelerator’ challenge aimed at combining the best brains in tech startups with the wealth of safety knowledge in HSE.

Steve Naylor, HSE data specialist, said: “Back in 2019, the HSE put in an industrial challenge to Lloyd’s Register’s Safety Accelerator centred around the challenges of auto-redacting large volumes of GDPR sensitive research datasets. The principal driver was the aspiration to promote sharing of health and safety data to stimulate innovation - and the major barrier posed by the risk of breaching data protection legislation. Ohalo won the challenge with their pitch to use their proprietary software platform, Data X-Ray, as a starting point. We ran a pilot project later in 2019 which successfully demonstrated proof of concept.”

Until now, redacting personal details from documents has meant a human being reading through and manually blacking out information. This is a laborious task and hard to do accurately – so it often takes three people to be sure that no sensitive information is left in.

The Data X-Ray is a software tool that ingests hundreds of thousands of words in an instant and redacts the sensitive details in them with high levels of accuracy. Having won the Safety Accelerator challenge, Ohalo used it to tackle HSE’s massive data redaction task: anonymising its 600,000 RIDDOR reports, which would take three people over ten years to redact manually. Ohalo’s Data X-Ray reduced that to one machine day. The result was 4,500x less time and 49x less cost.
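As a rough sanity check on the time figure, the short calculation below works backwards from the quoted factor. It assumes the 4,500x comparison is between elapsed calendar days of manual work and the single machine day; the article does not state the basis of the comparison, so treat this as an illustration only.

```python
# Rough check of the time-saving figure quoted above.
# Assumption (not stated in the article): the 4,500x factor compares elapsed
# calendar days of manual redaction with one day of machine time.
MACHINE_DAYS = 1
SPEEDUP = 4_500
DAYS_PER_YEAR = 365

manual_days = MACHINE_DAYS * SPEEDUP
manual_years = manual_days / DAYS_PER_YEAR

print(f"Implied manual effort: {manual_days} days, about {manual_years:.1f} years")
```

Under that assumption the implied manual effort is a little over twelve years, which is consistent with the "over ten years" estimate for a three-person team.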

Kyle Dupont, Ohalo’s CEO, explains why automating anonymisation holds such potential for our ability to learn from things that have happened: “Unfortunately, most safety records contain sensitive and personal data, making sharing it problematic. But the amount of data gathered is growing exponentially, and manually redacting hundreds of thousands of safety documents is just not feasible.”

Ohalo set up a Data X-Ray server in HSE’s environment to analyse the RIDDOR reports. The server ingested the reports, identified sensitive personal data and redacted it. HSE data scientists then worked with Ohalo to identify any sensitive information that had not been redacted.
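To make the ingest, identify and redact steps concrete, here is a minimal sketch in Python. It is not Ohalo’s method: the Data X-Ray is proprietary and uses trainable models, whereas this example swaps in a few simple regular expressions. The pattern names, the `redact` function and the sample report text are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import re

# Hypothetical, illustrative patterns only. Real personal data is far more
# varied (names, addresses, free-text descriptions), which is why a
# model-based tool is used in practice rather than a handful of regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b0\d{4}\s?\d{6}\b"),
    "NI_NUMBER": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each pattern match with a [LABEL] placeholder.

    Returns the redacted text and the number of replacements per label,
    so a reviewer can see what was removed.
    """
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


# Made-up sample text, not taken from any real report.
report = "Injured person can be reached at j.smith@example.com or 01234 567890."
clean, counts = redact(report)
print(clean)
print(counts)
```

Returning per-label counts alongside the redacted text gives human reviewers a starting point for the kind of checking the HSE data scientists carried out, since anything a fixed pattern misses still has to be found by inspection.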

When the HSE team spotted errors such as false positives, the Data X-Ray enabled them to update the models with training examples, for a better result next time.

To generate insight that leads to better safety, we need to work from large datasets from as many organisations as possible. Our work with Ohalo is providing innovative, effective, automated techniques for desensitising and anonymising health and safety information. This will vastly increase the amount of information that can be shared and learned from, preventing harm and saving lives.

Building on the success with HSE, Ohalo has now applied the same safety data anonymisation approach in partnership with the Wood Group. The engineering firm was collating hundreds of thousands of safety data records from seven construction firms. Wood used Ohalo’s Data X-Ray to redact the records automatically, so they could be shared securely and lessons could be learned from them. Ohalo has since worked with other companies, including Costain, to anonymise their safety data.

“We were first introduced to the HSE when we won the Lloyd’s Register Safety Accelerator challenge sponsored by Plug and Play out of Silicon Valley to redact HSE’s RIDDOR documents in 2019,” says Kyle. “Our relationship has flourished since then, and we’re now combining our technical capability with HSE’s health and safety expertise to develop anonymisation tools that will open up a world of data for governments and industry partners to use.”

“Our growing relationship with HSE is allowing us to improve the tool’s accuracy and ability to instantly redact safety documents.”

The use of Ohalo’s tools is key to the mission of Discovering Safety, to bring the power of data engineering to bear on the vast and growing body of safety information. It’s key to our ability to identify accident trends, save lives, and send workers home safely to their families every day.