Making Sense of Safety Data – Our Data Warehouse Project

Discovering Safety is developing a data warehouse that will store, collate and make sense of the vast quantity of information on safety that is gathered in workplaces every year. Under development by Health and Safety Executive (HSE) data scientists in collaboration with industry partners, the data warehouse will be a key element of Discovering Safety’s mission to use the latest technological advances to bring about a safer world. Once in operation, it will lead the way towards more effective, better-informed interventions to prevent people from being harmed at work.

Using AI to unlock safety insights

By bringing vast amounts of safety data together and then applying artificial intelligence (AI) tools to read it, the warehouse will greatly enhance our ability to apply learning to prevent the accidents of the future. The information stored and processed will begin with the 70,000-odd reports sent to HSE every year under the Reporting of Injuries, Diseases and Dangerous Occurrences Regulations (RIDDOR), about 130 of which involve fatalities.

“We are using the data warehouse to process the accident reports to work out the activities being undertaken and the risks they posed, the mitigations that were in place and those that should have been in place. We can then use this learning to prevent similar accidents from happening in the future,” explains Joseph Januszewski, an HSE data scientist coordinating the team who are developing the data warehouse.

“This is information we already have, but its sheer scale along with the fact that it’s free text information, not codified and in a range of formats, means we have never yet been able to unlock its full potential.”

The warehouse will also feature information gathered proactively, either by HSE inspectors during regular inspections of workplaces, or by employers’ own investigations carried out to check compliance with the Health and Safety at Work etc. Act 1974. These can pinpoint dangerous conditions on site which, if not corrected, could lead to an accident. Information collected can also be used to select areas for proactive action, based on past results.

Tackling challenges to unlock potential

As well as collecting the data in one place, the scientists developing the warehouse are tackling the challenges that currently prevent it from being usable. The data arrives in a wide variety of formats (often PDF), as free text, and containing sensitive information. The team is using artificial intelligence tools, still in development, so that it can nonetheless be codified and searched. One of the big challenges that AI tools will help to address is anonymising the data at speed and in large volumes, which is essential to making the data legal to search and use.
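To give a feel for what anonymising free text involves, here is a minimal sketch in Python. It is not the project's method: the article describes AI tools still in development, whereas this example uses simple pattern matching, and the patterns, labels and sample report are all hypothetical. Real reports would also need names, addresses and other identifiers handled.

```python
import re

# Hypothetical patterns for illustration only; real accident reports would
# need far broader coverage (personal names, addresses, site names, ...).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b0\d{2,4}[ -]?\d{3,4}[ -]?\d{3,4}\b"),
    "NI_NUMBER": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}


def redact(text: str) -> str:
    """Replace each match with a bracketed placeholder naming its type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Made-up sample text, not taken from any real report.
report = (
    "Injured person (NI AB123456C) can be reached on "
    "0161 496 0000 or j.smith@example.com."
)
print(redact(report))
```

Keeping a typed placeholder such as [PHONE] rather than deleting the text means the redacted report still reads naturally and can still be searched and codified.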

Preventing accidents in the design phase

Information from the data warehouse is being used elsewhere within Discovering Safety, including as part of the Construction Risk Library. Faster access to information in a usable form will enhance the construction industry’s ability to prevent accidents in the design phase. Designers will be able to view an element of the design to be built alongside the activities involved, the hazards they present and the incidents and near misses they have led to in the past. The information they see will include observations from inspections as well as hard data such as statistics, searchable by type of activity.

Exploring nuanced information

Data warehousing will not only allow easier access to information in a searchable form, but also deeper exploration of the more nuanced information in documents such as RIDDOR reports.

“The really interesting information is not easy to codify as it doesn’t come in a ‘tick box’ format,” explains Joseph. “Investigation reports contain much more information about why an incident happened and how it could have been prevented. They encapsulate the learning and training of highly experienced inspectors, so have a great deal of learning to offer.”

Collaborative risk management

The data warehousing project will help HSE inspectors improve and develop the way they work, by acting as a collaborative tool to help people learn lessons and make strategic decisions.

“We will be able to extract information in a codified way from any type of inspection, proactive or reactive, to present to our inspectors. This will improve both how we act as a regulator and how industry manages risks,” added Joseph.

Being able to extract codified information from inspection reports will enable HSE to be more proactive and help its inspectors be more effective. It will enhance the depth and effectiveness of campaigns around areas of risk such as respiratory health in construction.

“We measure the impact of the campaigns but this is not an exact science and is labour-intensive,” says Joseph. “The data warehouse will give us hard evidence of the risk profile of an activity, and what mitigations really work in preventing accidents from happening. This will enable us to give companies much more directly useful information to help them manage risks.”

Sharing sensitive safety data

The information from the data warehouse will be a collaborative resource that helps everyone involved in safety learn lessons and make the right strategic decisions. To develop the data warehouse further, Discovering Safety will be asking industry to share their datasets, particularly risk assessments, method statements and records of near misses. This will allow us to assess how many near misses happen before an accident.

“For companies to feel comfortable in sharing this information, we will need to ensure we can anonymise it and share it securely,” added Joseph.
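Once near-miss records sit alongside accident records, the question of how many near misses happen before an accident becomes a simple count per activity. The sketch below illustrates the idea in Python. The record layout, activity names and numbers are invented for the example and do not reflect real data or the warehouse's actual schema.

```python
from collections import Counter

# Hypothetical, anonymised records as (activity, event_type) pairs.
records = (
    [("work at height", "near_miss")] * 6
    + [("work at height", "accident")] * 2
    + [("manual handling", "near_miss")] * 4
    + [("manual handling", "accident")] * 1
    + [("excavation", "near_miss")] * 2  # no recorded accident yet
)


def near_misses_per_accident(records):
    """Return near misses per accident for each activity with an accident."""
    near = Counter(act for act, kind in records if kind == "near_miss")
    accidents = Counter(act for act, kind in records if kind == "accident")
    # Activities with no recorded accident are skipped to avoid dividing by zero.
    return {act: near[act] / accidents[act] for act in accidents}


print(near_misses_per_accident(records))
```

A ratio like this is only as good as the reporting behind it, which is one reason shared industry datasets matter.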

Industry partners

Over the next year, we will be providing our information in the data warehouse as a ‘proof of concept’ once we have successfully anonymised it. This will be provided under a contractual agreement to a subset of users, who will be able to use it to develop their own products and systems. As part of this arrangement, Discovering Safety will provide an API service so users can query both our data and their own.

“We will be engaging with industry in the coming months to find out how they would use the data warehouse and gain most value from it. We will then develop the warehouse further based on their opinions,” said Joseph.

Discovering Safety would like to hear from anyone in industry who is interested in finding out more or in helping us develop the data warehouse by giving us their perspective or sharing data.


About Knowledge Graphs

Knowledge graphs are a clear way of presenting data from data warehouse searches. Based on an innovation pioneered by Google, they are the foundation of many advanced AI applications. A knowledge graph shows a network of related concepts, providing a system to link information across documents and encapsulating the ‘tacit knowledge’ of subject matter experts.

Our data warehousing project has begun work on a health and safety knowledge graph with Enterprise Knowledge, which we hope to develop in collaboration with industry. This will use a ‘recommender’ system which works by taking a building practice and instantly returning all related guidance for mitigating associated risks – avoiding the need to search through large guidance documents.
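The recommender idea can be pictured as following links in a graph: from a building practice to its hazards, and from each hazard to the guidance that addresses it. The Python sketch below uses a toy graph to show this. The practices, hazards and guidance titles are placeholders invented for illustration, and the real knowledge graph may be structured quite differently.

```python
# Hypothetical toy graph: a practice links to hazards, a hazard to guidance.
# All names are illustrative placeholders, not real HSE publications.
PRACTICE_HAZARDS = {
    "roof work": ["fall from height", "fragile surface"],
    "block laying": ["manual handling", "silica dust"],
}
HAZARD_GUIDANCE = {
    "fall from height": ["Guidance A: edge protection"],
    "fragile surface": ["Guidance B: staging and covers"],
    "manual handling": ["Guidance C: mechanical lifting aids"],
    "silica dust": ["Guidance D: water suppression", "Guidance E: RPE"],
}


def recommend(practice):
    """Return guidance reachable from a practice via its hazards, de-duplicated."""
    seen, result = set(), []
    for hazard in PRACTICE_HAZARDS.get(practice, []):
        for guidance in HAZARD_GUIDANCE.get(hazard, []):
            if guidance not in seen:
                seen.add(guidance)
                result.append(guidance)
    return result


print(recommend("block laying"))
```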

“We’re looking for industry partners to help us decide what domains to include and how the knowledge graph can become a really useful tool,” said Caleb Williamson, a data scientist who has been exploring the potential of knowledge graphs to unlock insights from the data gathered in the warehouse.

A knowledge graph could show which pieces of guidance (if they had been followed) would have prevented the greatest number of accidents for a given work scenario, enabling HSE to design an intervention to encourage greater uptake or to update the guidance. It could also highlight types of accidents that are not currently covered by guidance, so new guidance could be written.
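As a rough illustration of this kind of analysis, the Python sketch below assumes each incident has been linked to the guidance that would have prevented it. It then counts incidents per piece of guidance and lists accident types with no guidance at all. The incident records and guidance identifiers (G1, G2, G3) are hypothetical.

```python
from collections import Counter

# Hypothetical incidents; "preventable_by" lists guidance judged (for the
# purposes of this example) to have prevented the incident if followed.
incidents = [
    {"type": "fall through rooflight", "preventable_by": ["G1", "G2"]},
    {"type": "fall from ladder", "preventable_by": ["G1"]},
    {"type": "struck by vehicle", "preventable_by": ["G3"]},
    {"type": "battery fire", "preventable_by": []},
]


def rank_guidance(incidents):
    """Return (guidance, incident count) pairs, most frequent first."""
    counts = Counter(g for inc in incidents for g in inc["preventable_by"])
    return counts.most_common()


def uncovered_types(incidents):
    """Return accident types that no existing guidance covers."""
    return sorted({inc["type"] for inc in incidents if not inc["preventable_by"]})


print(rank_guidance(incidents))
print(uncovered_types(incidents))
```

In this toy data, G1 would be the candidate for an uptake campaign, and the uncovered list points to where new guidance might be written.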

“This will allow HSE to advocate for mitigation measures to be designed into projects at a much earlier stage, so that common and risky activities such as manual handling are eliminated,” Caleb added.