Web scraping health and safety intelligence project

25/06/19

Most of the vast amount of information on the Internet is plain text. The data needed to answer many health and safety questions, for example the association between a work practice and a specific health and safety endpoint, may be available online, but extracting and organising it for quantitative analysis is a major challenge. Using artificial intelligence and machine learning alongside web scraping tools offers a way to address this challenge. In parts of the world where structured health and safety datasets are less readily available, web scraping tools offer opportunities to curate bespoke datasets from the content of web pages.

This is the focus of a Discovering Safety feasibility study. The aim is to investigate the potential of using web scraping tools to help populate the knowledge repository being compiled as part of the wider research programme. Recent technological advances have been made in this area. For example, researchers from the Massachusetts Institute of Technology recently released a paper on an artificial intelligence system that can extract information from sources on the web and learn how to do so independently. This feasibility study will look to make use of similar cutting-edge technology to identify key safety-related content of value to the research programme.

The sorts of content of interest include:

Serious accidents happening around the world
Current national safety concerns and challenges
Technological advancements in risk control
Regulatory changes and safety interventions that are working
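
As a rough illustration of how scraped page content might be sorted into the categories listed above, the following is a minimal sketch in Python using only the standard library. It is not the study's method: the study refers to machine learning techniques, whereas this sketch swaps in simple keyword matching as a stand-in. The category names, the keyword lists and the function names (`extract_text`, `tag_categories`) are all hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical keyword lists, one per content category in the list above.
# A real system would likely use a trained classifier instead.
CATEGORY_KEYWORDS = {
    "serious_accident": ["fatal", "explosion", "collapse", "injured"],
    "national_concern": ["national", "campaign", "statistics"],
    "risk_control_technology": ["sensor", "wearable", "automation"],
    "regulation_or_intervention": ["regulation", "legislation", "enforcement"],
}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Reduce an HTML document to its plain text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tag_categories(text):
    """Return the sorted category names whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


if __name__ == "__main__":
    page = "<html><body><p>Two workers injured in scaffold collapse.</p></body></html>"
    print(tag_categories(extract_text(page)))
```

The sketch works on HTML that has already been downloaded. Fetching pages, respecting each site's terms of use and robots.txt, and replacing the keyword lists with a learned model are all left out.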

More Information

Web scraping health and safety intelligence phase one report
