Web scraping health and safety intelligence

Most of the vast amount of information on the Internet is unstructured plain text. The data needed to answer many health and safety questions, for example whether a particular work practice is associated with a specific health and safety outcome, may be available online, but extracting that data and organising it for quantitative analysis is a major challenge. Using artificial intelligence and machine learning alongside web scraping tools offers a way to address this challenge. In parts of the world where structured health and safety datasets are scarce, web scraping tools also make it possible to curate bespoke datasets from the content of web pages.
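
As a minimal illustration of the first step, turning a web page into plain text that can then be analysed, the Python sketch below uses only the standard library's `html.parser` and `urllib.request` modules. The names `VisibleTextExtractor`, `html_to_text` and `fetch_page_text` are hypothetical, chosen for this example; they are not tools used by the study.

```python
from html.parser import HTMLParser
import urllib.request


class VisibleTextExtractor(HTMLParser):
    """Collect the human-readable text of an HTML page, skipping scripts and styles."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self._skip_depth = 0  # > 0 while inside a tag whose content is not visible text
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text chunks of an HTML document, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

For example, `html_to_text("<h1>Incident report</h1><script>var x=1;</script><p>A worker fell from scaffolding.</p>")` returns `"Incident report A worker fell from scaffolding."`. The output is still unstructured text; organising it for quantitative analysis is the harder step that the machine learning component is meant to address.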

This is the focus of a Discovering Safety feasibility study, which aims to investigate the potential of web scraping tools to help populate the knowledge repository being compiled as part of the wider research programme. Recent technological advances have been made in this area. For example, researchers from the Massachusetts Institute of Technology recently released a paper on an artificial intelligence system that can extract information from sources on the web and learn to do so independently. This feasibility study will look to use similar cutting-edge technology to identify key safety-related content of value to the research programme.

The sorts of content of interest include:  

  • Serious accidents happening around the world 
  • Current national safety concerns and challenges
  • Technological advancements in risk control 
  • Regulatory changes and safety interventions that are working
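
Once page text has been extracted, it has to be sorted into content types such as those listed above. The study looks to AI and machine learning for this; as a much simpler, rule-based stand-in, the sketch below tags a piece of text with a category whenever one of a few keywords appears in it as a whole word. The category names, the keyword lists and the `tag_text` function are illustrative assumptions, not part of the study.

```python
import re

# Hypothetical keyword lists, one per content category of interest.
CATEGORY_KEYWORDS = {
    "serious accidents": ["fatality", "fatal", "explosion", "collapse", "killed", "injured"],
    "national safety concerns": ["concern", "challenge", "rising", "epidemic"],
    "risk control technology": ["sensor", "wearable", "automation", "robot", "monitoring"],
    "regulatory change and interventions": ["regulation", "legislation", "law", "ban", "intervention"],
}


def tag_text(text: str) -> list[str]:
    """Return the categories whose keywords appear in text as whole words (case-insensitive)."""
    lowered = text.lower()
    matched = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        # \b anchors stop "law" from matching inside words such as "warehouse" or "flaw".
        if any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in keywords):
            matched.append(category)
    return matched
```

For example, `tag_text("Two workers were killed in a warehouse explosion.")` returns `["serious accidents"]`. Exact whole-word matching misses variants ("sensors" does not match "sensor") and ignores context, which is one reason a trained text classifier would be preferable for the study's purposes.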