Development of capacities to extract health and safety insights from free-text sources

Machine learning
Text mining
Data
Insight
Working together

Recent developments in text mining, natural language processing and machine learning have become increasingly mainstream. As a result, generating data-driven insights and learning from health and safety operational intelligence, particularly from free text, is now far easier than it was.

Rather than reviews of operational intelligence being largely manual, qualitative exercises based on small samples of the available data, entire corpora of information can now be analysed quickly and with increasingly fine granularity. Interrogating health and safety operational intelligence, including that held in free-text formats, with text mining and natural language processing tools has the potential to yield a wider range of learning for organisations than traditional approaches generate.

For example, routine datasets are used largely for operational reporting: essentially, to profile what has gone wrong from a health and safety perspective, and where, when and to whom. Text mining opens up opportunities to address other key health and safety questions in an automated way, such as how and why specific failures happened, to discover new and emerging problems in workplaces, and even to predict future failure events.

The text mining project being delivered as part of Discovering Safety will use the existing state of the art in text mining and natural language processing as a starting point, and will develop analytic tools and techniques for specific use in health and safety contexts.

Aims and Objectives

Planned work will look to capitalise on how other industries and disciplines, for example the health, medical and pharmaceutical sectors, are using data science and data analytics to leverage value from their free-text data sources. Work will test the extent to which such existing text mining approaches might be useful for generating insights from routine health and safety data. HSE’s rich and varied archive of health and safety data, accrued year on year from its workplace inspection, incident investigation and enforcement activities, along with the incident information reported to HSE by its duty holders, provides a ready-made research dataset to support the creation of an ecosystem for automatically generating insights from routine health and safety operational intelligence.

Key work tasks:

  • Create a labelled dataset of health and safety operational intelligence on which to train algorithms
  • Build algorithms to retrieve and extract knowledge from free-text sources
  • Use algorithms to create a comprehensive health and safety knowledge base
  • Interrogate the knowledge base to generate health and safety learning, e.g. to support root cause analysis exercises, identification of emerging risks and HAZOP exercises
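
As a rough illustration of the first two tasks, the sketch below trains a small naive Bayes classifier on a handful of labelled incident descriptions and uses it to assign a category to new free text. It is a minimal sketch only: the records, the category labels (fall_from_height, machinery) and the choice of naive Bayes are all assumptions made for illustration, and none of them comes from HSE data or from the project itself.

```python
import math
import re
from collections import Counter, defaultdict

# Invented, illustrative records; these are NOT real HSE data.
LABELLED = [
    ("operative fell from ladder while painting ceiling", "fall_from_height"),
    ("worker fell through fragile roof sheet", "fall_from_height"),
    ("scaffold board gave way and employee fell two metres", "fall_from_height"),
    ("hand caught in unguarded conveyor belt", "machinery"),
    ("guard removed from press and finger trapped", "machinery"),
    ("sleeve drawn into rotating lathe chuck", "machinery"),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(records):
    """Count words per label and records per label; collect the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in records:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(LABELLED)
print(predict(model, "employee fell from roof"))
print(predict(model, "finger trapped in unguarded press"))
```

In practice a labelled dataset would need to be far larger, and a more capable model would probably be needed, but the workflow of labelling, training and then encoding unseen text would be broadly the same.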

Key benefits

Use of insights extracted from free text operational intelligence to predict the occurrence of future health and safety accidents.  

Once health and safety free text has been effectively “encoded”, its content can then be used in more quantitative, inferential statistical analyses.

For example, associations between specific health and safety endpoints of interest (e.g. accidents, injuries, loss of control events) and potential precursor events can be investigated. Such understanding then opens up opportunities to predict the occurrence of future events in workplaces.
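
One simple way to quantify such an association, once records have been encoded, is an odds ratio computed from a 2x2 table of precursor against endpoint. The sketch below assumes each record has already been reduced to a set of precursor codes and an injury flag; the records, the guard_removed code and the field names are invented for illustration, and the odds ratio is only one of several measures that could be used.

```python
# Invented encoded records; the codes and outcomes are illustrative only.
RECORDS = [
    {"precursors": {"guard_removed"}, "injury": True},
    {"precursors": {"guard_removed"}, "injury": True},
    {"precursors": {"guard_removed", "lone_working"}, "injury": True},
    {"precursors": {"guard_removed"}, "injury": False},
    {"precursors": {"lone_working"}, "injury": True},
    {"precursors": set(), "injury": False},
    {"precursors": {"lone_working"}, "injury": False},
    {"precursors": set(), "injury": False},
]


def contingency(records, precursor):
    """Return (a, b, c, d) counts for a 2x2 precursor-by-injury table.

    a: precursor present, injury     b: precursor present, no injury
    c: precursor absent, injury      d: precursor absent, no injury
    """
    a = b = c = d = 0
    for record in records:
        present = precursor in record["precursors"]
        if present and record["injury"]:
            a += 1
        elif present:
            b += 1
        elif record["injury"]:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds of injury with the precursor divided by odds without it.

    If any cell is zero, 0.5 is added to every cell (the Haldane-Anscombe
    correction) so that the ratio stays finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


table = contingency(RECORDS, "guard_removed")
print(table, odds_ratio(*table))
```

An odds ratio well above 1 suggests that the precursor is associated with the endpoint in the encoded data. It does not by itself show causation, and with so few records it would carry very wide uncertainty.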

One approach to this is using statistical machine learning techniques to develop algorithms that predict, by way of risk scores, the likelihood of future failures in health and safety, based on past performance and supporting contextual information, in much the same way that credit scores predict the likelihood of future payment defaults.
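
The idea of a risk score can be sketched with logistic regression, a technique commonly used in credit scoring. The example below is a minimal sketch under stated assumptions: the two features (past incident count and an overdue-maintenance flag), the training data and the choice of logistic regression fitted by plain gradient descent are all invented for illustration and are not the project's method.

```python
import math

# Invented training data: (past_incidents, overdue_maintenance) per site,
# with 1 meaning a later failure was recorded and 0 meaning none was.
FEATURES = [(0, 0), (1, 0), (0, 0), (1, 0), (3, 1), (4, 1), (2, 1), (5, 1)]
OUTCOMES = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(weights, bias, x):
    """Predicted probability of a future failure for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit(features, outcomes, learning_rate=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            error = risk_score(weights, bias, x) - y
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = fit(FEATURES, OUTCOMES)
low = risk_score(weights, bias, (0, 0))   # no incidents, maintenance up to date
high = risk_score(weights, bias, (5, 1))  # many incidents, maintenance overdue
print(round(low, 3), round(high, 3))
```

A real scoring model would need many more records, validation on held-out data and careful checks for bias before its scores were used to inform decisions.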

Such insights can then form the basis of subsequent decision-making, for example, deciding when and what to inspect and where to target prevention efforts.