Product safety intelligence phase one outputs

Matt Clay


Product safety issues have the potential to affect large numbers of people at work and in the community. A single design flaw will be replicated many times, and product recalls are costly and rarely recover more than a small proportion of defective products. At the same time, media coverage of product safety issues causes immense reputational and economic damage as well as significant anxiety amongst users. Recent high-profile cases in the automotive, aviation and built environment sectors highlight the issue further.

Aims and objectives

HSE’s Corporate Operational Intelligence System (COIN) was used for this project. A data scientist created a subset of COIN by simply searching for the phrase ‘product safety’ within records. For the relevant records, free text was downloaded from both the database entries themselves and their document attachments. Product safety records contain minimal coded information, so the bulk of any value would need to be extracted from the unstructured text.
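The phrase search described above can be illustrated with a minimal sketch. It is not the project's actual extraction code: the Record structure, its field names and the sample data are hypothetical, and matching case-insensitively with tolerance for extra whitespace is an assumption rather than something the project documented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a COIN record; the real schema is not shown here."""
    record_id: str
    entry_text: str
    attachment_texts: list = field(default_factory=list)


# Assumption: match 'product safety' regardless of case or spacing between the words.
PHRASE = re.compile(r"product\s+safety", re.IGNORECASE)


def mentions_phrase(record):
    """Return True if the phrase appears in the entry text or any attachment text."""
    texts = [record.entry_text, *record.attachment_texts]
    return any(PHRASE.search(text) for text in texts)


def extract_subset(records):
    """Keep only the records that mention the phrase anywhere in their free text."""
    return [record for record in records if mentions_phrase(record)]


# Invented sample records, for illustration only.
sample = [
    Record("A1", "Concern raised about product safety of a ladder."),
    Record("A2", "Routine inspection, no issues."),
    Record("A3", "Visit report attached.", ["Supplier letter on Product  Safety recall."]),
]

subset_ids = [record.record_id for record in extract_subset(sample)]
print(subset_ids)
```

A literal phrase match of this kind will miss records that describe a product safety issue in other words, which is consistent with the need for more reliable extraction tools noted under the key findings.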

Key findings

It is clear that industry supports this work and the value it could create. However, the main challenge is that considerable upfront work is required to extract the relevant dataset, and only then can the extraction of intelligence begin. Whilst a small manual sample suggests that the records contain valuable intelligence, a final determination is only feasible once a relatively large number of records has been reviewed. This increases risk and uncertainty, which makes ongoing investment in this use case more speculative, with a lower potential cost-benefit.

More reliable coding within the COIN database would reduce the risks and uncertainty associated with attempting a project of this nature. Conversely, developing tools which can reliably extract relevant records from chaotic data is likely to be a valuable exercise for the wider programme, particularly when we wish to augment HSE’s datasets with a wide variety of industry sources, since it is unlikely that each source would use the same coded taxonomy. As with other use cases, retaining data beyond the current seven-year policy limit would be useful.


It is proposed that this work is suspended for now in favour of accelerating delivery of other use cases, which would otherwise have to compete with it for shared resource.

Continuing this work could potentially unlock great value for industry. However, there is much greater uncertainty around success than for the other use cases, and it is likely to require investment in the region of £150k simply to reach a point where records can be reliably extracted. It is then possible that the intelligence extracted would be of limited value. To maximise the chance of success, this project is also likely to require dedicated data science resource together with subject matter expertise, which would probably need to be contracted in via an associate. This is a potentially high-return project, but with correspondingly high risks.