Big Data text analysis for BowTies

Guest blog by Prof. Coen van Gulijk from University of Huddersfield

How to extract in-depth learning from 200,000 reports per year? Big data text analysis for BowTies.

Dealing with vast numbers of text-based safety records

One of the challenges for safety managers is dealing with vast amounts of data. Near-miss reporting in particular can generate so many reports that reading all of them to gain a systematic overview is impractical. On the GB railways, a near-miss reporting system has been in place for the last five years, and the number of reports is staggering: up to 200,000 per year. The reports are followed up individually as far as possible. Apart from a fairly straightforward drop-down risk-classification system, however, it is hard to extract in-depth learning from them, because the lessons are captured in free-text fields.

From a learning perspective, it makes sense to classify close call (near-miss) reports against a safety BowTie. BowTies are an effective visual representation that connects hazards, threats, consequences and, most importantly, controls. In addition, several functions can be added to BowTies in BowTieServer that provide relevant information about the quality and status of barriers; AuditXP and IncidentXP are two examples.

Progress in safety text analysis

At the University of Huddersfield, we are working on adding even more functionality to the BowTie: machine-assisted text analysis. We take the free-text close call reports, let computers interpret the text and map each report against an existing BowTie. We described the basic steps of this approach in our recent Safety Science paper, “From free-text to structured safety management: Introduction of a semi-automated classification method of railway hazard reports to elements on a bow-tie diagram”.
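The paper sets out the actual method. Purely to give a feel for what mapping a free-text report onto BowTie elements involves, here is a minimal keyword-matching sketch in Python. It is not the classification method from the paper: the threat names, the keyword sets and the helper functions `tokenize` and `classify_report` are all hypothetical.

```python
import re

# Hypothetical threats for a slip-trip-fall BowTie, each with a few
# indicative keywords. Names and keywords are illustrative only.
THREAT_KEYWORDS = {
    "uneven walking surface": {"uneven", "pothole", "ballast", "cess"},
    "slippery surface": {"slippery", "ice", "icy", "wet", "mud", "oil"},
    "obstruction on walkway": {"obstruction", "cable", "debris", "rubbish"},
}


def tokenize(text):
    """Lower-case the report and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_report(text):
    """Return, sorted by name, every threat with at least one keyword in the report."""
    tokens = set(tokenize(text))
    return sorted(
        threat for threat, keywords in THREAT_KEYWORDS.items() if tokens & keywords
    )


report = "Walking in the cess, slipped on icy ballast near the signal."
print(classify_report(report))  # ['slippery surface', 'uneven walking surface']
```

A single report can match more than one threat, much as one close call may touch several pathways on a BowTie. Plain keyword matching is brittle, though: it misses misspellings and synonyms, which is exactly the human factors problem described in the next section.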

The paper offers an accessible method with clearly described analysis steps, and it covers the basic approach for extracting BowTie hazards from unstructured safety text. The use case is a worker slip-trip-fall BowTie, for which about 7,000 of roughly 200,000 records were found to map against the six hazards in that BowTie. Figure 1 shows a histogram of how often these hazards occurred between October 2013 and February 2015. Despite challenges in accuracy and precision, which we are addressing in further research, we are confident that automated text analysis will become a standard tool for safety management support. In fact, other researchers have already suggested that machine-assisted text analysis might become mandatory in the US.

Figure 1: Count of close calls per month assigned to each threat pathway
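A chart like Figure 1 only needs the classified records to be grouped by month and threat. The sketch below assumes each record already carries a report date and a threat label; the records are invented toy data, not figures from the study, and `monthly_counts` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date

# Invented toy records: (report date, threat assigned by a classifier).
# These are NOT data from the study.
records = [
    (date(2013, 10, 3), "slippery surface"),
    (date(2013, 10, 17), "uneven walking surface"),
    (date(2013, 10, 28), "slippery surface"),
    (date(2013, 11, 5), "slippery surface"),
]


def monthly_counts(rows):
    """Count records per (year-month, threat) pair: the height of each bar in the chart."""
    return Counter((day.strftime("%Y-%m"), threat) for day, threat in rows)


counts = monthly_counts(records)
for (month, threat), n in sorted(counts.items()):
    print(month, threat, n)
```

For scale, the approximate numbers quoted above mean that the slip-trip-fall BowTie captured roughly 7,000 / 200,000 = 3.5% of the records.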

Human factors in big data

Even though this research is fairly technocratic in its approach, human factors come into play immediately. People at work write close call reports with cold hands or while wearing gloves; they misspell words, use synonyms and generally write in texting slang. For instance, the word palisade was spelled in at least ten different ways:

  • palisade
  • palasaid
  • palasaide
  • palicade
  • palilsade
  • palisadade
  • paliside
  • palistrade
  • pallasade
  • pallaside
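One common way to make software tolerant of such variants is approximate string matching. The sketch below is illustrative and not necessarily the technique used in the paper: it uses the Levenshtein edit distance (the minimum number of single-character insertions, deletions and substitutions) to map each word to the closest term in a small vocabulary. The names `levenshtein`, `normalise` and `VOCABULARY`, the vocabulary itself and the threshold of three edits are all hypothetical.

```python
# Hypothetical vocabulary of canonical terms the classifier looks for.
VOCABULARY = ["palisade", "platform", "ballast", "signal"]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def normalise(word, max_distance=3):
    """Map a possibly misspelled word to the closest vocabulary term, or None if none is close enough."""
    word = word.lower()
    best = min(VOCABULARY, key=lambda term: levenshtein(word, term))
    return best if levenshtein(word, best) <= max_distance else None


print(normalise("pallaside"))  # palisade
print(normalise("xyz"))        # None
```

With this threshold, all ten spellings listed above map to palisade (the furthest ones, palasaid and pallaside, are three edits away). A fixed threshold is crude, however: for short words, three edits can turn one real word into another, so a threshold scaled to word length, combined with human checking, would be more robust.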

Even though this complicates our work, I find it very satisfying that human factors come into play straight away; it makes me feel that we can really help real people discussing real safety problems. For our methods, it means that we have to teach the software to deal with these variations. As things stand today, we think that however smart a text-analysis technique may become, humans will always need to monitor whether the automated selection makes sense within the safety control system.
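Such monitoring can be built into the workflow. As a hypothetical sketch, assume the classifier attaches a confidence score to each result: low-confidence results are queued for a person to review, and a random sample of the confident ones is still audited by a person. The `Classification` record, the `triage` function, the report IDs and both thresholds are invented for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Classification:
    report_id: str
    threat: str
    score: float  # hypothetical classifier confidence in [0, 1]


def triage(results, accept_at=0.9, audit_fraction=0.1, seed=0):
    """Queue low-confidence results for human review and draw an audit sample from the rest."""
    confident = [r for r in results if r.score >= accept_at]
    review = [r for r in results if r.score < accept_at]
    # Confident results are still spot-checked by a person (at least one,
    # if there are any), so drift in the automated selection gets noticed.
    k = max(1, round(audit_fraction * len(confident))) if confident else 0
    audit = random.Random(seed).sample(confident, k)
    return confident, review, audit


results = [
    Classification("CC-001", "slippery surface", 0.95),
    Classification("CC-002", "uneven walking surface", 0.55),
]
confident, review, audit = triage(results)
print([r.report_id for r in review], [r.report_id for r in audit])  # ['CC-002'] ['CC-001']
```

Seeding the random generator keeps the audit sample reproducible, which helps when a reviewer later needs to see why a particular report was checked.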

Another fascinating human factors aspect is that the category ‘human error’ is very hard to capture with text analysis. More advanced analytics might solve part of that problem, but human errors materialize in many different ways, which makes them difficult to isolate from other hazards. Again, we think that human intelligence will be required.

Big data in your BowTie

I believe that text analysis is one of the techniques that allow safety managers to harness the potential of working with large amounts of data. Our paper lays the groundwork for a whole new suite of opportunities to improve safety control, and the BowTie offers an ideal framework for the analysis and integration of multiple data sources in a safety management system.

Learn more about big data text analysis for BowTies and download the paper from the Safety Science website (free for subscribers).