Article 11 in this series about Control Based Risk Management and Critical Control Management (CCM) discusses a vital part of effective CCM: verification.
This article has been the most difficult to write. There is little agreement across mining companies about verification methods, which generally reflect their varying objectives for adopting some form of CCM.
Outcomes of the CCM process
The analytical steps of a quality CCM process should provide the following outcomes:
- A list of priority or material unwanted events (PUE)
- An image of the overall control strategy (all important controls) for managing the PUE risk, considering erosion factors and supporting activities
- A carefully selected and challenged list of specific PUE critical controls that are crucial, measurable and, ideally, indicative of PUE risk
- A verification process that captures timely data concerning critical control effectiveness and, ideally, provides an indication of any changes in PUE risk.
The verification output of CCM is the most significant enhancement over previous good-practice risk management. As one mining company suggested, verification greatly improves the ‘management’ in risk management.
Defining verification and its importance
Verification is defined in the ICMM guide as “the process of checking the extent to which the performance requirements set for a critical control are being met in practice.” This means that the verification process should align with the defined critical control (CC) performance requirements (see article 10 of this series on performance requirements).
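As a minimal sketch of this definition, assuming each performance requirement can be expressed as a measured value checked against a minimum target, the ‘extent’ to which requirements are being met could be expressed as the fraction currently satisfied. The class, requirement names and figures below are hypothetical illustrations, not part of the ICMM guide.

```python
from dataclasses import dataclass


@dataclass
class PerformanceRequirement:
    """A hypothetical CC performance requirement with its latest verified value."""
    name: str
    target: float    # minimum acceptable value set for the CC
    measured: float  # value found during verification

    def is_met(self) -> bool:
        return self.measured >= self.target


def extent_met(requirements: list[PerformanceRequirement]) -> float:
    """Return the fraction (0.0 to 1.0) of performance requirements being met."""
    if not requirements:
        raise ValueError("a critical control needs at least one performance requirement")
    met = sum(1 for r in requirements if r.is_met())
    return met / len(requirements)


# Hypothetical requirements and values for one critical control.
cc_requirements = [
    PerformanceRequirement("operators trained and assessed", target=0.50, measured=0.60),
    PerformanceRequirement("inspections completed on schedule", target=0.95, measured=0.90),
    PerformanceRequirement("defects repaired within agreed time", target=1.00, measured=1.00),
]
print(f"Extent met: {extent_met(cc_requirements):.2f}")  # Extent met: 0.67
```

A simple fraction treats every requirement as equally important; a real verification design would probably weight them according to their influence on CC effectiveness.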
Verification should be treated as a distinct CCM term. To identify the degree to which a critical control is meeting its performance requirements (i.e. its effectiveness), it is necessary to define a process that gathers data from multiple and possibly diverse sources, ranging from direct observation to systems review and beyond. Verification should not be confused with monitoring or auditing.
It’s important to highlight the potential value of CCM verification as a timely indicator of changing risk. In an adequately mature company or site, CCM verification can more effectively ‘keep tabs’ on the risk of high consequence events, going well beyond just focusing the workforce’s attention on a ‘few critical controls’.
Differences between industry opinions and practices
The sources of verification data, the frequency of data gathering, and the quantification of data to establish effectiveness are the most common areas where differences exist between industry opinions and practices. To discuss these differences, it may be helpful to continue using examples of companies or sites at various points in the CCM maturity journey.
In article 8 the variation in company or site CCM objectives was discussed. “Experience indicates that objectives, defined and otherwise, for a CCM initiative vary greatly, leading to very different CCM outcomes.” Three examples were given. These examples can also be used to illustrate levels of risk management maturity (as illustrated below) that can be reflected in their verification process design.
A company or site may decide to use the CCM process to select the critical workforce acts that prevent site PUEs, thereby using the process to define ‘golden rules’.
Example 1 illustrates a focus on human error to reduce PUEs that are usually single fatality issues. Verification might involve gathering data from Task Based Observation (TBO) or similar initiatives that have been focussed on the new ‘golden rules’. Although the quality of the observation data is usually limited, the company or site may feel that a regular review of TBO results is adequate to understand whether the PUE risk is acceptable. This approach adopts some ideas for improving control focus but is not a quality CCM approach.
The company or site decides to select the critical controls for PUEs that a cross section of site personnel and experts identify as the most crucial. The objective is to manage these selected controls with a CCM approach so that the risk is reduced.
The focus in example 2 is more mature than example 1. The company or site is looking beyond human error to find controls that are crucial and measurable. Their verification process might include several data sources that cover a range of factors contributing to critical control (CC) effectiveness. The data may be combined using a checklist or stoplight approach to identify weak areas for action. However, there is no attempt to generate a single overall effectiveness measure for a CC or to estimate its impact on PUE risk.
Some example 2 companies and sites use two major sources of CC verification data: supervisors’ direct observation data, and systems reviews of supporting activities such as procedures, usually carried out by superintendents or managers. However, the data from the two sources is not combined to establish CC effectiveness. This example uses the complete CCM process but misses the opportunity outlined in example 3.
The company or site gathers a team representing a cross section of site personnel and experts to review a completed Bowtie Analysis that includes the erosion factors that compromise controls and the positive supporting activities for the controls. The objective for example 3 is to manage the PUE risk by tracking the status of, and changes in, the expected performance of the critical indicators.
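A stoplight approach of the kind used in example 2 might be sketched as follows. The green and amber thresholds, source names and scores are hypothetical. Consistent with example 2, the sketch flags weak areas for action but deliberately produces no single overall CC measure.

```python
def stoplight(score: float, green: float = 0.9, amber: float = 0.7) -> str:
    """Map a 0-1 source score to a stoplight colour (hypothetical thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= green:
        return "green"
    if score >= amber:
        return "amber"
    return "red"


def weak_areas(source_scores: dict[str, float]) -> list[str]:
    """Return the data sources that are not green, i.e. candidates for action."""
    return [src for src, s in source_scores.items() if stoplight(s) != "green"]


# Hypothetical scores for one CC from three data sources.
cc_scores = {
    "supervisor observation": 0.78,
    "procedure review": 0.95,
    "training records": 0.60,
}
for src, s in cc_scores.items():
    print(f"{src}: {stoplight(s)}")
print(weak_areas(cc_scores))  # ['supervisor observation', 'training records']
```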
Example 3 is the most mature. Leaders want CCM to provide an indication of PUE risk. CCs are challenged to identify measurable, performance requirement related factors that impact on their effectiveness. Those measures and other factors generate a quality result for that specific CC. However, the company or site is not satisfied with a single CC focus (example 2). The aim is to have a timely indication of the effectiveness of all PUE CCs; a measure of the overall PUE risk.
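One way to picture a single overall PUE indicator is sketched below. It assumes, purely for illustration, that the CCs on one threat-to-consequence pathway act as independent barriers, so the chance that all of them fail is the product of their individual failure chances; in practice bowtie controls often share erosion factors and are not independent. The effectiveness figures are hypothetical.

```python
from math import prod


def pathway_failure_probability(effectiveness: list[float]) -> float:
    """Chance that every CC on one pathway fails, assuming the CCs are independent."""
    if any(not 0.0 <= e <= 1.0 for e in effectiveness):
        raise ValueError("effectiveness values must be between 0 and 1")
    return prod(1.0 - e for e in effectiveness)


def risk_change(baseline: list[float], current: list[float]) -> float:
    """Ratio of current to baseline pathway failure probability (> 1 means PUE risk has risen)."""
    base = pathway_failure_probability(baseline)
    if base == 0.0:
        raise ValueError("baseline assumes a perfect control; ratio is undefined")
    return pathway_failure_probability(current) / base


# Hypothetical effectiveness of two CCs on the same pathway.
baseline = [0.90, 0.95]
current = [0.80, 0.95]  # the first CC has weakened since the baseline
print(f"PUE risk indicator: {risk_change(baseline, current):.2f} x baseline")  # 2.00 x baseline
```

Expressing the indicator as a ratio to a baseline keeps attention on change in risk, which is the example 3 objective, rather than on an absolute figure that may be hard to defend.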
Pressure relief valve
Consider a CC Object in a processing plant such as the pressure relief valve (PRV). The PRV manufacturer may supply PRV reliability or effectiveness figures, which might be considered the baseline. The challenge is to account for local factors that affect reliability and modify the baseline figure to reflect the local situation. The local data sources might include reports covering maintenance and repair work relative to performance requirements, as well as local operating conditions. Other sources might include design, modification and installation checks, records of past release events, etc. Using various methods, this approach has been used in petrochemical industry risk analysis for several decades to generate a predicted reliability and compare that figure to safety requirements.
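A minimal sketch of modifying a baseline figure for local factors follows. It assumes the manufacturer’s figure is expressed as a probability of failure on demand (PFD) and that each local factor can be summarised as a simple multiplier; the factor names, multipliers and requirement are hypothetical, and established petrochemical methods are considerably more detailed.

```python
def adjusted_pfd(baseline_pfd: float, local_multipliers: dict[str, float]) -> float:
    """Scale a baseline probability of failure on demand (PFD) by local factors.

    A multiplier above 1 degrades reliability, below 1 improves it.
    The result is capped at 1.0 because it is a probability.
    """
    if not 0.0 <= baseline_pfd <= 1.0:
        raise ValueError("baseline PFD must be between 0 and 1")
    pfd = baseline_pfd
    for factor, multiplier in local_multipliers.items():
        if multiplier < 0:
            raise ValueError(f"multiplier for {factor!r} cannot be negative")
        pfd *= multiplier
    return min(pfd, 1.0)


baseline_pfd = 0.01  # hypothetical manufacturer figure
local = {
    "bench test overdue": 2.0,
    "corrosive service conditions": 1.5,
    "installation checked against design": 0.8,
}
required_pfd = 0.01  # hypothetical safety requirement: PFD must not exceed this
pfd = adjusted_pfd(baseline_pfd, local)
print(f"Adjusted PFD {pfd:.3f}; meets requirement: {pfd <= required_pfd}")
# Adjusted PFD 0.024; meets requirement: False
```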
The challenge in CCM is to define dynamic measures of that reliability (or effectiveness) that will indicate any change in CC status. Timely data must be gathered on factors that may impact on the predicted PRV reliability, possibly reducing it. Very often in a processing plant this data, as well as unacceptable variation criteria, are part of process control.
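A dynamic measure with an unacceptable variation criterion could be as simple as a rolling pass rate over the most recent PRV checks, sketched below. The window size, alert level and test results are hypothetical; in a real plant the equivalent logic would more likely sit within process control or maintenance systems.

```python
from collections import deque


class RollingPassRate:
    """Pass rate over the most recent N PRV checks (a hypothetical dynamic measure)."""

    def __init__(self, window: int, alert_below: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.results: deque = deque(maxlen=window)  # oldest results drop off automatically
        self.alert_below = alert_below

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def pass_rate(self) -> float:
        if not self.results:
            raise ValueError("no checks recorded yet")
        return sum(self.results) / len(self.results)

    def unacceptable(self) -> bool:
        """True when the recent pass rate falls below the variation criterion."""
        return self.pass_rate() < self.alert_below


monitor = RollingPassRate(window=5, alert_below=0.8)
for passed in [True, True, True, True, True, False, False]:
    monitor.record(passed)
print(monitor.pass_rate(), monitor.unacceptable())  # 0.6 True
```

The fixed window means older good results stop masking recent failures, which is what makes the measure ‘dynamic’ rather than a lifetime average.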
However, when the CCs for a PUE are Acts, or Technological Systems where Acts and Objects must function together, estimating ongoing effectiveness is usually a greater challenge. Difficult measurability and inherent human reliability issues should drive us to try to evolve our CCs towards Objects, or at least well-designed Technological Systems. However, the magnitude of that change is great for traditional industries such as mining, as indicated by the mining company that suggested 80% of its CCs for 20 PUEs were Acts.
Human error methods
There are baseline human reliability figures (called human error probabilities, or HEPs) available through Human Error Analysis techniques in industries such as nuclear power generation. The methods also include Performance Shaping Factors (PSFs) that are used to modify the HEPs for local conditions. This roughly aligns with the approach described above for the PRV CC.
However, it is unlikely that these probabilistic human error methods will be adopted soon by more traditional industries such as mining. As companies and sites rapidly move toward CCM, another approach to measuring CC Act effectiveness, and possibly overall PUE risk, is required.
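The HEP and PSF idea can be sketched as below. A nominal HEP is multiplied by the PSF multipliers, and the bounded adjustment NHEP·C / (NHEP·(C − 1) + 1) keeps the result below 1.0; as we understand it, this form is used in the nuclear industry’s SPAR-H method when several PSFs are negative, but this sketch is a simplification. The nominal HEP and multipliers are hypothetical.

```python
from math import prod


def adjusted_hep(nominal_hep: float, psf_multipliers: list[float]) -> float:
    """Adjust a nominal human error probability (HEP) by PSF multipliers.

    Uses the bounded form NHEP*C / (NHEP*(C - 1) + 1), where C is the product
    of the multipliers, so the adjusted value always stays below 1.0.
    """
    if not 0.0 < nominal_hep < 1.0:
        raise ValueError("nominal HEP must be between 0 and 1")
    if any(m <= 0 for m in psf_multipliers):
        raise ValueError("PSF multipliers must be positive")
    composite = prod(psf_multipliers)
    return nominal_hep * composite / (nominal_hep * (composite - 1.0) + 1.0)


nominal = 0.001          # hypothetical baseline HEP for a routine act
local_psfs = [2.0, 5.0]  # hypothetical multipliers, e.g. time pressure, poor lighting
print(f"Adjusted HEP: {adjusted_hep(nominal, local_psfs):.4f}")  # Adjusted HEP: 0.0099
```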
Observation as the baseline for human CC Acts
The baseline for a specific human CC Act should be some form of observation. For example, supervisors’ observations might show that in 78 out of 100 observations the act of climbing equipment with 3 points of contact occurred as expected. However, many confounders can affect the quality of CC Act direct observations, often making the related data questionable.
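One way to show how much, or how little, a figure like 78 out of 100 tells us is to put an interval around it. The sketch below uses the Wilson score interval, a standard statistical technique chosen here for illustration rather than one named in this series; it assumes the observations are a fair sample, which the confounders above may not allow.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% (z = 1.96) Wilson score interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half


low, high = wilson_interval(78, 100)
print(f"Observed 78%, plausible range {low:.0%} to {high:.0%}")
```

Even with 100 observations the plausible range spans more than 15 percentage points, so small changes in the observed percentage may not indicate a real change in CC Act effectiveness.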
For a CC Act to have quality observation data it should:
- Be observable for a significant percentage of expected occurrences (e.g. 10-30% of the expected equipment climbing situations per defined time period, to establish 3 points of contact acts);
- Involve an Act that occurs with some regularity, such as a prevention control, and not an Act that only occurs during an unwanted event (i.e. a very rare act);
- Involve an act that can be observed without the person doing the act being consciously aware of the observation, especially if the observation data is to be significantly extrapolated across a large percentage of unobserved acts;
- Have a data gathering method that records each act observation so that the data can be easily gathered and used to generate the effectiveness measure.
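The criteria above could be applied as a simple checklist, sketched below. The field names are hypothetical, and the 10% minimum coverage is taken from the lower end of the example range, not from any standard.

```python
from dataclasses import dataclass


@dataclass
class ActObservationProfile:
    """Hypothetical summary of how one CC Act is observed over a defined period."""
    expected_occurrences: int      # e.g. expected equipment climbing situations
    observed_occurrences: int      # how many of those were observed
    occurs_regularly: bool         # a routine (e.g. prevention) act, not a rare one
    unobtrusive: bool              # person unaware of being observed
    recorded_systematically: bool  # observations captured in a usable record


def observation_data_issues(p: ActObservationProfile, min_coverage: float = 0.10) -> list[str]:
    """List the criteria this CC Act's observation data fails to meet."""
    issues = []
    if p.expected_occurrences <= 0:
        issues.append("no expected occurrences defined")
    elif p.observed_occurrences / p.expected_occurrences < min_coverage:
        issues.append("observed share of expected occurrences is too low")
    if not p.occurs_regularly:
        issues.append("act is too rare to observe reliably")
    if not p.unobtrusive:
        issues.append("observation may change the behaviour being observed")
    if not p.recorded_systematically:
        issues.append("observations are not recorded in a usable form")
    return issues


# Hypothetical: 12 of 200 expected climbs observed, by visible supervisors.
climbing = ActObservationProfile(200, 12, True, False, True)
for issue in observation_data_issues(climbing):
    print("-", issue)
```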
Other data sources to gauge effectiveness
In many cases, meeting these suggested criteria may be difficult. Solutions might involve developing an observation technology such as the example provided in article 3 on GPS-based vehicle operation monitoring in a surface mine. More commonly, however, limited direct observation ability will need to be supplemented by other data sources to gauge effectiveness.
For some CC Acts this might involve some observations as well as review of related supporting activities to examine the degree to which the CC Act is adequately included. Past articles discussed erosion factors for a control. Regular systems review may also involve gathering data on the status of important erosion factor reduction initiatives.
Three sources of data
The illustration above suggests three sources of data that could indicate CC Act effectiveness. If quality data that is sufficient to quantify a measure such as percent effective is available from direct observation, then data from other sources may be less important. However, as discussed, this may be difficult.
In earlier articles the term ‘algorithm’ was used to provide an image of verification measurement. An ‘algorithm’ is a process or set of rules to be followed in calculations or other problem-solving operations. The term is used in these articles to refer to a process of combining data on CC effectiveness from multiple sources to generate a single CC effectiveness measure that can be combined with or compared to other CC measures. Algorithms can be arithmetic, or logic-based such as a decision tree.
Building on this approach, and assuming that the company or site fits example 3 (that is, it wants to monitor for changes in PUE risk), we can now discuss the ‘algorithm’ decision tree approach to measuring effectiveness.
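A minimal arithmetic ‘algorithm’ is sketched below. It assumes each of the three sources discussed above (direct observation, review of supporting activities, and status of erosion factor reduction initiatives) has already been scored from 0 to 1. If direct observation is judged sufficient on its own it is used alone; otherwise a weighted average is taken. The weights and scores are hypothetical.

```python
def cc_act_effectiveness(
    observation: float,
    supporting_activities: float,
    erosion_initiatives: float,
    observation_sufficient: bool,
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Combine three 0-1 source scores into one CC Act effectiveness measure.

    If direct observation alone is judged sufficient, it is used on its own;
    otherwise a weighted average of the three sources is returned.
    """
    scores = (observation, supporting_activities, erosion_initiatives)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("source scores must be between 0 and 1")
    if observation_sufficient:
        return observation
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


print(f"{cc_act_effectiveness(0.70, 0.90, 0.60, observation_sufficient=False):.2f}")  # 0.74
print(f"{cc_act_effectiveness(0.70, 0.90, 0.60, observation_sufficient=True):.2f}")   # 0.70
```

Because the output is a single 0-1 figure, it can be compared with, or fed into, measures for other CCs on the same PUE.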
Example decision tree illustration
This EXAMPLE illustration shows a specific set of decision nodes that could be used to answer the question ‘has the CC Act effectiveness changed to an unacceptable risk?’. The decision nodes are based on the data sources listed in the earlier illustration. Example frequencies of data gathering are also shown.
Constructing a decision tree specific to the CC Act and its performance requirements provides an opportunity to combine observation data with additional decision nodes to define a consistent ‘algorithm’ for the company or business to dynamically verify CC Act effectiveness.
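A logic-based ‘algorithm’ of this type might be sketched as below. The node order, the 0.9 observation threshold and the input fields are hypothetical and are not taken from the illustration; the sketch only shows how nodes fed by data gathered at different frequencies can be combined into one consistent answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerificationData:
    """Hypothetical inputs gathered at different frequencies for one CC Act."""
    observation_rate: Optional[float]   # share of observed acts done as required; None if too few
    supporting_activities_ok: bool      # outcome of the latest systems review
    erosion_initiatives_on_track: bool  # status of erosion factor reduction initiatives


def unacceptable_change(d: VerificationData, threshold: float = 0.9) -> tuple[bool, str]:
    """Walk a simple decision tree and return (unacceptable?, reason)."""
    # Node 1: if quality observation data exists, does it meet the requirement?
    if d.observation_rate is not None and d.observation_rate < threshold:
        return True, "observed performance below requirement"
    # Node 2: are the supporting activities for the act adequate?
    if not d.supporting_activities_ok:
        return True, "supporting activities inadequate"
    # Node 3: are erosion factors being managed?
    if not d.erosion_initiatives_on_track:
        if d.observation_rate is None:
            return True, "erosion factors unmanaged and no observation data"
        return False, "acceptable, but erosion initiatives need follow-up"
    return False, "no unacceptable change detected"


print(unacceptable_change(VerificationData(0.95, True, False)))
print(unacceptable_change(VerificationData(None, True, False)))
```

Returning a reason alongside the verdict helps the result feed directly into reporting and action, rather than leaving a bare yes or no.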
Continuing the CCM process
The next article will build on the relationship between CC Act performance requirements and the verification process design using examples. Future articles will complete the CCM process by discussing reporting, site integration and learning steps.