Discussion: Modeling in the Aviation Environment

Chapter 2

Modelling a Dynamic World

1 Accident Causation Models

As Chapter 1 outlines, this project focusses on accident causation models and, in particular, their application to the field of Human Factors in aviation. To fully understand the current position and trends of accident causation modelling, it is important to acknowledge the history and development of the area and to identify where there may be room for further investigation and work. This chapter aims to provide a comprehensive analysis of the history and development of these models and to show where opportunities for, and validation of, this project arise.

The next section begins with an aviation accident case study, which is then referred to at salient points of the chapter in order to keep the discussion grounded. It is appropriate to illustrate the history of accident investigation models with a contemporary accident case study in this way.

2 Runway Overrun at Bangkok (QF1)

At about 22:47 local time on 23 September 1999, a Qantas Boeing 747-438 aircraft registered VH-OJH (call sign Qantas One, en route from Sydney to London) overran runway 21 Left (21L) while landing at Bangkok (Don Mueang) International Airport, Thailand. The aircraft landed long and aquaplaned due to the runway being affected by water following very heavy rain.

The first officer was the handling pilot for the flight. The crew elected to use the ‘normal’ company practice configuration for the approach, and at various stages during the approach to runway 21L the crew were informed by air traffic control that, although there was a thunderstorm and heavy rain at the airport, visibility was 2½ mi/4 km (or greater).

Figure: Structure of the book (Introduction; Literature Review; Applying the Approach to Industry (British Airways SMS); Conclusions; Development of a Complex Network Approach; Validation Study (Flight Simulator); Application of a Complex Network Approach; Extending the Mathematical Model (Bayesian Approach))

Human Factors Models for Aviation Accident Analysis and Prevention

At 22:44:53, the tower controller advised that the runway was wet and that a preceding aircraft (which landed at approximately 22:40) had reported braking action as ‘good’. As the aircraft descended through 200 ft/60 m, it started to deviate above the 3.15° glideslope, passing over the runway threshold at 169 kt at a height of 76 ft/23 m. Those parameters were within company limits, but the aircraft was both high and fast. When the aircraft was approximately 10 ft/3 m above the runway, the captain instructed the first officer to go around. As the first officer advanced the engine thrust levers, the aircraft’s main wheels touched down and the captain immediately cancelled the go-around by retarding the thrust levers, without announcing his actions. This resulted in confusion among the flight crew, and reverse thrust was neither selected nor noticed to be absent during the landing run. The aircraft came to rest some 720 ft/220 m beyond the end of the stopway, with its nose resting on an airport perimeter road. The aircraft sustained substantial damage during the overrun. None of the 3 flight crew, 16 cabin crew or 391 passengers reported any serious injuries (ATSB 2001).

Single Perception Theory

1890s The birth of modern research into accidents and their causation is mostly attributed to the work of Bortkiewicz (1898), who concluded, from limited studies, that accidents occur at random and are therefore inexplicable. Fortunately, this view did not restrain further research into accidents; instead it opened the gates to years of investigation, conjecture and argument.

1910s and 1920s At first, most work on accidents rested on a single event perception, whereby an accident or incident is regarded as a solitary event for which there must be a solitary cause. The job of an air accident investigator was to find this cause and, by eliminating it, stop the accident from recurring. Elements of the ‘single event’ idea remain, and the concept is still mistakenly applied in work on aviation and other complex systems. However, it was soon realised that these environments spawn much more complex interactions between human–human, human–system and system–system components. This view of accidents also allowed a blame culture to flourish, in that a party was seen as responsible, as the ‘cause’ of the ‘event’. An accident had to have someone or something at fault, someone to blame, so that what had happened was not purely an ‘act of God’ that could not be explained. This rather simplistic view and the work of, for instance, Greenwood and Woods (1919) from the Industrial Fatigue Research Board (IFRB) gave rise to the ‘accident proneness’ model. This model focussed solely on the individual (rather than the system) and came to dominate research and accident reduction efforts for the first half of the twentieth century.

Further work within the IFRB, including clinical studies of reactions, coordination and distraction, concluded that accident proneness existed. It was considered to be related to nervous instability and poor aestheto-kinetic coordination (Farmer and Chambers 1926 and 1929). The uncritical acceptance of the accident proneness model by the community at the time is largely attributed to this work. This view persisted for many years, even though other work, such as domino theory, was already being developed that exposed the shortfalls of the prevailing theory. Indeed, studies continued long after this time, examining the concept and working around the broad theory base of accident proneness in individuals. For example, in their 1988 study, Mohr and Clemmer find no real evidence of a proneness that is measurable or useful in accident causation analysis and conclude that ‘it is unlikely that overall injury rates in the workplace can be effectively reduced by screening out workers with excessive numbers of injuries’ (Mohr and Clemmer 1988: 127). This work illustrates the shortfalls of this model of accident investigation and so highlights the limits of its application to our Qantas Flight 1 Accident Case Study (QF1). If accident proneness were the explanation, then the pilots should have been involved in other incidents before and after this event, and the evidence does not substantiate this. The view also suggests that accident-prone people can be selected out and, therefore, that all accidents can be prevented by removing the accident-prone individual at the selection or training stage, or after an incident has occurred. This is now almost universally accepted as a flawed theory. Dekker (2006) describes this ‘Bad Apple Theory’ as the ‘Old View’ and contends that safety progress was made mainly through technological advances and not as the result of applying these theories.
Thus, further models were needed to attempt to explain accident causation.

The main problem with the simple explanation offered by single perception theories, other than the realisation that more complex interactions occur, is that it assumes incidents are innately replicable: if a ‘cause’ can be removed, then the accident cannot happen again. Were this applied to QF1, with the event investigated in the 1920s, the pilots could be sacked and the incident should therefore not recur. This does not address any of the real issues and, were it enacted, would have a devastating effect on morale and reporting behaviour. Accidents are now, at least, viewed as being so complex that many different ‘causes’ could have produced an incident, and it is often very difficult to identify a definite cause or to produce an effective barrier to similar accident types. The single event perception is well suited to the type of investigation that predated Human Factors influence, as a conclusive failed ‘part’ of an aircraft, for example, could often be found and blame attributed to structural facets of the system. However, a systems view had to be developed and adopted.

2.2 Domino Models and their Development: The Demise of ‘Proneness’?

1930s and 1940s In his book Industrial Accident Prevention (1931), Herbert William Heinrich set out his domino theory of industrial accidents. For the first time in Human Factors literature, accidents were attributed to a sequential chain of events rather than to a single causal factor (normally an employee). To illustrate this, Heinrich used the idea of a series of dominoes falling over, causing the final event.

Fundamental to this model of accident causation is that Heinrich labelled each of the dominoes with causes that may lead to an accident. It can be contended that this laid the basis for modern accident causation models.

Heinrich’s first domino was entitled ‘Social Environment and Heredity’. This referred to personality traits believed to be inherited, or to the social environment in which the worker is immersed, as influences on the likelihood of that worker being involved in an accident. This, in particular, echoes elements of accident proneness theory, whereby internal facets of a human contribute towards accidents regardless of external factors. In a way, it shows a development of accident proneness theory rather than a complete departure from it, but the other dominoes bring in factors that were being discussed in the wider research of the time.

Second, and linked to the first through the chain-of-events basis of the theory, is ‘Fault of Person’. This refers to the effect that influences from a worker’s life outside work, such as family problems or fatigue, have on events. It includes flaws developed in the context of the social environment and the system in which the worker operates. This significantly draws together the idea that external influences on an individual are as significant to accident causation as internal ideas of proneness, if not more so. Today this is still an important area in the investigation of accidents and incidents. These ‘soft issues’ are often easy to elicit, at a surface level, from those involved in an incident or accident. It can be contended that approaching Human Factors via the ‘soft issues’ of family and social life allows the industry merely to tick a box without truly understanding the more complex facets of system interaction with humans at all levels. However, referring again to QF1, had the event occurred in the 1930s there would at least have been some form of defence for the flight crew. For the first time in an investigation, outside influences would be considered, and from this came the potential for changes to regulations, training and standards.

In his later expansions, Heinrich also extended the second domino to include the explicit term ‘Mistake’ as a result of these personal factors. The third domino illustrates Heinrich’s direct cause of accidents and incidents and was called either an ‘Unsafe Act’ or an ‘Unsafe Condition’. That this domino is required in order to knock over the fourth, ‘Accident’, shows that Heinrich postulated that one or both of these must be present for an accident to occur. This model was the first to really develop the importance of behaviour in influencing safety and accident causation, and Heinrich felt that this third domino was the most important, and also the easiest to remove from the picture in order to prevent accidents. The presence of this domino means that, for the first time in investigating QF1, it would be necessary to dig deeper into the behaviour and rationale behind the flight crew’s actions: a movement towards truly integrating behaviour.

The final domino, the fifth, was named ‘Injury/Property Damage’ and later included ‘Near Miss’. The idea is that if one of the first four is removed, then the fifth ‘event’ is avoided. At last there was a gateway into the complex world of system accident causation.

This was a significant step in the right direction for the study of causation and for modelling the Qantas incident more comprehensively. Any analysis is, however, still limited by the problems of a linear causal chain. If one domino is not present, then theoretically the event would not occur, and yet it is arguable from the report whether, for example, any significant outside influences (e.g. personal life) affected the crew. Also, in the case of the first domino, heredity is somewhat questionable, and the social environment was the same for the first officer and the captain, yet it is possible to suggest that it was the captain who made the first mistake. A major problem with this model, and one that has been brought to light particularly in recent years (see, for example, Young et al. 2004), is that it may result in a search for answers that fit the model so that the chain is not broken. Although the model does appear to answer some of the complex issues involved in the QF1 incident, it is too reliant on the linear chain and on ideas of the individual, and takes little or no account of, for example, training and management. Arguably, had Domino Theory underpinned the QF1 investigation, the responsibility would again have fallen on the flight crew, as this is the most ‘resolvable’ course of action.
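The strictly linear logic criticised above can be made concrete in a few lines. The following is a minimal Python sketch, not Heinrich’s own formalism: it assumes each domino is simply present or absent, uses the five domino labels given above, and the function names `first_break` and `final_event_occurs` are hypothetical. Because the final domino falls only when every earlier one has fallen, the model can only answer ‘present’ or ‘absent’ for each link, which is what pushes an investigator to find something for every domino so that the chain is not broken.

```python
# A minimal sketch of Heinrich's five-domino sequence as a strict linear chain.
# Assumption: each domino is a simple present/absent flag (illustrative only).
DOMINOES = (
    "Social Environment and Heredity",
    "Fault of Person",
    "Unsafe Act / Unsafe Condition",
    "Accident",
    "Injury/Property Damage",
)


def first_break(present):
    """Return the first of the first four dominoes that is absent, or None if all fall."""
    for name in DOMINOES[:-1]:
        if not present.get(name, False):
            return name  # the chain is broken here, so the final domino stays standing
    return None


def final_event_occurs(present):
    """The fifth domino (the loss) falls only if every preceding domino has fallen."""
    return first_break(present) is None


if __name__ == "__main__":
    all_fall = {name: True for name in DOMINOES}
    print(final_event_occurs(all_fall))  # True

    # Removing the third domino, which Heinrich saw as easiest to remove, breaks the chain.
    no_unsafe_act = {**all_fall, "Unsafe Act / Unsafe Condition": False}
    print(final_event_occurs(no_unsafe_act), first_break(no_unsafe_act))
    # False Unsafe Act / Unsafe Condition
```

Note that the fifth domino’s own flag is never consulted: in this reading it is the outcome of the chain, not an input to it.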

If we take the third domino, which arguably addresses the most important issue in Human Factors, it can be broken down into Heinrich’s terms of unsafe conditions (or reasons to commit unsafe acts). Expanding on this idea, unsafe conditions are held to be caused by physical unsuitability, lack of knowledge or skill, poor attitude and an unsafe working environment. This still holds very true today, as these factors, with just the addition of new buzzwords to the bare-bones structure of the seminal framework, may be seen as precursors to complex incidents. Heinrich goes on to distinguish between underlying and direct conditions within this domino. Although Heinrich suggests that the first two dominoes combine to produce the third, Vincoli (1994) puts it rather neatly that the unsafe conditions in the third domino are in fact ‘symptoms of root causes’ from the first two dominoes, so modernising the view of the unsafe act as a result of underlying factors rather than a cause in itself.

Although Heinrich had published the domino theory, during the 1930s the movement within the area still centred on theories of single perception. There was, however, a growing disillusionment with the studies that had led to the theory of accident proneness. Johnson (1946) criticised over 200 studies that worked on accident proneness, citing invalid, inadequate or inappropriate statistical methods and conclusions. Indeed, much of the work by the IFRB and others in defence of the theory consisted of statistical argument over whether patterns of accident rates fitted Poisson and other distributions effectively. The major flaw in this work, though, was that it was based on a presumption of homogeneous exposure to risk. This prevented the work from recognising differences in individuals’ exposure to hazards, or effects on the individual such as those mentioned in domino theory (family problems, depression, and so on). It is this presumption of homogeneous exposure to risk that fails to explain how aircraft arriving only minutes before QF1 managed to land or go around safely. Adelstein (1952), too, was intrigued by the idea of accident proneness and its application to hazardous work. His study of 1,452 accidents of shunters on South African railways concludes that chance factors explain a lot more than proneness. He went on to conclude that there was no significant correlation between individuals and the repetition of accidents or accident rates over five years. He did not try to formulate any new theory to explain the patterns, but merely sought to apply accident proneness theory to a truly empirical situation.
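Why the presumption of homogeneous exposure matters can be shown with a short simulation. Under homogeneous exposure, each person’s accident count follows a Poisson distribution with the same rate, whose variance equals its mean, so the variance-to-mean ratio is about 1. If exposure differs between people, the pooled counts become overdispersed (ratio above 1), the pattern that could be read as ‘proneness’. The Python sketch below assumes hypothetical rates (0.5 accidents per worker, or a mix of 0.1, 0.5 and 1.5 for differing jobs) and hypothetical function names; no individual in it is innately prone, yet the heterogeneous group shows the excess variation.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(rng, rate):
    """Draw a Poisson count using Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def dispersion(counts):
    """Variance-to-mean ratio: close to 1 for Poisson counts, above 1 if overdispersed."""
    return pvariance(counts) / mean(counts)


def simulate(n_workers=5000, seed=1):
    rng = random.Random(seed)
    # Homogeneous exposure: every worker faces the same hypothetical rate.
    same_exposure = [poisson_draw(rng, 0.5) for _ in range(n_workers)]
    # Heterogeneous exposure: hypothetical rates differ by job; nobody is "prone".
    mixed_exposure = [poisson_draw(rng, rng.choice((0.1, 0.5, 1.5))) for _ in range(n_workers)]
    return dispersion(same_exposure), dispersion(mixed_exposure)


if __name__ == "__main__":
    homogeneous, heterogeneous = simulate()
    print(f"homogeneous exposure:   {homogeneous:.2f}")
    print(f"heterogeneous exposure: {heterogeneous:.2f}")
```

With these rates the theoretical ratios are 1 and roughly 1.5 (for a Poisson count whose rate varies, the variance is the mean rate plus the variance of the rate), so the simulated values should land near those figures. The point is only that overdispersion alone cannot separate proneness from unequal exposure.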

1950s and 1960s A new view was developed based on the work of Cresswell and Froggatt (1963), among others. In their study of the causation of bus driver accidents, they coined the term ‘Accident Liability’ as a more developed addition to single perception theory. This reflected individuals’ propensity to take risks, rather than a direct proneness to accidents, and allowed a new focus for research and behaviour adaptation.

To reflect these developments in theoretical knowledge and beliefs, further work was carried out throughout the period by a number of individuals to develop single perception theories, i.e. those holding the individual fully responsible in a Human Factors sense (as an alternative to an object failing). Willard Kerr in particular based his work on the premise of single perception theories but, reflecting current research that was moving away from the rigid accident-proneness suggestion, developed his goals–freedom theory (GFT) (Kerr 1950). The fundamental concept behind this theory is that unsafe behaviour leading to an accident is due to an unrewarding psychological climate in the workplace, which leads to a lack of mental alertness. This is still centred on a single cause of accidents, but begins fully to take into account the idea that external factors, or internal problems, may influence an individual, and so moves away from an innate proneness. These developments begin to get to the heart of the investigation into the Bangkok runway overrun. Much is still lacking when addressing such an event, though, as the theory is ultimately single-perception oriented and can easily over-simplify an incident. This case study is a prime example of a relatively simple-looking incident that could all too easily be classified as simple, whereas breaking the surface, as the accident report does, retrieves much more (and more meaningful) information.

In 1957 Kerr built on his idea of GFT with his adjustment stress theory. In this he considers how a negative work environment contributes to accident causation and highlights stress as a factor that acts on the environment, preventing the individual from concentrating fully on the work. This stress covers a multitude of areas, including personal situations, time pressures, poor relationships at work and workplace hazards.

While this work was going on in the background of industry, the 1960s, 1970s and 1980s brought technical advances and observable reductions in accident rates, so the momentum of Human Factors work was arguably reduced during this period. Indeed, the pendulum had swung somewhat away from the human directly involved in an incident, towards making the technology accommodate the ‘inevitability’ of accidents. Haddon (1961) called for the design of fail-safe vehicles, given that accidents would remain inevitable for some time into the future. This concept has again come to the forefront of study in recent years under a new label, ‘resilience engineering’ (see, for example, Hollnagel, Woods and Leveson 2006). Even though Human Factors work, in the form carried out to date, was dwindling slightly, there were still a number of notable developments in understanding cause with respect to the bigger picture.

2.3 Development of a Domino Idea

1970s By the 1970s in particular, the domino theory had come to be broadly acknowledged and many based their work on it, for example Weaver (1971). He put more emphasis on the poor supervision afforded by management levels and stressed still further the importance of identifying the unsafe act and the circumstances in which it occurs. Bird and Loftus (1976), in their loss causation model, reflected the direct management relationship between incidents and accidents and added an extra domino to the original domino process, based on lack of control by management (Figure 2.1). However, by merely adding a new domino, it could be assumed (as no uncontrollable factors are considered in such a model) that all incidents are avoidable if management asserts enough control over the system. This almost blanket assertion that management is to ‘blame’ cannot be justified for the QF1 case presented: Beaty (1995) has repeatedly praised the management at Qantas and presented it as a model in many ways for airline operations worldwide.

Adams (1976) took this one step further and recognised the complex interaction of management strategy further up the chain, using the term ‘organisational errors’ (a term still widely used today) to encompass the first three dominoes of the chain. Both worked to advance the model, but neither made a giant leap in investigating accident causation, as they were still bounded by the limited applicability of a framework or model to the dynamic world in which it would be applied.

One of the first, and initially influential, attempts to apply this theory to the ‘real’ world was Johnson’s Management Oversight and Risk Tree (MORT) (1975). Johnson recognised the interaction between personal, organisational and physical (environmental) aspects of the system or situation (Figure 2.2).

Figure 2.1 Interpretation of Bird and Loftus’ (1976) loss causation model

Figure 2.2 Diagrammatic representation of Johnson’s three-level model of accidents

Source: Adapted from Leveson 2002.

Moreover, Johnson’s indication that several accident pathways may develop over time led to this becoming a highly researched area of Human Factors and, in particular, to the rise of frameworks such as Reason’s generic error modelling system (GEMS) and ‘Swiss cheese’ model.

MORT therefore encompasses not only Human Factors in terms of the individual but also a systems perspective in accident analysis. The first stage of a MORT analysis involves the ‘standard’ investigation into ‘failures’ of equipment or individuals that could have contributed to an event, but also considers some of the influences on the individuals and teams under investigation and, to some extent, covers post-accident events, including the response and availability of emergency services. It is at the second stage that MORT begins to look in more detail at accident causation by examining ‘management system factors’ (Johnson 1975). Here, the general situation at the time of the accident is considered, and failures within management and even within other organisations are investigated, even if there is no apparent direct link to the actual accident. MORT places a level of responsibility for accidents at the management level, on the basis that management should create an organisation that, through directives and conditions, would not allow such situations to arise. MORT looks for actions at management level that could have prevented an incident at the event itself or in the pre- and post-event timelines. These are encapsulated in a standardised MORT fault tree to illustrate the development and causation issues of the event under investigation.

As is clear from this summary of the application so far, MORT is heavily management-oriented and almost appears to retain a level of blame, merely shifting it up the organisational structure. MORT does, however, allow some risks to be termed ‘assumed risks’, and these do not hold management responsible, as it would not be viable to do so for the smallest of factors in a real-life organisation. Possibly one of the most significant results of the work on MORT was the development of the idea of ‘barriers’ in a system. These can include simple physical barriers, such as guards on machines, or procedures implemented to avoid an accident within the workforce, e.g. the go-around procedure in QF1. An accident is said to occur when one or more of these barriers is broken through, either by human action or by some form of technological failure. Applying MORT to QF1, the points about the importance of management hold, especially in terms of the barriers that were put in place, as do the problems that resulted from incorrect approach procedures or from the weather training that was implemented. These do not, however, account for the actual decisions made by the flight crew, and to an extent this approach neglects the direct influence of the conditions and actions despite its reference to assumed risks. This model most comfortably encompasses the case study thus far and has moved slightly further away from the restrictive linear flow models. In itself, though, the model falls foul of its own complexity and inflexibility as an investigative tool, a problem that, it can be argued, is still present in many of today’s frameworks.
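The fault-tree logic behind a MORT chart can be sketched with AND/OR gates. The real MORT chart is a large standardised tree; the Python below is only a toy under stated assumptions. The event names and the function `build_toy_tree` are hypothetical and loosely borrow the QF1 details and the two MORT stages described above. The gate choices (both barriers combined with an AND gate, management system factors required through an AND gate, and an ‘assumed risk’ treated as an alternative route to the loss) are our simplification, and the flag values are illustrative, not findings of the ATSB report. The toy shows two points made above: the loss disappears if either barrier holds, and it also disappears when management factors are set to adequate, which is how a management-oriented tree can neglect the crew’s own decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Basic event: present=True if this condition or barrier failure existed."""
    name: str
    present: bool


@dataclass
class Gate:
    """Logic gate over child nodes; kind must be "AND" or "OR"."""
    name: str
    kind: str
    children: list = field(default_factory=list)


def occurs(node):
    """Evaluate a node bottom-up: AND needs every child, OR needs at least one."""
    if isinstance(node, Event):
        return node.present
    outcomes = [occurs(child) for child in node.children]
    if node.kind == "AND":
        return all(outcomes)
    if node.kind == "OR":
        return any(outcomes)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


def build_toy_tree(go_around_breached=True, reverse_thrust_breached=True,
                   management_inadequate=True, assumed_risk=False):
    """Hypothetical toy tree; names and structure are illustrative, not the MORT chart."""
    barriers = Gate("Barriers breached", "AND", [
        Event("Go-around barrier breached", go_around_breached),
        Event("Reverse-thrust barrier breached", reverse_thrust_breached),
    ])
    specific = Gate("Specific control factors inadequate", "AND", [
        Event("Hazard: water-affected runway", True),
        barriers,
    ])
    oversights = Gate("Oversights and omissions", "AND", [
        specific,
        Event("Management system factors inadequate", management_inadequate),
    ])
    return Gate("Loss: runway overrun", "OR", [
        oversights,
        Event("Loss accepted as an assumed risk", assumed_risk),
    ])


if __name__ == "__main__":
    print(occurs(build_toy_tree()))                             # True
    print(occurs(build_toy_tree(go_around_breached=False)))     # False
    print(occurs(build_toy_tree(management_inadequate=False)))  # False
```

Swapping the barrier gate from AND to OR would instead model barriers any one of whose failure is enough for the loss; which choice fits depends on how the defences actually interact.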

1980s and 1990s By the late 1980s and 1990s, the effect of technology improvements on accidents had diminished and the accident rate had again plateaued. Against this background, the greatest leap towards investigating organisational contributions to incidents arguably came from James Reason’s Swiss cheese model of accident causation (1990 and 1997). This model has been adopted by individuals, companies and regulatory bodies such as the International Civil Aviation Organization (ICAO), the global aviation body, as the basis of their investigative efforts and understanding of accidents. This includes the application of the model during the course of the QF1 investigation itself.

Investigations into accidents outside the realm of aviation have also moved the focus of investigation to the system as a whole, e.g. Three Mile Island (Pennsylvania, US, 1979), The Herald of Free Enterprise (off Zeebrugge, Belgium, 1987) and Piper Alpha (North Sea, 1988). These incidents, extensively discussed in the literature, all illustrate the movement towards an organisational view of complex events. Indeed, some years earlier, Perrow (1984) had argued that it was in the nature of complex, tightly coupled systems to suffer unforeseeable socio-technical breakdowns. This appears to have formed the basis for the movement exemplified by Reason’s work. It does, however, remove any possibility of predicting accidents and incidents if the breakdowns are truly ‘unforeseeable’.

Figure 2.3 Reason’s Swiss cheese model of accident causation Source: Adapted from Reason 1997.

As can be seen from Figure 2.3, Reason identified several layers in which ‘holes’ are always present; where these holes align, the result may be a catastrophic event. In the majority of cases, a layer will stop an event from resulting in catastrophe, and the holes are fluid in that they may appear and disappear or change in magnitude, depending on the psychopathology of the system or organisation. Most important to Reason’s model is the distinction between latent errors and the active errors of those at the ‘sharp end’ of the system. Arguably, though (see, for example, Young et al. 2004), this would suggest that human error, in either latent or active form, contributes to 100 per cent of incidents and accidents. The percentage of accidents and incidents ‘caused’ by human error has long been contentious, with views ranging from Heinrich’s (1931) 80:20 split, through Boeing’s (1996) two-thirds, to this plausible view of 100 per cent. Even Reason (1997) says that although active failures will always be present, defences will catch most of them, so that they do not lead to negative outcomes. This in itself, however, appears to highlight a major problem with Reason’s work: without a predictive element, some active and latent conditions will continue to exist, and this leads to problems. This tends to limit the model’s applicability to a post-mortem investigation into the pathology of an organisation, rather than a context-specific method of investigating all precursors to an incident or accident before or after the event. It can therefore be suggested that the danger lies in the way this model has been adopted and applied by rote, and this needs further work.
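The image of holes aligning across layers has a simple probabilistic reading, sketched below in Python. It assumes, purely for illustration, four hypothetical layers with made-up probabilities that a given trajectory meets a hole, and that the layers fail independently; under that assumption the chance of passing every layer is the product of the per-layer probabilities. The Monte Carlo loop redraws the holes on every trial to mimic their fluid, shifting nature. Independence is the weakest assumption here: a single latent condition can open holes in several layers at once, which would make the true figure higher than the product suggests.

```python
import math
import random

# Hypothetical layers with made-up probabilities that a trajectory meets a hole.
LAYERS = {
    "Organisational decisions": 0.10,
    "Local workplace conditions": 0.20,
    "Sharp-end actions": 0.30,
    "Engineered defences": 0.40,
}


def analytic_pass_probability(layers):
    """Assuming independent layers, a trajectory passes all of them with the product probability."""
    return math.prod(layers.values())


def simulated_pass_probability(layers, trials=200_000, seed=7):
    """Monte Carlo estimate; holes are redrawn each trial to mimic their fluid nature."""
    rng = random.Random(seed)
    passed = sum(
        all(rng.random() < p for p in layers.values())
        for _ in range(trials)
    )
    return passed / trials


if __name__ == "__main__":
    print(f"analytic:  {analytic_pass_probability(LAYERS):.4f}")  # analytic:  0.0024
    print(f"simulated: {simulated_pass_probability(LAYERS):.4f}")
```

With these numbers, roughly 24 trajectories in every 10,000 pass all four layers; the sketch says nothing about where the holes come from, which is the predictive gap discussed above.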

When discussing his developments and model, Reason states that we ‘cannot change (the) human condition, but we can change the conditions under which people work’ (2000: 768). In other words, he seems to imply that understanding the ‘why’ of an individual’s action at any cognitive level may well be of little or no use, as error is inevitable. But is this too simplistic and almost fatalistic? Even Reason, in a conference presentation in late 2003, suggested ‘perhaps we should revisit the individual (the heroic as well as the hazardous acts)’ (2003: 26). These appear to be significant words in re-highlighting the importance and relevance of active error.

This model has made a significant jump in the field of human error investigation and drew the attention of investigators and companies away from solely studying and blaming individuals. Through methods such as Tripod Delta (see, for example, Groeneweg and Roggeveen 1998), organisational indicators of latent factors are made central. The method, based on Reason’s model, aims to control latent factors within an organisation through identification, categorisation and the use of compensating factors. Indeed, we only have to look at the example of the Space Shuttle Challenger accident of 1986, which led to loss of life and shock across the globe, to see this movement towards placing organisational errors at the core of an investigation. In her in-depth and intelligent review of this event, Diane Vaughan (1997) shows how applicable Reason’s model is to a timeline of events stretching back through nine years of latent pathogens and arguably no active error.1

Despite the massive developments that this new view of accident causation and investigation undoubtedly brought about, the movement appeared to remain anchored in the fundamental domino theory of a single causal chain, with factors and influences added to the model. Even the Swiss cheese model appears to be an extension of this, and it is contended that such event chain models encourage limited notions of linear causality that make it difficult to incorporate non-linear relationships such as feedback (Leveson 2002). Some concerns also arise when such a model of causation is applied as an accident investigation method, as in the ATSB’s investigation of the Bangkok overrun (see section 2 above). The distinction between a causation model and an investigation method appears to be fuzzy at best in real-world application. Following an illustrative model such as this could lead to latent factors or unimportant issues being searched for, or found, at the expense of others in order to complete the ‘required’ model of the accident chain.

Reason (1997) argues that latent errors are always present in any accident or incident, whereas active errors may or may not be present: active errors may well be the consequence rather than the cause of incident pathways, and they are not required if enough latent conditions exist. This helped to concentrate industries on a relentless search for latent errors, which have proved very difficult to find before an event and too easy to find (or at least to search for dogmatically) after it, owing to factors such as hindsight bias. Fischhoff (1975) emphasised that we rebuild the past in our minds in a linear and logical fashion, although such a linear progression is not possible in the real world. This oversimplification magnifies the problem of searching for latent conditions in order to satisfy an abstract requirement for an unnatural cut-off point in the investigation. That is, there is no natural point in a post-mortem accident investigation at which to stop tracing latent conditions further and further up the organisation, so at what point does further effort begin to yield no returns, or unrealistic ones? Sidney Dekker (2005) points out that Reason’s model is limited by the confines of structuralism: although it works well post-mortem (hindsight notwithstanding), it lacks function in the fluid pre-event world. This limitation is still prevalent today.

Perrow (1999) also emphasised the effect of latent conditions in a system, which Reason (1990) describes most succinctly in his seminal ‘resident pathogen’ metaphor. This work complemented the research that has led to many in-depth discoveries about the salience and importance of latent errors. There are occasions, however, such as the Chernobyl disaster in Ukraine in 1986, that have arguably been attributed to purely active errors by the operators at the plant.

1 Even this is contentious, however: the ‘launch decision’ has been suggested as an active error. This again highlights the question of whether errors are the cause or the result of events, as the launch was the cause of the event but also the result of many previous events, for example, the meetings with engineers.


Such cases appear to go against the grain of an approach to accident investigation oriented wholly towards Perrow or Reason. It should be remembered that Reason’s model is an illustration of an idea, not a method for accident investigators to adhere to rigidly. The relevance of these developments to a blame culture is discussed in section 3.2 below, especially with reference to Reason’s (1997) discussion of the fundamental attribution error and surprise errors.

This is not to say that Reason’s organisational model of human error has never been useful. One methodology in regular use that is based upon his ideas, especially within the aviation accident literature, is the Human Factors Analysis and Classification System (HFACS) (Shappell and Wiegmann 2001). HFACS has been applied to numerous aviation accidents and can identify useful relationships between active and latent factors (see, for example, Li, Harris and Yu 2008). However, such a methodology has many limitations: Shorrock and Chung (2010) studied the links between research and practice and found gaping holes in the success of models such as HFACS.
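Associations of this kind are often summarised from a cross-tabulation of coded accident reports. The following is a minimal sketch. It assumes each accident is simply coded for whether a given latent-condition category and a given active-error category were present. On that basis, an odds ratio expresses how much the odds of the active error being coded rise when the latent condition is present. The function name and the counts are hypothetical. This is not claimed to reproduce the exact statistics of Li, Harris and Yu (2008). In practice, a significance test (for example chi-square or Fisher's exact test) would accompany the ratio.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table of coded accident reports (hypothetical helper).

                         active error coded   active error not coded
    latent coded                 a                      b
    latent not coded             c                      d
    """
    if b == 0 or c == 0:
        # The plain estimator divides by b * c, so a zero off-diagonal cell makes it undefined.
        raise ValueError("zero off-diagonal cell: odds ratio undefined")
    return (a * d) / (b * c)


# Hypothetical counts for 40 accidents (not real HFACS data).
ratio = odds_ratio(a=12, b=4, c=6, d=18)
print(ratio)  # 9.0
```

Under these invented counts, the odds of the active error being coded are nine times higher when the latent condition is present. That is an association across reports, not evidence of a causal pathway in any one accident.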

During this period, however, work was also being carried out to bring a completely new perspective to accident causation. An early example is the Sequentially Timed Events Plotting (STEP) method, which Hendrick and Benner (1987) developed from the earlier work on Multilinear Events Sequencing (MES) (Benner 1975). Not only did these approaches aim to help investigators in actually carrying out an accident investigation, but they also moved significantly away from rebirths of domino theory. Both methods are based on perturbation theory (P-Theory). This holds that a system maintains homeostasis and that, if this is disturbed by a perturbation to which the system does not adapt, an accident sequence will develop (Benner 1975).

The STEP method uses cards to consolidate event information, including the actors and actions involved as well as a description and the source of the information. Events are then placed in a matrix with time along the x-axis and actors along the y-axis, and causal links are drawn between events where required. This gives an investigator the details of the accident sequence. It can highlight where defences or barriers have failed (in the Bangkok overrun, for example) and where further developments may reduce future incidents. When applied to a case study such as QF1, STEP appears to provide a useful method for illustrating and investigating both active and latent issues. However, one facet appears to be missing if STEP is to be suitable for use within complex systems: it focuses on a single event (and its immediate surrounding events) rather than on the system as a whole. These tools need to be developed so that they can reflect a system space during both normal and abnormal work.

When completed, the STEP matrix (Figure 2.4) nicely illustrates the emergence of a network approach to accidents, built up of multiple causes, multiple actors and multiple events. This allows network models to be fluid (or at least dynamic). It thereby addresses Dekker’s (2005) assertion that Reason’s model lacks function in the fluid pre-event world.
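The card-and-matrix structure described above maps naturally onto a small data structure. The sketch below is a minimal, hypothetical representation; the class and method names are invented for illustration and are not part of the published STEP method. It works as follows:

- Each card records an event's actor, action, ordinal time step and information source.
- The matrix arranges cards by actor (rows) and time (columns).
- Causal links can be traversed backwards from the outcome, which turns the matrix into a small network.

The QF1 events are simplified from the case summary in section 2. Their ordering is illustrative, and the links are examples, not the ATSB's findings.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventCard:
    """One STEP-style event card (hypothetical representation)."""
    event_id: str
    time_step: int  # ordinal column on the time axis; no real clock times assumed
    actor: str      # row on the actor axis
    action: str
    source: str     # where the information came from


@dataclass
class StepMatrix:
    cards: dict[str, EventCard] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)

    def add(self, card: EventCard) -> None:
        self.cards[card.event_id] = card

    def link(self, cause: str, effect: str) -> None:
        # In this sketch a causal link may only point forward along the time axis.
        if self.cards[cause].time_step >= self.cards[effect].time_step:
            raise ValueError(f"{cause} does not precede {effect}")
        self.links.append((cause, effect))

    def grid(self) -> dict[str, dict[int, list[str]]]:
        """Actors down the rows, time steps across the columns."""
        rows = defaultdict(lambda: defaultdict(list))
        for card in sorted(self.cards.values(), key=lambda c: c.time_step):
            rows[card.actor][card.time_step].append(card.action)
        return {actor: dict(cols) for actor, cols in rows.items()}

    def causes_of(self, event_id: str) -> set[str]:
        """Every event linked upstream of event_id, found by walking links backwards."""
        found: set[str] = set()
        frontier = [event_id]
        while frontier:
            current = frontier.pop()
            for cause, effect in self.links:
                if effect == current and cause not in found:
                    found.add(cause)
                    frontier.append(cause)
        return found


matrix = StepMatrix()
for card in [
    EventCard("E1", 1, "Weather", "Very heavy rain leaves water on runway 21L", "case summary"),
    EventCard("E2", 1, "Crew", "Elect 'normal' company approach configuration", "case summary"),
    EventCard("E3", 2, "Aircraft", "Lands long", "case summary"),
    EventCard("E4", 3, "Aircraft", "Aquaplanes", "case summary"),
    EventCard("E5", 4, "Aircraft", "Overruns runway 21L", "case summary"),
]:
    matrix.add(card)

matrix.link("E1", "E4")  # illustrative links only
matrix.link("E3", "E5")
matrix.link("E4", "E5")
# E2 is deliberately left unlinked: whether it contributed is the analyst's call.
print(sorted(matrix.causes_of("E5")))
```

Leaving E2 unlinked is deliberate. STEP asks the analyst to draw links only where they are warranted, and the matrix makes any unlinked event, and therefore any open question, immediately visible.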


Rasmussen (1997) also sought to step outside the constraints of single-chain causation, even where such chains are supplemented with added influences and factors. Rasmussen understood that decomposing behaviour into decisions and actions artificially isolates the phenomenon from the context in which that behaviour took place; this, he concluded, is an ineffective way of trying to understand behaviour. Instead, he presented a systems approach, looking at vertical integration between the layers of a dynamic socio-technical organisation. Rasmussen recognised the importance of closed-loop feedback to an organisation’s success in a dynamic world (Figure 2.5). This parallels the STEP P-Theory idea that a system must adapt to changes if chains of events are not to lead to an incident: feedback allows the system to remain stable. A lack of vertical integration between the layers shown in Figure 2.5 can be attributed to a lack of this feedback. In such a system, the actors involved (human or machine components) also do not fully understand the roles of other actors within the system or in their immediate vicinity, that is, actors at other levels.

Vicente and Christoffersen (2006) showed how Rasmussen’s framework for risk management in a dynamic society fits the breakdown of a water supply system in Walkerton, Canada, in 2000, which resulted in widespread illness and several deaths. Their work was the first full-scale independent application of the framework. They concluded that the events at Walkerton bore out most of the framework’s predictions. The defences are seen to erode not all at once but gradually, over time, as the interactions between the different layers degrade and feedback is reduced. The temporal pattern of events is not easily illustrated with this method. However, a descriptive map of factors can be derived, and from it all the interactions and responsibilities of actors at every level can be shown.

Figure 2.4 A simplified STEP matrix for a car accident in an urban area


This work illustrates the relevance of Rasmussen’s approach to systems failure. When applied to a case study such as QF1, however, the framework still lacks some elements of the network methods, such as STEP, discussed above: a temporal view and a more detailed event synopsis are both missing. It nonetheless gives a good overall basis for abstracting details about an incident and drawing generalised conclusions.

2.4 Human Reliability and Error Identification

If accident investigation is one side of a coin, we can now turn our attention to the other side, accident prevention, which, as we have seen, raises many unanswered or unanswerable questions and problems. Although not new to Human Factors, human reliability assessment (HRA) techniques are still in their infancy in terms of development. These methods take probabilistic risk assessment as their basis. This movement towards quantitative methods of accident investigation is often driven by a need to limit subjectivity. Also important is the quantity of historical data that is so often collected and perhaps not used to its full potential, since such data often help to develop probability relationships. This section, then, addresses the more quantitative aspects of accident investigation models, which to date have often been separated from their qualitative ‘cousins’.

Figure 2.5 Illustrating the levels of a complex socio-technical system. Source: Adapted from Rasmussen 1997.
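The probability relationships mentioned above typically start from a human error probability (HEP). The HEP is estimated as the number of errors observed divided by the number of opportunities for that error. The sketch below shows that arithmetic with hypothetical counts. It then computes the chance of at least one error over repeated opportunities, which assumes the opportunities are independent; that is a simplifying assumption that may not match reality. The function names are illustrative.

```python
def human_error_probability(errors: int, opportunities: int) -> float:
    """HEP = number of errors observed / number of opportunities for error."""
    if opportunities <= 0:
        raise ValueError("need at least one opportunity for error")
    if not 0 <= errors <= opportunities:
        raise ValueError("errors must lie between 0 and the number of opportunities")
    return errors / opportunities


def p_at_least_one_error(hep: float, n: int) -> float:
    """Chance of one or more errors in n opportunities, ASSUMING they are independent."""
    return 1.0 - (1.0 - hep) ** n


# Hypothetical historical counts: 3 errors observed in 1,500 opportunities.
hep = human_error_probability(errors=3, opportunities=1500)
print(hep)                             # 0.002
print(p_at_least_one_error(hep, 100))  # roughly 0.18
```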

Human error identification (HEI) is fundamentally the initial stage of HRA techniques and is used as the basis of probabilistic safety assessments (PSAs) (Cox and Tait 1991). HEI is a subjective technique, and its results are method-specific. The aim is to illustrate the impact of human error on a system and also the recovery associated with that error (Kirwan 1998).

Kirwan (1998) highlighted the relevance of HEI to Human Factors and Ergonomics independently of HRA, in that identifying errors through the process is a worthwhile result in itself. Identified errors can be taken further in Error Reduction Analysis (ERA). ERA can demonstrate ways of reducing the likelihood of an error or, if it occurs, its impact on the system. This removes the dependence on the probabilities and quantification of errors, whose use is much debated in the literature. Reflecting the earlier discussion of the over-emphasis on chain-type models of causation, Leveson (2002: 14) states that the ‘limitations of event-chain models are also reflected in the current approaches to quantitative risk assessment. When the goal of the analysis is to perform a PRA, initiating events in the chain are usually assumed to be mutually exclusive. While this assumption simplifies the mathematics, it may not match reality.’
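The effect of the mutual-exclusivity assumption that Leveson describes can be seen with two hypothetical initiating events. Under mutual exclusivity, the probability that either event occurs is simply the sum, and the probability that both occur together is zero. If the events can in fact co-occur (assumed independent here for simplicity), the union is smaller than the sum and the joint probability is no longer zero. The probabilities are deliberately large and purely illustrative.

```python
from __future__ import annotations

import math


def union_if_mutually_exclusive(probs: list[float]) -> float:
    # P(A1 or ... or An) when no two events can occur together: a plain sum.
    return sum(probs)


def union_if_independent(probs: list[float]) -> float:
    # P(at least one event) when events occur independently of one another.
    return 1.0 - math.prod(1.0 - p for p in probs)


def joint_if_independent(probs: list[float]) -> float:
    # P(all events together) under independence; mutual exclusivity forces this to 0.
    return math.prod(probs)


initiating = [0.1, 0.2]  # hypothetical probabilities of two initiating events
print(union_if_mutually_exclusive(initiating))  # about 0.3
print(union_if_independent(initiating))         # about 0.28
print(joint_if_independent(initiating))         # about 0.02 (0 if mutually exclusive)
```

For rare events the numerical gap is small, but the qualitative difference remains. The mutually exclusive model cannot represent two initiating events combining, which is precisely the kind of interaction a systems view is concerned with.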

In his discussion of the HEI methods available, Kirwan (1998) finds 38 in the literature. These include well-known examples such as the Systematic Human Error Reduction and Prediction Approach (SHERPA) (Embrey 1986), the Technique for Human Error Rate Prediction (THERP) (Swain and Guttmann 1983) and the Hazard and Operability study (HAZOP) (Kletz 1974). Almost half of these had been produced in the five years before his review, which demonstrates that the emphasis placed on this area of study throughout the 1990s continues today.2 SHERPA (a flow chart-based application) has been widely applied and remains a popular tool for complex tasks, but it has been seen to become unwieldy and too resource-intensive (Kirwan 1998). ‘Second generation’ HEI methods have also been developed, such as the Cognitive Reliability and Error Analysis Method (CREAM; see Hollnagel and Embrey 1994). Building on SHERPA and Rasmussen’s (1988) skill, rule, knowledge (SRK) framework, CREAM attempts to incorporate a cognitive level of analysis into HEI. This and the TRACEr method developed by Shorrock and Kirwan (2002) can be applied both retrospectively and prospectively, which greatly advances the usability of these tools in accident prevention. A further important point is that these diagrammatic, or simply formalised, methods allow observers to see the basis upon which investigators draw their conclusions, and so improve transparency.

2 Stanton, Harris et al. (2006) find over 100 models in the literature.

As aircraft design develops over time, so the means of assessing that design, and of analysing and investigating the human interaction that accompanies it, must also evolve. The ERRORPRED Project (Stanton, Harris et al. 2006) provides a valuable review of the investigative tools currently available to Human Factors practitioners. All of the major models are covered and their positive and negative aspects briefly discussed. The problems identified tend to be that, despite the number of models available, many become too unwieldy to apply in complex situations. There also still appears to be considerable scope to develop a framework, especially one including some level of cognitive modelling of actors, that is neither too general to use in investigation nor too specific to one domain or situation. Stanton, Harris et al. (2006) developed a toolkit approach to the problem of predicting error, and this method compared favourably with standalone methods such as SHERPA and HAZOP mentioned above. The toolkit approach appears to deal well with the complexities of working with error. Compared with other HEI methods used in isolation, it increased the sensitivity of error prediction and improved multiple-analyst validation (Stanton, Salmon, Harris et al. 2009).
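Sensitivity here refers to how well predicted errors match the errors that actually occur. The following is a minimal sketch. It assumes a signal-detection style of scoring over a fixed list of candidate errors, counting hits, misses, false alarms and correct rejections. It uses the mean of the hit rate and the correct-rejection rate as a simple sensitivity index. Pooling the predictions of two methods then shows how a toolkit can raise hits at the cost of some extra false alarms. The scoring formula, error labels and predictions are all hypothetical, not taken from the studies by Stanton et al.

```python
from __future__ import annotations


def score(predicted: set[str], actual: set[str], candidates: set[str]) -> dict[str, float]:
    """Signal-detection style scoring of one set of error predictions (hypothetical)."""
    hits = len(predicted & actual)
    misses = len(actual - predicted)
    false_alarms = len(predicted - actual)
    correct_rejections = len(candidates - predicted - actual)
    hit_rate = hits / (hits + misses)
    cr_rate = correct_rejections / (correct_rejections + false_alarms)
    return {
        "hit_rate": hit_rate,
        "correct_rejection_rate": cr_rate,
        # Simple sensitivity index: mean of the two rates (an assumption in this sketch).
        "sensitivity_index": (hit_rate + cr_rate) / 2,
    }


candidates = {f"E{i}" for i in range(1, 11)}  # ten hypothetical error modes
actual = {"E1", "E2", "E3", "E4"}             # errors observed in a hypothetical trial
method_a = {"E1", "E2", "E5"}                 # predictions from one hypothetical method
method_b = {"E3", "E4", "E6"}                 # predictions from another
toolkit = method_a | method_b                 # toolkit: pool both methods' predictions

for name, predicted in [("A", method_a), ("B", method_b), ("toolkit", toolkit)]:
    print(name, score(predicted, actual, candidates))
```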

During their development of a new HEI tool for aviation, Stanton, Salmon, Walker et al. (2008) reflect that, despite the number of techniques available, their applicability in a number of situations is still questionable. As Salmon et al. (2002: 129) put it, ‘The goal for researchers now remains to investigate how these contemporary HEI methods can be improved and also the development and creation of new, aviation specific HEI methods’.

3 Current Application of Human Factors in Aviation Accident Investigation

3.1 Hindsight as a Barrier to the Future

Do we wish to address specifically the direct causes of an incident or accident, or the overall pathology of an organisation and system? This is the question that presents itself at this juncture in Human Factors work on accident investigation. As discussed by Young et al. (2004) and Dekker (2002), many latent conditions or failings are found in current accident investigations, but causality can only truly be attributed in hindsight. The relentless application of top-down models of accident causation (see, for example, Reason 1997) undoubtedly finds many important latent conditions within an organisation. It may nonetheless be the barrier to any form of predictive and arguably more relevant investigation. The argument stands that, when searching back from an event, there is no natural ‘stop’ point. Indeed, Braithwaite (2001), discussing Moshansky’s 1992 investigation of the Air Ontario F-28 icing crash at Dryden, put it rather well when he said that the ‘apparently vast number of errors was not indicative of a particularly bad accident, rather, a thorough investigation’.

Since hindsight is being discussed as a major problem for investigation, let us first look at where the concept originally arose. Fischhoff (1975) discussed it at length and concluded that hindsight is not equal to foresight. The prevailing belief at the time, however, was that by knowing how past events occurred and eliminating some element of them, such events could be prevented from occurring in the future.


Although simplistic by today’s standards, this view did establish a very important culture within industries of identifying errors and problems within the system or in individuals. The prevalent idea now is that, as mentioned previously, we structure the past into a linear set of events. Such a simplification is not possible while events are unfolding in a dynamic period of time, so we are immediately behind in the ‘war’ on the factors involved in the lead-up to an event. It is now known that simply removing an element will not prevent future accidents. It is further understood that ideas such as the fluid motion of the layers in Reason’s model, and of the holes within those layers, allow the situation to be viewed as ever-changing. Two main camps have formed. Of these, those who accept human fallibility and argue for ‘resilience engineering’ (see, for example, Hollnagel, Woods and Leveson 2006) appear to be the more forward-thinking.

When looking at causation, it is important to try to place ourselves in a position similar to that of the individuals and the system at the time. In this way we try not to rely on our own biased view of what it may have been like, informed by our newly omniscient knowledge. Woods (2003) put it succinctly when he said that before an event the future seems implausible, but afterwards the past seems incredible, in a ‘how could they not have seen that?’ way. This surprise factor, which must have been present for the individuals at the time, is also discussed in Reason’s book Human Error (1990). The truth is that they did not ‘see it’ (by any means, aural, visual or tactile), or the event would not have occurred. Dekker (2005) develops this further in his discussion of the ‘banality of accidents’. Accidents, he and Perrow (1984) suggest, are not the result of a series of incidents leading up to an accident, but of normal people doing normal work in normal organisations. From the point of view of an individual involved, the lead-up to an accident may well appear that way. This gives extra weight to the importance of addressing ‘near misses’ and to reporting programs such as the Confidential Human Factors Incident Reporting Programme (CHIRP) or the British Airways Safety Information System (BASIS).

What can be gleaned from accident investigations that adhere prescriptively to a top-down model appears to be a thorough audit of a company and its pathology. However, it may not be realistic to extract from them ‘causes’ of the event that are proximal enough to be usefully analysed and attacked. It is, after all, ‘workable remedial applications’ (Reason 1997) that we are searching for in accident investigation, not an unbounded breakdown of all company failings. Therefore, as Young et al. (2004) suggest, a bottom-up approach (also discussed in Dekker 2002) may be more useful and less biased in the hunt for the true causes of accidents. Those causes, if acted upon appropriately, may then positively affect the future of the system.

3.2 The Safety Culture and Blame

All of the models discussed above have been used as the basis for investigations into accidents. The purpose of studying Human Factors in accident causation, in aviation as in other complex socio-technical systems, is to increase safety within and across industries. Fundamental to this, surely, is an organisation that is safe. Looking at the organisation as a whole, what exactly makes a safe culture or background for a system, so as to decrease the chance of incidents and accidents? This section addresses some issues that are important regardless of the model or method chosen, since they directly affect the reporting and investigative practices of organisations.

Westrum (1992) identified three major organisational cultures: pathological, bureaucratic and generative (Table 2.1). It can be seen from the table that an airline, or any other complex industry, should aspire to generative ideals: problems are not swept under the carpet but addressed, and new ideas are welcomed. Being adaptable and imaginative is a mainstay of risky and complex operations.

Table 2.1 Westrum’s ‘organisational cultures’ and how they handle safety information

| Topic | Pathological | Bureaucratic | Generative |
|---|---|---|---|
| Information | Don’t want to know | May not find out | Actively seek it |
| Messengers | ‘Shot’ | Listened to if they arrive | Trained and rewarded |
| Responsibility | Shirked | Compartmentalised | Shared |
| Failure | Punished or concealed | Leads to local repairs | Leads to far-reaching reforms |
| New ideas | Actively discouraged | Often present problems | Welcomed |

Source: Adapted from Reason 1997: 38.

Learning needs to become a priority within these organisations. There is a strong need to remove the typical epitaph of complex socio-technical systems: that there is always something else more important or pressing (Reason 1997).

A number of prescriptions for a safe culture and a safe organisation have been put forward in the literature over time. One example is Reason’s (1997) discussion of organisational ‘competence’ in terms of collecting the right data, acting upon it and disseminating it in a useful way to all concerned. Examples of this include the incident data collection and safety information systems discussed later, in section 4, on incidents and their importance. There is, however, no definitive measure of organisational safety health at present. Papers or companies will use whatever statistic or comparison suits their needs in order to present themselves in a positive light. Reason (1997: 191) makes the important point that most commercial airlines today have almost uniform training, operational and regulatory procedures and even similar aircraft types. The global difference between flying with an airline whose chance of an accident involving at least one death is 1 in 260,000 and one whose chance is 1 in 11,000,000 may therefore be due, at least in part, to the culture of that airline. The same should hold for other industries. Indeed, in the UK’s report on the Chernobyl accident, the head of the UK Central Electricity Generating Board (CEGB) at the time, Lord Marshall, among others (Reason 1990), went so far as to state that the accident could not have happened in Britain, which has a generally stronger safety culture. Identifying and quantifying what actually makes a safe culture is an area in which much work is needed if good examples worldwide are to be used effectively. This is difficult, however, when individuals approach the problem from a particular national or cultural standpoint (eastern, western or other). They then judge a safety culture by comparison with their own way of doing things, rather than adopting the objective perspective of a total outsider.
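Expressed as a ratio, the two accident odds quoted above differ by a factor of roughly forty-two. This is simple arithmetic on the figures given, not an additional finding:

```latex
\frac{1/260\,000}{1/11\,000\,000} = \frac{11\,000\,000}{260\,000} \approx 42.3
```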

A significant illustration of this is found in Braithwaite’s (2001) discussion of safety culture within Qantas and other airlines, in which he compares crews from different parts of the world and within different national carriers. Before discussing such a question, though, it is important to establish what ‘culture’ means in the argument. Again, Reason (1997: 192) covers this well, stating succinctly that ‘whereas national cultures arise largely out of shared values, organisational cultures are shaped mainly by shared practices’. Taken as doctrine, however, this would imply that a practice such as distributing incident reports or briefing on safety procedures will always lead to a better safety culture. That would hold regardless of the airline, its location in the world or the dominant social culture. This oversimplifies matters. Looking back, as Reason did, to Uttal (1983), an organisational culture can be seen as ‘shared values (what is important) and beliefs (how things work) that interact with an organisation’s structure and control systems to produce behavioural norms (the way we do things around here)’.

Braithwaite (2001) proposed that Australian crews may owe their success in a safety environment to their willingness to speak up to one another regardless of ‘seniority’. This may well contrast with well-established Far Eastern airlines, or those from countries that can be considered to have a far more ‘hierarchical’ social culture. In these organisations, superiors are significantly more autocratic with their subordinates, signifying a large power divide. This has been associated with poor crew resource management in certain parts of the world (Harris 2008). However, safety in these more rigid environments has been attributed to a high level of adherence to standard operating procedures (SOPs), which may also be relevant to the Qantas model of aviation safety. Yet again, these areas raise many unanswered questions. The major foreseeable problem with this line of enquiry, and even with the studies so far, is that they rely on input from pilots in the form of questionnaires or interviews. Both methods struggle to achieve the response levels that would provide truly reliable information about, or coverage of, an entire airline, nation or environmental culture.


It is possible to argue, as Reason does, that safety culture (or any other aspect of organisational culture, for that matter) is something that an organisation ‘has’, not something it ‘is’. Consequently, the culture is malleable, and facets may be added to or removed from it in order to improve it. It is these facets that need to be identified, and then examined as to whether they will integrate with other cultural facets to work in a positive manner. Indeed, Braithwaite (2001) has an interesting discussion of the need for a comprehensive set of system safety indicators similar to the Flight Safety Foundation’s controlled flight into terrain (CFIT) checklist. This would need to be a proactive and simple checklist-style procedure to assess an organisation and look for obvious areas that may need improvement. Alternatively, some facets may prove better than the previously believed norm, at any level. The improvements could then be replicated in other organisations, or at least argued for or against. Reason (1997) again points out that prescriptive, feed-forward-only methods can never fully control safe behaviour. Use of such a system safety check would therefore allow a continuous feedback process for developing a safe culture within an organisation. This would require the process to be carried out periodically. All individuals would also need to feel that all comments or reports (made, for example, through incident reporting) are welcome and acted upon. The problem lies in the level at which such a system is aimed: if it is too general, its usefulness will be limited, but if it is too fine-grained, it may become prohibitive to use. The essence of the argument is for ‘loops within loops’, integrating the levels of the system through feedback (Figure 2.6).

Figure 2.6 Illustration of interconnected feed-back and feed-forward loops through levels of a system
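The contrast Reason draws between feed-forward-only control and feedback can be illustrated with a deliberately simple toy simulation. This is a minimal sketch under stated assumptions, not a model from the literature reviewed here. A single hypothetical ‘deviation from a safe target’ drifts by a fixed amount each period. The feed-forward-only policy applies a correction planned in advance for the drift it expects. The feedback policy instead measures the current deviation and corrects a fixed proportion (gain) of it. The names `simulate`, `true_drift`, `assumed_drift` and `gain` are hypothetical.

```python
from __future__ import annotations


def simulate(periods: int, true_drift: float, assumed_drift: float,
             gain: float, use_feedback: bool) -> list[float]:
    """Deviation from a safe target after each period (toy model, not from the source)."""
    deviation = 0.0
    history = []
    for _ in range(periods):
        deviation += true_drift  # erosion that actually happens this period
        if use_feedback:
            deviation -= gain * deviation  # measure the deviation, correct a share of it
        else:
            deviation -= assumed_drift  # fixed correction planned in advance, never measured
        history.append(deviation)
    return history


feed_forward = simulate(10, true_drift=1.0, assumed_drift=0.5, gain=0.5, use_feedback=False)
feedback = simulate(10, true_drift=1.0, assumed_drift=0.5, gain=0.5, use_feedback=True)
print(feed_forward[-1])  # 5.0: the unplanned-for drift accumulates every period
print(feedback[-1])      # just under 1.0: kept bounded by continual correction
```

In this sketch the planned correction underestimates the drift, so the feed-forward-only deviation grows by 0.5 every period, while the feedback deviation never exceeds 1.0. A purely proportional correction settles at a small steady offset rather than zero. The point is only that measured feedback keeps the system bounded when the world departs from the plan.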


Intrinsic to any organisation is some form of blame culture, whether it be a true blame culture, a no-blame culture or a culture positioned between the two (such as the ‘just’ culture). Blame is particularly important for the development of incident reporting programs. It is also central to the author’s argument for redressing the balance of emphasis that has left active error playing second fiddle to latent pathogens in the investigation and prevention of incidents. The intention is not to return to a period of blaming and shaming individuals for their actions, but simply to concentrate on the ‘whys’ and ‘hows’ of an individual’s actions. Aside from this, blame has been a major factor in moving the focus of investigations away from the individual. This is so even though, according to Reason (1997), attributing an event to individuals at the sharp end deflects blame from the organisation as a whole. It is contended, in fact, that an organisation cannot simply uncouple culpability for an event from itself and transfer it to an individual; an event is an internal factor and a part of the organisation, and blame should not be shifted away from it. It has often been the norm to blame individuals for several reasons: doing so is satisfying, it allows an investigation to be concluded when examining the organisation may involve a never-ending search, and it may fulfil what is possibly an innate need. However, this must be avoided if a high level of safety culture is to be achieved. Blaming an individual and acknowledging the effect of an active error by an individual, or group of individuals, should not be one and the same; for the purposes of investigation they should be treated as entirely separate. Only if criteria such as ‘gross negligence’ are met should one look to sanction an individual; short of that, training and correction must come first.

After all, all activities are known to carry some inherent risk, so it would be unfair to punish an individual for falling foul of this risk. The emphasis must be neither on blaming nor on ignoring the fact that an individual can have a profound effect on accident causation. It must instead be on the ‘whys’ and ‘hows’ at a cognitive and organisational level, that is to say, ‘a systems view’.

4 Incidents: A Tool for Proactive Safety

Central themes of this research that have emerged in the review so far include the comparison and use of both commercial and general aviation, and the need to address dominant linear models. Additionally, the literature, and in particular the practical use of methods in aviation accident investigation, suggests there is scope to make greater use of non-accident scenarios to understand the aviation system better. In particular, feedback has been discussed as an important feature of a safe organisation, and incident reporting programs are one area where its usefulness can be capitalised on.

Incidents, or ‘near misses’, can be defined as ranging from the partial penetration of defences to situations in which all the available safeguards were defeated but no actual losses were sustained. As such they cover a whole multitude of sins and should therefore, it is hoped, yield a mass of information for safety development.


Section 4 sets out to discuss the way in which industry is currently working with accidents. It will also address the issue of using incidents for safety management and, more particularly, understanding incidents more fully with information networks.

4.1 What Place for Incidents?

In his editorial to a special edition of Interacting with Computers, Chris Johnson (1999) highlights an important problem in Human Factors: a ‘bias towards major accidents’. He goes on to suggest that this is a barrier to the uptake of Human Factors in industry and to the full utilisation of its techniques. Major accidents are the high-profile, high-cost (in monetary and human terms) events that cover the front pages of the newspapers and can send the public into a frenzy. Thankfully, the majority of ‘events’ involving human error result in an incident (or even a non-event) rather than a catastrophic accident, and these incidents can be considered by many to be ‘free lessons’. Reason (1997) spends some time discussing the idea that safety awareness and defences are highest immediately after an accident. They may dwindle over time, as people expect accidents to happen to someone else and, in line with his ‘surprise attribution’, not to themselves. Ideally, incidents should be used to create a safer environment within these systems and organisations, to a similar extent as accidents have been in the past. Safety systems such as the Confidential Human Factors Incident Reporting Programme (CHIRP) or BASIS attempt to do this. Reason (1997) again rather cleverly likens incidents to inoculations, which may provide protection for a period of time after the event. This protection does, of course, rely on useful interpretation, investigation and dissemination.

As the number of catastrophic accidents in aviation and other highly complex industries decreases over time due to improvements in safety, so the lessons that can be learned from them also decrease. As a result, awareness of what can happen also fades from the current mindset. It must become normal practice for incidents to receive more than the cursory or routine investigative attention they have received to date; in investigative terms, they must instead be treated as the new ‘accident’ until we have learned all the lessons they have to offer. This move to make full use of incidents in response to falling accident numbers has been discussed in the literature since at least Rasmussen (1988), yet we are still not taking full advantage of the resource available to us. The focus of investigation must also develop if these incidents are to be used to their utmost. We must begin to look at what went right in incidents and at the positive circumstances that may be developed into preventative measures in the future. There are, of course, caveats to any statement such as this, and care must be taken not to return to a ‘throw more regulations at it’ type of attitude. Such an attitude gives the appearance that something is being done about safety, but the resulting countermeasures may have negative implications for some aspects of work or elsewhere within the system. Incidents may be a true step towards proactivity in safety management until such time as fully proactive error prediction techniques, which are admittedly in their infancy, can make a larger contribution to complex systems.

Human Factors Models for Aviation Accident Analysis and Prevention48

4.2 Common Cause

To get the most from incidents, this research intends to analyse them as thoroughly as accidents (and even to predict potential outcomes based on incident investigation) in order to ascertain whether lessons are being missed. However, before we can do this, another hurdle must be addressed. The premise on which the value of investigating incidents to reveal information about accident causation rests is known as the ‘common cause hypothesis’. This holds that the causal pathway leading to an accident is fundamentally the same as that leading to an incident, or near miss, except for a significant change in one or several factors that alters the outcome (that is, the event does not become an accident).

This was first discussed in Heinrich’s 1931 book, Industrial Accident Prevention. Although there has been some work to assess the validity of such a common cause, it appears that the similarity of causal pathways for incidents and accidents has become confounded with issues of the severity and frequency of incidents and accidents (Wright and Van der Schaaf 2004). The confusion may stem from the way Heinrich and others (see, for example, Bird and Germain 1996 and Salminen et al. 1992) carried out ratio-based studies to try to prove or disprove the relationship between incident and accident. Indeed, Wright and Van der Schaaf (2004) go on to discuss the confusion in all the pertinent studies they found that argued for, or against, the common cause theory. In other words, the theory itself has never been conclusively proven, yet it underpins so much near-miss-based work.

Dekker and Hollnagel (1999) repeatedly allude to an evolution towards failure that gives rise to precursor events, which precede an accident and signal the vulnerability of the system. They suggest that the sequences of events leading up to an incident or an accident are almost identical and share traits of human–automation breakdown. This keeps the common cause of accidents and incidents firmly in view, even in the most complex socio-technical systems and human–computer interactions. Indeed, there appears to be much empirical evidence (although no one has proven, or attempted to prove, the link) to suggest that correct investigation of incidents could prevent accidents. The 1994 analysis by Woods et al. of a 1992 crash near Strasbourg showed that an earlier British Airways incident had many of the same details, but the information did not reach those who needed it in order to prevent the accident. This is one of many examples in the literature of earlier incidents that did not end catastrophically, only to be followed by accidents sharing too many of the same precursors. This again emphasises the communication issue: the details of incident investigations and reports need to reach the right people. Working with Orasanu and Connolly’s (1993) concept of naturalistic decision making, we can see that when a situation is novel, people tend not to work it back to first principles through knowledge-based behaviour. Instead, they rely on comparisons with previous encounters, from which we can surmise that the more knowledge of previous events we can give our crews, the more likely it is that their comparisons will be correct.

This is not to say that all remedial action following an incident results in a wholly positive outcome. Reason (1997) discusses a couple of case studies where this is not true, including one from Three Mile Island in 1979. There are also cases in aviation, such as the introduction of takeoff monitors during the 1950s after a number of accidents on contaminated runways. This device directed pilots to the ‘correct’ takeoff attitude for the conditions, in order to resolve problems identified in previous incidents in which flight crews had commanded the incorrect attitude. After a retaining screw within the unit failed, the device commanded an unusually high angle of attack; following its instructions, the flight crew stalled the aircraft, which crashed back onto the runway. This succinctly illustrates the danger of merely introducing new guidelines, practices or regulations without actually identifying and addressing the underlying problems and causes of accidents. As Don Norman (1990) concludes, if analyses are isolated, the improvements that result from them may also be isolated, potentially leading to new problems at the system level.

The study by Wright and Van der Schaaf (2004) worked at a very general level and under very limited conditions within the railways, using data from the Confidential Incident Reporting and Analysis System (CIRAS). It is, however, the only paper found in a search of the literature that appears to address the assumption of common cause directly. This suggests scope for significant further work, possibly using BASIS, CHIRPS and other reporting systems in different industries, to build a multi-level study of whether common cause is present. This would give firm grounding to further work promoting the use of incident data to prevent accidents.

There are proponents, such as Dekker (2005), of the ‘banality of accidents’ theory. This is the suggestion that incidents do not in fact precede accidents in systems with an accident rate below 10⁻⁷ (one in ten million events). Below this level of safety, he suggests, incidents may indeed be precursors of an accident and useful as such. Dekker argues that in these ultra-safe systems it is normal work that precedes accidents, which in itself runs counter to the ideas discussed by Reason and those defending common cause. Leveson (2002) refers to papers by Edwards (1981) and Kjellen (1982) claiming ‘data on near-miss (incident) reporting suggest that causes for these events are mainly attributed to technical deviations while similar events that result in losses are more often blamed on human error’ (Leveson 2002, 21). This, however, may be due to reporting practices and the characteristics of human reporting procedures rather than to a true difference in causation. It is a different view, and one that further study of the core of common cause theory may help to resolve.

4.3 Realising the Potential

We have already mentioned the diminishing returns of accident investigation alone. Accidents are rare events, yielding a sample too small for any form of statistical analysis to show the state of the system (Waldock 1992). It could be more useful to start treating incidents in this way, with the ‘incident’ becoming the new ‘accident’. Westrum’s (1992) organisational cultures illustrate quite clearly that generative organisations would be the most likely not just to collect incident data and reports, which is not enough in itself, but also to act on them appropriately. They would also treat those making reports fairly and positively (notwithstanding the discussions centred on blame and misconduct).
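The point that accidents are too rare for meaningful statistical analysis can be made concrete with a small worked example. The sketch below is an illustration only, not part of the original research: it assumes that event counts over a fixed period can be treated as Poisson-distributed and computes an exact (Garwood) 95% confidence interval for the underlying rate. The counts of 3 accidents and 300 incidents are hypothetical, chosen purely to show how relative uncertainty shrinks as the number of events grows, and the function names are likewise illustrative.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_decreasing(f, lo: float, hi: float, iters: int = 200) -> float:
    """Find the root of a decreasing function f, given f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def exact_poisson_interval(k: int, confidence: float = 0.95) -> tuple[float, float]:
    """Exact (Garwood) confidence interval for a Poisson mean, given k observed events."""
    alpha = 1 - confidence
    search_top = k + 10 * math.sqrt(k) + 10  # comfortably above the upper limit
    # Upper limit: the mean at which observing k or fewer events has probability alpha/2.
    upper = _bisect_decreasing(
        lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, search_top
    )
    if k == 0:
        return 0.0, upper
    # Lower limit: the mean at which observing k or more events has probability alpha/2.
    lower = _bisect_decreasing(
        lambda mu: poisson_cdf(k - 1, mu) - (1 - alpha / 2), 0.0, float(k)
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not data from the text.
    for label, count in [("accidents", 3), ("incidents", 300)]:
        lo, hi = exact_poisson_interval(count)
        print(
            f"{label}: {count} observed, 95% interval {lo:.1f} to {hi:.1f} "
            f"({lo / count:.2f}x to {hi / count:.2f}x the observed count)"
        )
```

With three events, the interval runs from roughly a fifth of the observed count to nearly three times it, so even a large real change in the underlying rate could go unnoticed; with 300 events it narrows to roughly 11 to 12 per cent either side. This is one way of seeing why a larger incident population may support trend analysis that accident counts alone cannot.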

Incidents, when properly examined, may also allow us to get past the hurdle of hindsight. A chain of events has occurred that can be interrogated, and the individuals involved are still available to discuss their feelings and position at the time, so we can actually ask who was aware of what, and when. Incidents contain the negative factors, allowing investigation to yield positive returns without the destructive outcome. Although Reason (1997) feels that accidents are indeed necessary for the development of an organisation, it is contended here that they need not be: if incidents are given the precedence that accidents have received over the past 20 years or so, the number of accidents may be reduced and the safety of the industry as a whole increased.

Reason (1997) goes on to discuss how redundancy in systems may hide mistakes or errors. This, he suggests, may hinder the evolution of safety within an organisation, because the ability to learn from those errors is diminished. The same is true of incidents: if they are not used to their full extent, they too make these complex systems more opaque to users. Information that may prove useful if approached in the right way is being wasted. One cautionary note at this point, however, is that if incident data are to be used fully, and so shared in the public domain for researchers and other companies to view, they must not form the basis for a new ‘league table’ of safe airlines, crews or industries, as this would have only negative effects on incident reporting.

To glean the highest standard of information from our incident data, we first need a system for collecting and investigating data. This is far from simple, as the literature discussed to date has pointed out. In recent years we have seen the development of some very exciting and useful incident reporting tools, such as the UK Civil Aviation Authority (CAA)’s CHIRPS, the US National Transportation Safety Board (NTSB) Aircraft Incident Reporting System (AIRS) and British Airways’ BASIS programs, to name but a few. These systems amalgamate data from voluntary and mandatory reporting procedures for incidents and accidents, set out by national authorities or by the airlines themselves. To gain as much information as possible, systems such as BASIS use voluntary reporting forms from pilots and then follow up with the crew by telephone; others use no-blame inquiries, as in Qantas’ Identifying Needed Defences In the Civil Aviation Transport Environment (INDICATE) proactive safety program (Braithwaite 2001).

By using both open-ended and specific questions, investigators try to work out not just what happened but also why. This ‘why’ has become more important to these systems in recent times as the volume of data has grown. The data were useful for a while, showing trends in what was occurring, but struggled to show why an event or situation may have occurred. Concentrating on the latter may provide a better safeguard than event logs per se. CIRAS, the confidential reporting system for the UK railways (Davies et al. 2000), has been struggling with large quantities of data but limited information on why events are occurring. Some of this may be due to the way railway employees approach the comparatively immature reporting system, and this may change in time. Here, a move towards looking at the bigger picture of system safety, and at what the reports are telling the analysts, is crucial. This is not to say that it would be an easy task, given such a large volume of data and the possible need for an entire culture shift.

4.4 Reporting Issues

At the heart of these reporting systems, as already mentioned, is normally the word ‘voluntary’. Even in ‘mandatory’ schemes, the crews involved are relied upon not only to ‘own up’ to an event or ‘error’ but also to be prepared to think about how, what and why it happened so that lessons can be learned. This is asking a lot of commercial airline crews, especially as they already feel under time pressure. Many see these duties as more paperwork taking them away from their principal flying role. There is also the issue of embarrassment, which Braithwaite (2001) picks up on so clearly.

The idea of blame, discussed in section 3.2, is even more prevalent in a society where ‘voluntary’ means possibly putting yourself forward for ‘the chop’. De-identification of the individuals involved is the simplest step towards encouraging people to come forward. It is important, and widely acknowledged in the literature (and so slowly feeding into organisations), that although there is emotional satisfaction in blaming individuals, doing so has little or no effect on their future fallibility. Again, we are unable to change the human condition, but must work with it. There is still a long-standing question regarding the legal position, especially since, as Braithwaite (2001) points out, litigation is a serious business in the US (and it is now also becoming overused in the UK and around the globe).

Much of the early literature in this area advocated a ‘no-blame’ culture. However, this can be just as negative as a ‘blame’ culture. Under ‘no-blame’, there can be no comeback or remedial measures for truly culpable acts of negligence, or for acts far removed from training. This is why the idea of a ‘just’ culture was developed. It was rightly recognised that although a system was needed to encourage the reporting of incidents, action could still be taken where individuals were truly responsible or negligent, whether remedial training, merely a union representative talking to the crew involved (as in BASIS after any important report is submitted), or even legal proceedings.

The literature offers tools such as Neil Johnston’s (1996) substitution test for assessing whether blame is a fair or necessary element in a situation. In this psychological test, where an individual’s actions are judged to be possible unsafe acts in relation to an event, the individual concerned is notionally replaced by someone of similar qualifications and experience. The following question is then posed: ‘In the light of how events unfolded and were perceived by those involved in real time, is it likely that this new individual would have behaved any differently?’

If the answer is ‘probably not’, then Johnston asserts that ‘apportioning blame has no material role to play’. This permits a simple comparison between peers and a just analysis of whether blame is needed.

The literature also cautions throughout that the most poorly reported incidents, whether the reporting failure is due to embarrassment or non-detection, may well be the most dangerous ‘latent pathogens’ in the system. Indeed, Dekker (2005) elaborates at great length on the difficulties of relying on individuals to recognise their own mistakes. This is particularly striking in a complex environment where everyone believes they are doing normal work and nothing special is occurring. Even an outsider who views an action as a ‘mistake’ may not be correct: it is a mistake to them, but inside the head and ‘world’ of the individual carrying out the activity, it is a normal action at that time. This is an interesting area for attempts to use incident data, and it may be worth investigating further, together with its implications for incident reporting. It contrasts with the earlier work of Woods (1984), which showed how individuals or their co-workers in the nuclear industry identified the errors they had made. If there is an innate difficulty in identifying errors, especially at the rule- and skill-based levels, then incident and accident investigation will face an almost unbridgeable gap in attaining their goals. In a similar vein, Guastello (1996) cited a Swedish study that illustrated the need to train individuals in what does and does not constitute a reportable incident; in other words, the quality of reporting is just as important as the quantity.

The story does not end there, however: the literature also illustrates the need to act on whatever reports are made in order to encourage future reporting by crews or individuals (Dekker and Hollnagel 1999).

4.5 The Future for Incident Use

Once again, though, even if we do collect enough of the right information and are able to apply thorough investigative techniques, the effort is effectively useless unless that information is communicated to those who matter: individuals within organisations at all levels. Indeed, beyond simply disseminating the information within the company, Christopher Hart, Federal Aviation Administration (FAA) Assistant Administrator for System Safety, ‘believes that the only way of further reducing airline accident rates is the sharing of safety information’ (Male 1997: 24). Such sharing of information, though, would require an evolution in inter-company relations, particularly with respect to commercially sensitive or confidential information and data. Safety cannot afford to take a back seat to company secrecy, but this is, of course, an idealistic view.

I think it is also time for the industry to begin concentrating on the elements in incident reports that lead to successful outcomes; these are a truly positive and proactive element of investigating near misses. Reason (2008) also believes it is time to return to the ‘heroes’ of events, rather than treat all involved as ‘villains’. Johnston’s substitution test again has scope for application here. For example, would another pilot have saved flight UA232 (in which an engine failure left the flight controls damaged and unusable, forcing the crew to use the two remaining engines for control) given the same circumstances? What is the heroic element, either in individuals who avert accidents or in a system that does so? As appears to be standard in this research, though, there are warnings, for example from Habberley, Shaddick and Taylor (1986: 50), that near misses with a successful outcome may make crews more daring in the future, believing they can ‘get away with it’. The crux of the argument with incidents is that organisations in high-risk situations must develop a ‘learning culture’ and continually reflect upon their practices through monitoring and feedback (Pidgeon and O’Leary 2000). This requires organisations to be flexible, which comes right back to the opening arguments about the ‘tug of war’ between cost/production and safety. Braithwaite (2001) discusses how Qantas allows its crews to practise on its simulators when they are not in use for training. This, he points out, shows that pilots naturally want to improve and be the best, so flexibility such as this facility can make a very big difference within the organisation. This determination to improve, I would argue, has even overcome the feeling discussed by Beaty (1995) that safety is viewed as a feminine attribute, discouraging pilots from taking it too seriously. Combined with improved dissemination of the results of these investigations and analyses, this may help to move the accident rate beyond its almost 20-year plateau. As Reason puts it: ‘Errors arise from informational problems. They are best tackled by improving the available information – either in the person’s head or in the workplace’ (1997: 154).

5 The Influence of Modern Technology

In the past 20 years or so, automation has had significant effects on the role and job of flight deck crew, but despite this, the accident level, as we have seen, has remained relatively constant. With the advent of increased automation and redundant safety in complex flight systems, the pilot’s role is often described in the literature as becoming more of a monitoring role, removed from physical control of the aeroplane. Indeed, one result of this increase in automation (and possibly a significant factor in encouraging it) is a decrease in the slips made by the human actor in the system. This, however, comes at a price, as the errors made tend to be of a higher order and can therefore be much more serious. Not enough is yet known about new ‘sharp-end’ technology and its effects on a system. More concerning than this is the reluctance of commercial aviation manufacturers to fully integrate and adopt Human Factors perspectives in their products (although this may be partly because designs tend not to be ‘new’ but rather ‘developed’ from past models) (Harris 2007 and 2009).
