2001
Le travail humain
Cultural variation of perceptions of crew behaviour in multi-pilot aircraft
H.-J. Hörmann
Hans-Jürgen Hörmann, German Aerospace Centre (DLR), Department of Aviation and Space Psychology, Sportallee 54A, 22335 Hamburg, Germany.
E-mail: HJHoermann@compuserve.com.
Acting as the “last line of defence”, airline pilots must often counter the adverse safety effects of failures or unexpected situations. Cooperation within the crew is essential to detect and handle such problems effectively in the time available. The whole aviation community agrees on this point, but the debate is much more open as to what ideal cooperation should look like, which skills are involved, and what should be taught in professional training. Cultural effects are often cited as the origin of these different approaches to problem solving. For this reason, a study of the cultural impact on professional evaluations of crew cooperation was carried out within a European research project (DGXII) called JARTEL (Joint Aviation Requirements Translation and Elaboration of Legislation). A total of 105 European instructors from 14 different airlines representing 12 nations took part in a video-based professional evaluation of a series of 8 scenarios showing a wide range of in-flight behaviours of professional crews. The results were analysed by testing the hypotheses of cultural differences suggested by the work of Hofstede (1980, 1991), who classifies cultures notably by degree of authority (Power Distance) and by the individualism of their members. The main results do not point to a sensitivity to national culture. The judgements of instructors from different countries are relatively convergent. Differences are much more marked between companies (within the same country) and between instructors with different levels of English proficiency. More marked differences might nevertheless exist with crews from Eastern Europe.
In summary, the work carried out, although it still needs to be refined and confirmed by further studies, shows that in these high-technology professions the importance of national culture is only relative compared with the substantial effects of local and company culture.
Keywords (French abstract):
Crew resource management, non-technical skills, aviation safety.
As the “last line of defence”, pilots in commercial aviation often have to counteract the effects of unexpected system flaws that could endanger the safety of a given flight. To detect and mitigate the consequences of latent or active failures in time, effective team behaviour of the crew members is indispensable. While this is generally agreed in the aviation community, there seems to be a wide range of concepts of how crews should interact most effectively. Within the framework of the European project JARTEL, the cultural robustness of evaluations of crew behaviour was examined. A total of 105 instructor pilots from 14 different airlines representing 12 European countries participated in this project. The instructors’ evaluations of crew behaviours in eight video scenarios are compared in relation to cultural differences on Hofstede’s dimensions of Power Distance and Individualism.
Keywords :
Crew Resource Management, Non-Technical Skills, Cultural Effects, Aviation Safety.
Investigations of accidents and incidents in aviation have repeatedly identified effective crew behaviour as the crucial element for safe aircraft operations. According to the model of accident causation developed by J. Reason (1997) and Maurino, Reason, Johnston, and Lee (1995) latent and active system failures can become overt and create a critical situation by coincidence with missing or flawed system defences and specific triggering factors in the environment. In such a scenario the pilots and other crewmembers have to comply with their roles of being principal actors in the final stage of a drama initiated by a fatal chain of hazardous events. As the “last line of defence” their technical proficiency as well as their non-technical skills to perform efficiently as a team become essential for preventing damage or catastrophic loss of life and property. Hence, systematic training of non-technical skills (NTS) in the form of Crew Resource Management (CRM) courses has been recognised by the aviation industry as an important complement or part of the pilots’ technical proficiency training.
While there is little disagreement about the general necessity and contents of CRM training as such, controversies arise quickly when detailed training syllabi, instruction methods, or desired team behaviours have to be determined. Views about good manners of co-operation or leadership styles vary substantially between people, organisations, and cultures. For example, driven by individual attitude patterns, company policies, and cultural norms, different practices emerge about how to use authority with subordinates, how to treat conflicting opinions, how to assess risks, or how to comply with checklists and procedures. Helmreich and his research group report empirical evidence that the cockpit is not a culture-free place (Helmreich & Merritt, 1998; Merritt, 1996). The national, organisational, and professional cultures of crewmembers manifest themselves in the airline’s safety culture, which holds standards and norms for safe crew behaviour. At present, non-technical skills are still neglected in the evaluation and debriefing of pilots’ performance in regular check situations. Consequently, neither the effectiveness of CRM training programmes can be investigated adequately nor can desired individual behaviour patterns be reinforced systematically.
The area of NTS evaluation has become increasingly important in the light of recent efforts by the Joint Aviation Authorities (JAA) to harmonise requirements for aircrew licensing and training within Europe. While the assessment of NTS is indicated in the present codes of the Joint Aviation Requirements (JARs), the regulations give no recommendations on how NTS should be evaluated or which NTS should be included in that framework. Therefore, in 1997 the JAA-Project Advisory Group (JAA-PAG) tasked four research institutes (NLR Amsterdam, DLR Hamburg, IMASSA Brétigny, and the University of Aberdeen) to develop an NTS assessment system that became known as NOTECHS (van Avermaete & Kruijsen, 1998; Flin, Goeters, Hörmann, & Martin, 1998). The JARTEL (Joint Aviation Requirements Translation and Elaboration of Legislation) project was born out of the NOTECHS project, with the aim of assessing the usability and the reliability of a set of behavioural markers established in NOTECHS through both experimental and operational evaluation. Non-Technical Skills can be considered “as those skills referring to all pilot’s attitudes and behaviours in the cockpit not directly related to aircraft control, system management, and Standard Operating Procedures (SOPs)” (van Avermaete & Kruijsen, 1998, p. 15). Classic examples of NTS are labelled as co-operation, communication, team building, conflict solving, error management, workload management, decision making, attention, or assertiveness. Based on a review of the most influential behavioural marker systems currently in use in Europe and in the United States, as well as associated relevant research findings, the NOTECHS consortium decomposed NTS into two Categories for social skills (Co-operation, Leadership & Managerial Skills) and two Categories for cognitive skills (Situation Awareness, Decision Making).
Based on the observation of crew behaviour, NTS evaluations can be conducted on levels of increasing generality, which are shown in Table 1: distinct behaviour sequences, related skill Elements, or summarising skill Categories. Respective ratings by the instructor pilot finally lead to behaviour reinforcement or to recommendations for further training.
As Europe is a multi-cultural environment, the issue of cultural differences and their impact on flight crew behaviour, and hence on non-technical skills, is fundamental to the JARTEL project. In fact, this project is in line with recent trends in the scientific literature on Industrial and Organisational Psychology to consider culture as a critical variable for the generalisability of models and concepts to different countries (see Gelfand, 2000). The influence of culture on pilots’ behaviour has been widely reported (Helmreich & Merritt, 1998; Helmreich & Wilhelm, 1997; Johnston, 1993; Maurino, 1994; Meshkati, 1996; Phelan, 1994). CRM training programmes developed in one country and then exported to another have often proven less effective (Yamamori, 1987). While difficult to identify, cultural factors are considered to have an important influence on pilots’ behaviour. Indeed, Meshkati (1996) believes that “operators’ culturally driven habits are a more potent predictor of behaviour than their intentions”, and hence they need to be taken into consideration in the assessment of any tool for evaluating pilots’ skills. Therefore, within the JARTEL project the NOTECHS method and its behavioural marker system were tested for cultural robustness and validity in a number of different cultural regions in Europe.
One of the main difficulties when people start discussions about culture is that the term itself is open to broad definition. For the purpose of this study culture is defined as “the norms, attitudes, values, and practices that members of a nation, organisation, profession, or other group of people share” (FAA HF Team, 1996, p. 117). Since culture is transmitted through all sorts of interpersonal interactions it becomes an important factor in CRM which is based on interactions among crew members. Culture as “the shared way of life of a group of people” (Berry, Portinga, Segall, & Dasen, 1992, p. 1) influences how we communicate with each other, how we delegate or accept orders from others, how different opinions are negotiated, how decisions are made, and so on. While safety and efficiency of flight operation are universally accepted as desired outcomes of CRM, the behaviour patterns that lead to these outcomes might vary substantially with cultural norms. Even within the European aviation community it is uncertain whether the cultural variety can be accounted for in a single codex for safe crew behaviour.
In order to examine whether the NOTECHS method can be regarded as an appropriate tool for the evaluation of non-technical skills in the different European JAA member states, an extensive literature review was carried out during the first phase of the JARTEL project (Hörmann, Fletcher, & Goeters, 1998). The aim was to identify stable and meaningful dimensions of cultural differences and to locate cultural clusters with similar norms and values related to crew behaviour. Initially, three relevant studies were found, which describe dimensions of national cultural variation in Europe or beyond. They are mainly based on reported work-related values of male employees (Hofstede, 1980, 1991; Helmreich & Merritt, 1998; Smith, Duggan, & Trompenaar, 1996). Empirical evidence is provided in these studies that national cultures vary along three general dimensions related to the interactive processes within working groups, such as flight crews: Individualism vs. Collectivism, Power Distance, and Uncertainty Avoidance. Hofstede defines these dimensions as follows:
Individualism-Collectivism (IND) refers to the relation between the individual and the group. In individualist societies (like Great Britain or Scandinavia) personal choices and achievements are favoured over continuing membership of a specific group. Implications of behaviour are seen only in a narrowly defined area of personal costs and benefits (Helmreich & Merritt, 1998). At the opposite end of this scale are the collectivist societies (like Portugal or Turkey), where group membership is foremost and people form, and are part of, strong cohesive groups which take precedence over individual goals. For members embedded in such a group, open conflicts are avoided; in case of disagreement, striving for solidarity and harmony becomes important.
Power Distance (PD) is defined by Hofstede as the extent to which the less powerful members within a culture expect and accept that power is distributed unequally (1991, p. 28). On the individual level, power distance can be seen in terms of the amount of respect and deference between superiors and subordinates. In countries with lower power distance (like Denmark or Ireland) subordinates feel less dependent on high-ranking colleagues. They prefer consultation and, if necessary, contradiction. In countries with higher power distance (like France or the former Yugoslavia), there is considerable dependence of subordinates on superiors, and subordinates are unlikely to approach or question their superiors directly (Merritt, 1996).
Uncertainty Avoidance (UA) can be defined as the extent to which members of a culture tend to feel threatened by uncertain or ambiguous situations (Hofstede, 1991). An emotional need to resolve ambiguity quickly and to leave as little room as possible to chance is seen as a common behavioural attribute in countries with high uncertainty avoidance (like Greece or Portugal). This translates into higher levels of stress and a desire for predictability through adherence to written or unwritten rules.
In an attempt to group European countries on these dimensions, five different cultural clusters were identified in the JARTEL project with characteristic profiles on IND, PD and UA as shown in Table 2. Data for the often-neglected East European countries could be supplemented recently from two further cross-cultural surveys that included Russia (Bollinger, 1994; Naumov & Puffer, 2000).
TABLE 2 :
Clusters of national culture in Europe
Grappes de cultures nationales en Europe
This attempted culture mapping certainly has some shortcomings, mainly due to inconsistent results between studies or distinct national subcultures (e.g., Belgium, Switzerland). An attempt was made to resolve possible discrepancies by giving higher weight to Helmreich and Merritt’s (1998) study, as their data were also drawn from a population of pilots. For the purpose of the JARTEL experiment the derived typology was regarded as the best approximation to an a-priori clustering of cultures in Europe. It is assumed that pilots from countries within one cluster have more cultural values in common than pilots from different clusters. However, without further empirical evidence the proposed classification does not claim to be the appropriate concept for other studies. From the viewpoint of the JARTEL consortium, a sufficient amount of cultural variation for testing the cultural robustness of the NOTECHS system is provided if each of the five clusters is represented by at least one “marker” country.
Apart from national culture, organisational culture might also influence the views on “ideal” CRM-related behaviour. In order to gain more experience concerning the impact and significance of organisational factors on crew standards, each cultural cluster was represented by two different company types: a national flag carrier and a smaller regional airline. Alternative classifications that position airlines instead of countries on any of the dimensions for national or organisational culture could not be found in the sifted published literature. Since too detailed information on the companies’ management or training philosophies could be threatening for the participating airlines, it was decided to accept this fairly rough split into national flag carriers and smaller regional airlines as a second independent factor besides national culture for variations in evaluations of CRM behaviour.
In the light of expected European requirements on the evaluation of NTS, the proposed NOTECHS system should be robust against differences in national culture. This means that different nations are expected to have equivalent standards for assessing crew behaviour. To test this assumption, it was decided to recruit an approximately equal number of instructor pilots from larger and smaller airlines in each of the five clusters. The experimental sessions were carried out in classrooms using video scenarios filmed in a Boeing 757 simulator at British Airways (BA). After studying written material and attending a half-day briefing, all participating instructors rated eight scenarios that show typical examples of crew interaction and problem-solving in the cockpit. The participants also completed several questionnaires to provide statistical data on their background, cultural grouping, experience, and attitudes towards their profession. If significant differences in understanding CRM concepts across cultures did exist, NTS evaluations in a sample of instructors should vary in relation to their cultural background. On the other hand, differences between types of organisations are acceptable as long as they do not diminish safety. More details about the experimental protocol can be found in Delsart (2000).
Prior to carrying out the experiment, it was necessary to develop the training videos to be used in the experiment, and to establish a method for calculating an expert benchmark or reference to calibrate each scenario on the four NOTECHS Categories.
II.1. DESIGN OF THE VIDEO SCENARIOS
The scenarios to be used in the experiment were filmed in a Boeing 757 simulator, with the Captain and the First Officer (F/O) played by British and Italian pilots. Air traffic controllers and cabin attendants were played by BA employees. Language of conversation was English throughout all films. Eight scenarios were used in the main experiment, chosen from fifteen that were filmed. The scenarios were designed by BA and DERA specialists to demonstrate a range of realistic situations showing good and poor practice across the NOTECHS Categories. A set of design references was produced for each scenario stipulating the levels of NTS that the pilot actors were supposed to illustrate. Extreme examples of behaviour represented in a merely cartoon-like style were avoided. A brief outline of the eight scenarios used in the experimental sessions is given below:
(1) Descent- the F/O is flying. A passenger problem is reported by the cabin crew. The action centres around the Captain allowing himself to be distracted by secondary events and not monitoring the F/O’s actions. The altitude bust that concludes the sequence is the direct technical consequence of the F/O mis-setting the cleared flight level, but the Captain’s behaviour precipitates the error.
(2) In cruise over Brussels- 170 miles to destination London Heathrow. After suffering an engine fire, the Captain decides to continue to destination against the good advice of the F/O to land as soon as possible.
(3) Crew carrying out pre-departure checks. The F/O is unfamiliar with the airfield and receives little or no support from the experienced Captain. The F/O remains confused.
(4) Top of descent- an electrical failure occurs. Problem well handled by both pilots working as a team.
(5) Approach and landing in very gusty conditions. The Captain is very supportive of the underconfident F/O and achieves a very positive result after good training input.
(6) A night approach in the mountains. Captain decides to carry out a visual approach through high terrain and triggers a ground proximity warning because of disorientation. F/O takes control and prevents an accident.
(7) An automatic approach in instrument weather conditions (CAT III). Very good standard operation. An example of a typical everyday flight deck activity with both pilots contributing to a safe outcome.
(8) Joining the holding-pattern awaiting snow-clearance. The Captain persuades the F/O that they should carry out a visual approach with an illegally excessive tail-wind for commercial reasons. The F/O points out to the Captain that he disagrees with his decision.
A training video was also produced that contained an introduction to the project background and the NOTECHS method. It gave explanations and definitions of the concept of Elements and Categories followed by short video examples of NTS behaviours. After each brief scene, pauses were given to facilitate discussions. For further practice in using the method two more complex scenarios were actually rated at all levels.
II.2. REFERENCE RATING
A set of reference data was required for the analyses in order to examine rater accuracy. Two independent groups of pilots (three from British Airways and five from Lufthansa) with thorough experience in the instruction of Line-Oriented Flight Training and NTS evaluation assessed the eight test scenarios. Each group came up with a consensus rating for each of the Categories and Pass/Fail judgements.
In the minority of cases where the British Airways and Lufthansa groups showed discrepant ratings, the design reference was consulted to determine the appropriate rating. The design reference was the behaviour specification from the original script that the pilot actors in the scenarios were supposed to demonstrate.
II.3. PARTICIPANTS
Fifteen experiment sessions were run involving 105 male instructor pilots from 14 different airlines across Europe. However, one pilot did not take part in the evaluation of the test scenarios, and another only completed the score forms for scenarios 1 to 4. To test the national cultural robustness of the NOTECHS method the instructor pilots were recruited from companies located in five different cultural clusters identified in Europe. In addition, it was decided to recruit an approximately equal number of instructor pilots from major and from smaller, regional-sized companies to examine effects of the organisational culture on the NOTECHS method (see Table 3).
TABLE 3
Number of participating instructor pilots in the different cultural clusters
Effectifs des pilotes instructeurs participants selon les différentes grappes culturelles
II.4. PROCEDURE
Groups of raters recruited from one company participated in the experiment during a full day standardised session. All participants were already briefed about the background of the experiment and about the NOTECHS method by written material distributed in advance.
After arriving, the raters received a short introduction to the JARTEL experiment and were asked to fill in a background questionnaire covering their professional background, such as age, nationality, flying experience (flying hours), exposure to different kinds of CRM training (yes-no), experience with NTS evaluation (yes-no), and English language proficiency (1=poor, 2=moderate, 3=good). Besides the company type, other aspects of organisational culture were also included in this questionnaire, for example whether the company regularly provides reports on Human Factors (HF) issues (yes-no). A short form of the Flight Management Attitudes Questionnaire (FMAQ, Helmreich & Merritt, 1998) was administered to tap Hofstede’s dimensions of Power Distance, Individualism, and Uncertainty Avoidance. On the country level Hofstede’s computational method was used to derive these scores. On the individual level the FMAQ scale “Command Responsibility” was used as a measure of attitudes towards the unequal distribution of power in the cockpit. Low scores on this scale reflect less distance between the Captain and his crew, with communication openly initiated in both directions. High scores are related to high Power Distance, with less communication initiated by junior crew and greater unquestioned reliance on the Captain (Helmreich & Merritt, 1998, p. 77).
The raters then received training in the NOTECHS method and instructions for using the method during the experiment. This briefing was carried out in a controlled manner using the training video described above and an interactive question and answer session. A number of points were discussed and clarified, ranging from the need to observe both pilots throughout the scenarios and not to over-concentrate on the Captain’s behaviour, to the importance of treating each actor as a different character if he appeared in more than one scenario. It was also pointed out that the raters should try, where possible, to disregard their own company procedures and rules when judging the behaviours in the videos. Where a breach of a Standard Operating Procedure (SOP) was intended to be significant, it would be mentioned by the actors in some direct way. At the end of the training video, raters further practised using the NOTECHS system to rate two more complex scenarios.
In the afternoon session, the eight test scenarios were shown. The participants rated the Element, Category, and Pass/Fail levels for both the Captain and F/O after each scenario. It was decided to use a five-point rating scale at the Element and Category levels to allow the raters to distinguish between different gradations of NTS qualities (1=very poor, 2=poor, 3=acceptable, 4=good, 5=very good). However, from the pure licensing point of view a crewmember’s performance is either acceptable or unacceptable. Therefore, a two-point pass/fail scale was additionally used in the experiment, but it is not included in this study. “Not observed” ratings were allowed at each scale level for behaviour that was not relevant to a particular situation and therefore not seen. The average inter-rater reliability was estimated to be .76 at the Category level (O’Connor, Hörmann, Flin, Lodge, & Goeters, in press).
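As an illustration of how such ratings could be stored and summarised, the following minimal Python sketch represents the five-point scale and treats “not observed” as a missing value. The function name `mean_observed`, the use of `None` for “not observed”, and the example ratings are assumptions made for illustration; the source does not state how “not observed” ratings were handled in the analyses.

```python
# Five-point NOTECHS rating scale as described in the text.
SCALE = {1: "very poor", 2: "poor", 3: "acceptable", 4: "good", 5: "very good"}

def mean_observed(ratings):
    """Mean of the observed ratings; None marks a 'not observed' rating.

    Assumption: 'not observed' ratings are simply left out of the mean.
    Returns None if nothing was observed.
    """
    observed = [r for r in ratings if r is not None]
    for r in observed:
        if r not in SCALE:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    if not observed:
        return None
    return sum(observed) / len(observed)

# Hypothetical ratings of one crewmember on one Category by four raters.
example_ratings = [4, None, 3, 5]
example_mean = mean_observed(example_ratings)
```

Leaving missing values out, rather than coding them as zero, keeps the summary on the same 1 to 5 scale as the individual ratings.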
At the end of each experimental session, the raters filled in an Evaluation Questionnaire, which contained 16 multiple-choice and open questions about their opinion of the NOTECHS system and the experimental method. Finally, open discussions were held to debrief the participants on their general impressions, to gain knowledge of the context, and to collect qualitative data for the interpretation of the results.
All rating and questionnaire data were coded and arranged into a database by the University of Aberdeen in Scotland.
The distribution of the background variables within and between the five cultural clusters is shown in Table 4.
TABLE 4 :
Cluster means of background variables and FMAQ scales across the cultural clusters. Percentages are related to the proportion of affirmative answers for the respective item (n.s. =not significant, * =significant at 5% level, ** =significant at 1% level)
Moyennes des variables de base par grappe culturelle et échelles FMAQ selon la grappe culturelle. Les pourcentages correspondent aux proportions de réponses affirmatives aux items respectifs (n.s. =non significatif, * =significatif au seuil de 5%, ** =significatif au seuil de 1%)
Before looking at differences between the NTS-ratings across the cultural clusters, the rater groups were compared on the basis of relevant background information, such as flying experience, language proficiency, CRM experience, etc. If, in addition to their cultural background, the national clusters differ on further not directly culture related aspects, then between-group differences cannot be attributed to culture alone. For ordinal and categorical data (items 4 to 11) the effects were tested for significance with the χ2 statistic, while analysis of variance (ANOVA) was used for age, flying hours and the FMAQ-scales.
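To make the χ2 comparison concrete, the sketch below computes the Pearson chi-square statistic for a contingency table of answer counts per cluster. It is a minimal illustration only: the function name `chi_square_statistic` and the counts are hypothetical, only two clusters are shown instead of five, and the p-value lookup is left out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows, e.g. one row per cluster and one
    column per answer category (yes / no).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and answer.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Hypothetical counts: rows = two clusters, columns = yes / no answers.
example_table = [[10, 20], [20, 10]]
chi2, dof = chi_square_statistic(example_table)
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to decide whether the clusters differ on the item.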
As can be seen in Table 4, the pilots were comparable across the cultural clusters on items related to general pilot experience (No. 1 to 3) as well as to their general exposure to Human Factors issues and CRM training (No. 4 to 8). In these two item groups only one effect was significant (No. 8). The five clusters did not deviate significantly with respect to experience as participants in different types of CRM training. However, there were significant differences for culture-related variables (English proficiency and Command Responsibility) and activities as CRM instructor (No. 9 and 10). The raters in NW Europe clearly had more prior experience with NTS evaluation than the others: 91% of participants in cluster 2 mentioned that they were familiar with NTS evaluations before the JARTEL experiment. They had also received more training as CRM instructors (81%).
Analyses of the FMAQ scales led to another interesting finding. While all pilots seemed largely homogeneous with regard to Individualism (No. 13) and Uncertainty Avoidance (No. 14), significant differences on the Command-Responsibility scale were found. According to Helmreich and Merritt (1998) the Command scale comes closest to Hofstede’s dimension of Power Distance. Apparently, the raters in SC Europe and EA Europe favoured stronger leaders as Captains than the others. The preferred authority gradient in the cockpit seemed to be slightly steeper in these clusters than in Scandinavia, NW Europe, or SP Europe. However, it must be emphasised that the samples are probably not entirely representative of the cultures they tap or of a certain airline. On the other hand, these results support to some degree the suggested a-priori clusters. The differences in IND, though not significant, showed some correspondence with the expectations in Table 2. Merritt (1996) found similar results in her dissertation study: Command Responsibility was the strongest determinant of differences among airline pilots, whereas scores for IND and UA were higher and more homogeneous in a pilot population than in Hofstede’s original sample.
The next stage of analysis was the comparison of the NTS evaluations between the cultural clusters. For each of the four NOTECHS Categories sixteen ANOVAs were conducted (2 crewmembers, Captain and First Officer, by 8 scenarios), giving 64 analyses in total. With national culture in the form of the five clusters as the independent factor, more than half of the main effects (55% of the 64 analyses) were statistically significant (see Table 5). Compared to national culture, the effects of variables tapping organisational culture were negligible. Only 9% of the main effects of company size and 11% of the main effects of the provision of Human Factors reports were significant. These results seemed to indicate that national culture determines the evaluations of CRM behaviour to a high degree. However, as noted above, this factor was confounded with other variables that should be balanced before conclusions about cultural differences can be drawn.
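The structure of these analyses can be sketched in a few lines of Python. The example below counts the analyses and computes the F statistic of a single one-way ANOVA. The function name `one_way_anova_f` and the ratings are hypothetical, three groups are used instead of the five clusters to keep the example short, and the significance test against the F distribution is omitted.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom of a one-way ANOVA.

    `groups` is a list of lists, one list of ratings per cluster.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual ratings around their own cluster mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Number of separate analyses described in the text.
N_CATEGORIES, N_CREWMEMBERS, N_SCENARIOS = 4, 2, 8
n_analyses = N_CATEGORIES * N_CREWMEMBERS * N_SCENARIOS

# Hypothetical ratings from three clusters for one Category,
# one crewmember, and one scenario.
ratings_by_cluster = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_between, df_within = one_way_anova_f(ratings_by_cluster)
```

Repeating this for every combination of Category, crewmember, and scenario and counting the significant main effects gives percentages of the kind reported in Table 5.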
To obtain an estimate of the actual cultural effects on the NOTECHS ratings, two-factorial analyses were conducted which combine national culture with additional variables as shown in Table 5. Company type, provision of HF reports, NTS-rating experience, and English proficiency were each entered together with national culture in two-factorial ANOVAs, while the FMAQ Command scale was used as a continuous covariate in an ANCOVA with national culture as the independent variable. If the variation of the NTS ratings is balanced for differences in these five additional factors, the direct impact of cultural differences on the ratings can be estimated more adequately. As shown by the percentages in Table 5, cultural effects were reduced from 55% to only 23-25% in the two-factorial AN(C)OVAs. In particular, attitudes towards the command responsibility of the Captain and English language proficiency were significant sources of variance between the five clusters.
TABLE 5 :
Percentage of significant main effects in one- and two-factorial ANOVAs
Pourcentages des effets principaux significatifs dans les analyses de la variance à un et deux facteurs
In order to illustrate the quantity and direction of cultural differences the within-cluster means for each Category and crewmember position were compared with the reference ratings. Figures 1 and 2 show the absolute differences between the participants’ ratings for Captains and F/Os and the reference for the respective Categories aggregated over the eight scenarios. These two charts give an impression of how large or small the cultural effects actually were. As all deviations were within the range of plus/minus 1 grade on the rating scale, this finding confirms that the accuracy of all ratings in general was quite substantial (O’Connor et al., in press). For the Captains, ratings of Co-operation had the smallest deviation from the reference, for the F/Os Decision Making had the smallest deviation scores. Across the five clusters instructors from NW Europe and Scandinavia came closest to the reference ratings (see Table 6). The NTS-evaluations for the F/Os were even more accurately related to the reference than those for the Captains.
Fig. 1. Absolute deviation scores for Captains over the eight scenarios
Scores en écart absolu pour les commandants sur les huit scénarios
Fig. 2. Absolute deviation scores for F/Os over the eight scenarios
Scores en écart absolu pour les co-pilotes sur les huit scénarios
In Figures 3 and 4 the direction of potential cultural effects is shown. The values are the relative deviation scores (average differences between cluster means and reference rating) for Captains and F/Os over the eight scenarios. A negative value corresponds to a general trend to be more strict with the respective NTS-skills in comparison to the reference (“underestimation”), a positive value corresponds to “overestimation”. Looking at Figure 3 for the Captains, their Leadership skills appeared in a positive light over all clusters, whereas Situation Awareness was seen more negatively than in the reference ratings. The F/Os’ NTS were generally seen more critically by all raters, especially the social Categories Co-operation and Leadership. Co-operation and Leadership skills of F/Os were seen less positively in all clusters, whereas Situation Awareness and Decision Making were “underestimated” only by raters from clusters 3, 4, and 5. Raters from South Central and South Peripheral Europe as well as from Eastern Europe were most critical about the F/Os’ NTS-skills (see Table 6).
Fig. 3. Relative deviation scores for Captains over the eight scenarios
Scores en écart relatif pour les commandants sur les huit scénarios
Fig. 4. Relative deviation scores for F/Os over the eight scenarios
Scores en écart relatif pour les co-pilotes sur les huit scénarios
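The two kinds of deviation score described above can be sketched as follows. This is a minimal illustration, not the procedure used in the JARTEL analysis: it assumes that the five-point scale is coded 1 to 5 and that the scores are averaged over scenarios from cluster means, and all numbers are hypothetical. The function name `deviation_scores` is likewise an assumption.

```python
from statistics import mean


def deviation_scores(ratings, reference):
    """Return (absolute, relative) mean deviation of ratings from the reference.

    absolute: average size of the difference, ignoring its sign
    relative: average signed difference (negative = stricter than the reference)
    """
    diffs = [r - ref for r, ref in zip(ratings, reference)]
    absolute = mean(abs(d) for d in diffs)
    relative = mean(diffs)
    return absolute, relative


# Hypothetical cluster means for one Category over eight scenarios (scale 1-5)
cluster_means = [2.0, 3.5, 4.0, 1.5, 3.0, 4.5, 3.0, 2.5]
# Hypothetical reference ratings for the same eight scenarios
reference = [2.0, 4.0, 4.0, 2.0, 3.0, 4.0, 4.0, 2.0]

abs_dev, rel_dev = deviation_scores(cluster_means, reference)
print(f"absolute deviation: {abs_dev}, relative deviation: {rel_dev}")
```

In this made-up case the relative score is negative, which in the terms used above corresponds to "underestimation", while the absolute score shows how far the ratings lie from the reference regardless of direction.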
TABLE 6 :
Average absolute and relative deviation scores against the reference ratings. Absolute deviation scores show the average quantity of “rater bias”, relative deviation scores show the direction of “rater bias”
Scores en écart moyen absolu et relatif comparés aux évaluations. Les scores en écart absolu correspondent à la moyenne du “biais d’évaluateur”, les scores en écart relatif montrent la direction du “biais d’évaluateur”
An example may further illustrate the results. In the first video scenario a rather directive Captain managed a medical problem with one of the passengers, which was reported by a Senior Cabin Crew Member during descent. The F/O as the flying pilot became overloaded with additional tasks (like radio telephony) and dialled in a wrong altitude on the Mode Control Panel. The Captain failed to monitor the F/O’s actions and an altitude violation resulted. The scenario ends with the Captain criticising the F/O for his poor performance. The majority of raters from SC Europe and EA Europe judged this altitude bust more as a problem of the F/O, who made the error, while the Captain’s NTS were rated as acceptable. On the other hand, most raters from Scandinavia, NW Europe, and also SP Europe evaluated the Captain’s Co-operation, Leadership and Situation Awareness as well as the F/O’s Leadership and Situation Awareness as poor. The latter three clusters also had the lowest scores in Command Responsibility. There seemed to be a relation between different views of the situation and Power Distance. Correlation coefficients between the FMAQ scale Command Responsibility and the NTS evaluations were all highly significant for the Captains in scenario 1 but not for the F/Os. Coefficients varied between .18 (for Decision Making) and .45 (for Situation Awareness) for the entire sample of instructors. The average correlation was .34 for the Captains and .17 for the F/Os. This means that the NTS of the Captain were evaluated more positively by instructors who scored higher on Command Responsibility.
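A correlation of this kind can be sketched as below. The example assumes that a Pearson product-moment coefficient is meant, which the text does not state explicitly, and the scores are invented for illustration. The helper `pearson_r` and the variable names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one value per instructor
command_responsibility = [1, 2, 3, 4, 5]   # FMAQ scale score (made up)
captain_sa_rating = [2, 1, 3, 3, 4]        # rating of the Captain (made up)

r = pearson_r(command_responsibility, captain_sa_rating)
print(f"r = {r:.2f}")
```

A positive coefficient, as in this toy data set, means that instructors with higher Command Responsibility scores tended to give the Captain higher ratings.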
Finally, some qualitative data are reported about the different views of the instructors on the NOTECHS system and the conduct of the experiment. In the Evaluation Questionnaire participants were asked to give their opinions about the consistency and comprehensiveness of the NOTECHS system, its usefulness and the applicability of the five-point rating scale. With the χ2 statistic no significant cultural differences were detected in regard to any of these aspects. The division of non-technical skills into four Categories and 15 Elements was accepted by most raters and did not vary with cultural differences. The high number of raters (88%) across all cultural clusters, who were equally satisfied with the four Categories and 15 Elements of the NOTECHS system indicated that the proposed decomposition of NTS had a high degree of cross-cultural acceptance and usability.
In the context of the JAA’s task to harmonise requirements and regulations for pilot licensing and training, national cultures and cultural differences became troublesome entities. Europe is in a transitional economic period. As in other fields of industrial and organisational research (Gelfand, 2000; Pearce & Frese, 2000; Triandis, 2000), a need for further cross-cultural studies, especially including East European countries, is also identified for the aviation industry. Efforts to establish common standards for a European licence have to take national characteristics of different cultural regions into account. Attention was drawn to cultural issues especially in the area of Multi-Crew Cooperation and Crew Resource Management training (Helmreich & Merritt, 1998). While in the 1980s CRM was perceived as a set of culture-free principles with quasi-universal validity for enhancing safety, evaluation studies in the 1990s have shown that CRM training outside the “culture comfort zone” of the trainees is less effective and less accepted (Merritt, 1996). Whichever way the issue is addressed, national culture has by definition a direct impact on the attitudes and values of individuals from any given culture. Therefore, it was expected that instructors from different European cultures would perceive crew behaviour in multi-pilot aircraft differently and might assess what they have seen according to different standards.
Testing the NOTECHS system for cultural robustness in Europe is one of the central research questions of the JARTEL project. If the experiment had revealed substantial disagreement about good or bad practice of flight crew interaction and co-operation, the standards for NTS evaluation would have had to be calibrated for cultural effects. With the proposed five-cluster model for national culture as an independent factor, we found 55% of the main effects to be significant in a series of ANOVAs. However, a closer inspection of the group mean scores revealed that the differences are only gradual, as they mostly vary between the scale values of “very poor” and “poor”, or between “acceptable”, “good” and “very good”. O’Connor et al. (in press) reported that 81% of all 105 participants in the JARTEL experiment matched the reference ratings when the five-point scale was collapsed to a dichotomous acceptable versus unacceptable rating. This finding illustrates that not much variation is left which could be accounted for by culture.
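The collapsing step can be illustrated with a short sketch. It assumes a numeric coding of the five-point scale (1 = very poor to 5 = very good) with "acceptable" and above counting as acceptable, and it computes agreement per rating rather than per participant; both are assumptions for illustration, and the data are hypothetical. The function names are not taken from the NOTECHS or JARTEL documentation.

```python
def is_acceptable(rating):
    """Collapse the five-point scale: 1-2 -> unacceptable, 3-5 -> acceptable."""
    return rating >= 3


def dichotomous_agreement(ratings, reference):
    """Share of ratings that fall on the same side of the cut as the reference."""
    matches = sum(
        is_acceptable(r) == is_acceptable(ref)
        for r, ref in zip(ratings, reference)
    )
    return matches / len(ratings)


def exact_agreement(ratings, reference):
    """Share of ratings that equal the reference on the full five-point scale."""
    return sum(r == ref for r, ref in zip(ratings, reference)) / len(ratings)


# Hypothetical ratings by one instructor and the matching reference ratings
ratings = [1, 2, 3, 5, 4, 2, 3, 1]
reference = [2, 2, 4, 4, 3, 3, 2, 2]

agreement = dichotomous_agreement(ratings, reference)
exact = exact_agreement(ratings, reference)
print(f"dichotomous agreement: {agreement}, exact agreement: {exact}")
```

The contrast between the two figures shows why gradual differences on the five-point scale can coexist with high agreement on the acceptable versus unacceptable decision.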
Most of the intercultural effects occurred for scenario 7, which was supposed to show a clear standard performance of an automatic precision approach in poor weather conditions. In this scenario the crewmembers do not communicate with each other very intensively, because all actions are thoroughly carried out in accordance with the procedures. However, to completely grasp the situation, full comprehension of the English conversation is crucial. When the self-assessed variable of English language proficiency was combined with the cultural factor in two-factorial ANOVAs, all cultural effects disappeared for this scenario and instead a number of main effects for language arose. In fact, differences in English proficiency seem to be a prominent source of variance that is almost as strong as the differences in national culture. Only 23% of the cultural effects remain significant when English language is entered as another independent factor. Similarly, national culture seems to overlap with prior NTS-evaluation experience of the instructor pilots and with scores on the FMAQ scale Command Responsibility. When included in the analyses, these factors also reduce the number of cultural effects substantially. While Command Responsibility is related to Hofstede’s dimension of Power Distance, which is in itself an aspect of cultural differences, experience with NTS evaluation and language are factors that can be influenced by training to level out different perceptions and standards of crew behaviour. If NOTECHS were to be applied in the native language of the instructor pilots, as will be the case in operational use, and if a more intensive training period as in JARTEL was provided, cultural effects with this evaluation method should almost disappear. A further operational validation phase of the JARTEL project was recently started to clarify, among other aspects, the language issue of NOTECHS (Polo, 2000).
The remaining effects that were found here even after controlling for the influences of the background variables are distributed rather unsystematically over categories, scenarios and crew position. In general, the F/Os’ behaviour is evaluated more critically, compared to the reference ratings, than that of the Captains, especially in clusters 3, 4, and 5. In some scenarios a correlation was found between the NTS evaluations for the Captains and the FMAQ scale Command Responsibility. Over all videos and categories, the average correlation is .18 for the Captains and –.07 for the F/Os. As elaborated for the first scenario, participants with higher scores on Command Responsibility (e.g., from South Central and East Europe) seem to blame primarily the F/O, who made an error by setting the wrong altitude. The Captain’s behaviour was perceived as acceptable, though he created unnecessary strain through poor workload management and also should have detected the error by timely monitoring. Within the concept of CRM as an error management strategy, the Captain’s behaviour should be seen as unacceptable. From the high correlations between Command Responsibility and the NTS evaluations of the Captain in this scenario we conclude that instructors with higher Power Distance (as in clusters 3 and 5) tend to focus their NTS evaluations more on obvious errors of the individual crewmember than on behaviour styles that are centred around avoiding and detecting errors as well as mitigating potential error consequences. However, this conclusion assumes that all participants were in fact exposed to the same amount of CRM training, as the data in Table 4 suggest.
Aspects of organisational culture have only a minor influence on ratings with the NOTECHS method. Systematic effects either of company size or of the regular availability of human factors reports on NTS evaluations can be discounted. The expectation that organisational culture would have a stronger impact on NOTECHS ratings than national culture cannot be confirmed by the data of this study. Summarising the analyses of effects of national and organisational culture, it can be concluded that the decomposition of NTS into Categories and Elements as in the NOTECHS system has a high degree of cross-cultural acceptance. Effects of national culture appear to be only marginal on the five-point scales at the Category level. Provided that the language proficiency of the users is on an equal level and appropriate familiarisation has taken place, the NOTECHS method can in most aspects be regarded as robust against variations of national and organisational culture in Europe.
It is not the intention of this paper to disregard cultural influences on crew interaction and teamwork in general. The available literature on cross-cultural research reports many counterexamples. Perhaps, commercial airline pilots are not prototypical members of their national cultures. The professional culture of many airline pilots seems to bear some unifying principles of cockpit work. Highest safety standards are a common goal of all professional aircrews. Most of their flight related activities are in accordance with SOPs, which are designed by a few aircraft manufacturers and influential airlines. Furthermore, the training of many European instructors and pilots takes place in a limited number of training centres attached to or linked with only a few global players in this training industry. Therefore standards and values of flight-related activities are more homogeneous within the population of pilots.
Another important and even less researched issue is that of mixed-cultural crews. With the coming European licence for air transport pilots, it will in future become more commonplace to operate aircraft with multicultural crews in the cockpit and in the cabin. Already some of the larger carriers are basing parts of their flight crews in different parts of the world. Customers profit from cultural synergy if they can communicate with flight attendants in their native language. Communications with local ground staff at the airports can also be facilitated by crewmembers with a multicultural background. However, as the crew complement often changes from one flight to the next, the tasks of building and maintaining a team can become more demanding. Communication barriers could already arise when briefings have to be conducted in a second language or when cultural constraints are mishandled through a lack of the relevant awareness, tolerance, or competence. In order to manage a flight safely and efficiently, the requirements for crew co-operation should be clearly defined in advance. As long as safety is involved, every crewmember should have the same concept of desired behaviours and actions. These concepts of crew behaviours must be adequately trained, continuously practised, and consistently assessed and reinforced throughout all levels of the respective company. The NOTECHS system provides an applicable framework of NTS behavioural markers, which has proven its reliability and sensitivity in evaluating CRM behaviour in a quasi-experimental study of the JARTEL project. Cross-cultural comparisons have shown that the assessment procedure, the material, and the standards are sufficiently tolerant and robust with respect to cultural influences on CRM behaviours.
In the next project phase of JARTEL the NOTECHS method will be used in real operational settings of different airlines’ training departments in order to further evaluate its practicability and shape this methodology for its implementation in coming guidance material of JAA.
Paper received: October 2000.
Accepted in modified form: April 2001.
Berry, J. W., Poortinga, Y. H., Segall, M. H., & Dasen, P. R. (1992). Cross-cultural psychology: Research and applications. Cambridge, UK: Cambridge University Press.
Bollinger, D. (1994). The four cornerstones and three pillars in the “House of Russia” management system. Journal of Management Development, 13, 49-54.
Delsart, M. C. (2000). Non-technical skills assessment in pilot training: Experimental plan of the JAR-TEL study. Paper presented at the 24th European Association of Aviation Psychology Conference. Crieff, Scotland, September.
Federal Aviation Administration (1996). Federal Aviation Human Factors Team Report on: The interfaces between flightcrews and modern flight deck systems. Washington, DC: Federal Aviation Administration.
Flin, R., Goeters, K.-M., Hörmann, H.-J., & Martin, L. (1998). A generic structure of nontechnical skills for training and assessment. Paper presented at the 23rd European Association of Aviation Psychology Conference. Vienna, Austria, September.
Gelfand, M. J. (2000). Cross-cultural industrial and organisational psychology: Introduction to the special issue. Applied Psychology, 49, 29-31.
Helmreich, R. L., & Merritt, A. C. (1998). Culture at work in aviation and medicine: National, organizational and professional influences. Aldershot, UK: Ashgate.
Helmreich, R. L., & Wilhelm, J. A. (1997). CRM and culture: National, professional, organizational, safety. Paper presented at the 9th Aviation Psychology Symposium. Columbus, OH, April.
Hofstede, G. (1980). Culture’s consequences: International differences in work related values. Beverly Hills, CA: Sage.
Hofstede, G. (1991). Cultures and organisations: Software of the mind. London: McGraw-Hill.
Hörmann, H.-J., Fletcher, G., & Goeters, K.-M. (1998). Synthesis of cultural aspects and their influences on crew behaviour. JAR TEL WP1: Final report (Report, DG VII. JAR TEL/DLR&DERA/WPR/1/03). Paris: Sofréavia.
Johnston, N. (1993). CRM: Cross-cultural perspectives. In E. L. Wiener, B. Kanki, & R. L. Helmreich (Eds.), Crew resource management (pp. 367-398). San Diego, CA: Academic Press.
Maurino, D. E. (1994). Crosscultural perspectives in human factors training: Lessons from the ICAO Human Factors Program. International Journal of Aviation Psychology, 4, 173-181.
Maurino, D. E., Reason, J. T., Johnston, A. N., & Lee, R. B. (1995). Beyond aviation human factors. Aldershot, UK: Avebury Aviation.
Merritt, A. C. (1996). National culture and work attitudes in commercial aviation: A cross-cultural investigation. Unpublished doctoral dissertation, University of Texas, Austin, TX.
Meshkati, N. (1996). Cultural factors influencing safety need to be addressed in design and operation of technology. ICAO Journal, 51, 17-28.
Naumov, A. I., & Puffer, S. M. (2000). Measuring Russian culture using Hofstede’s dimensions. Applied Psychology, 49, 709-718.
O’Connor, P., Hörmann, H.-J., Flin, R., Lodge, M., & Goeters, K.-M. (in press). Developing a method for evaluating crew resource management skills: A European perspective. International Journal of Aviation Psychology.
Pearce, J. L., & Frese, M. (2000). Introduction to the special issue on transitional economies in Eastern Europe. Applied Psychology, 49, 613-618.
Phelan, P. (1994). Cultivating safety. Flight International, 146, 22-24.
Polo, L. (2000). Practicability of NOTECHS in regular airline training. Paper presented at the 24th European Association of Aviation Psychology Conference. Crieff, Scotland, September.
Reason, J. (1997). Managing the risks of organisational accidents. Aldershot, UK: Ashgate.
Smith, P. B., Duggan, S., & Trompenaars, F. (1996). National culture and the values of organisational employees. Journal of Cross-Cultural Psychology, 27, 231-264.
Triandis, H. C. (2000). Cross-cultural I/O Psychology at the end of the millennium. Applied Psychology, 49, 222-226.
van Avermaete, J. A. G., & Kruijsen, E. A. C. (1998). The evaluation of non-technical skills of multi-pilot aircrew in relation to the JAR-FCL requirements (Report: NLR-CR-98443). Amsterdam: EC NOTECHS project.
Yamamori, H. (1987). Optimum culture in the cockpit. In H. W. Orlady & H. C. Foushee (Eds.), Cockpit Resource Management Training, NASA Conference Publication 2455 (pp. 75-87). Moffett Field, CA: NASA Ames Research Centre.
[1]
This study is part of the JARTEL project carried out under contract with the European Commission, DG-TREN by Airbus, Alitalia, British Airways, DERA, DLR, IMASSA, NLR, Sofréavia, and University of Aberdeen.