ADVANCED MMIS TOWARD SUBSTANTIAL REDUCTION IN HUMAN ERRORS IN NPPS
- DOI : 10.5516/NET.04.2013.700
- Author: SEONG POONG HYUN, KANG HYUN GOOK, NA MAN GYUN, KIM JONG HYUN, HEO GYUNYOUNG, JUNG YOENSUB
- Publish: Nuclear Engineering and Technology Volume 45, Issue 2, pp. 125-140, 25 Apr 2013
This paper gives an overview of methods to inherently prevent human errors and to effectively mitigate their consequences by securing defense-in-depth during plant management through the advanced man-machine interface system (MMIS). The significance of human error reduction during an accident in nuclear power plants (NPPs) hardly needs to be stressed. Unexpected shutdowns caused by human errors not only threaten nuclear safety but also severely erode public acceptance of nuclear power. We have to recognize that human errors will always be possible, since humans are not perfect, particularly under stressful conditions. However, we have the opportunity to improve the situation through advanced information and communication technologies on the basis of lessons learned from our experiences. As important lessons, the authors explain key issues associated with automation, man-machine interfaces, operator support systems, and procedures. Based on this investigation, we outline the concept and technical factors needed to develop advanced automation, operation and maintenance support systems, and computer-based procedures using wired/wireless technology. It should be noted that the ultimate responsibility for nuclear safety belongs to humans, not to machines. Therefore, safety culture, including education and training, which is a kind of organizational factor, should be emphasized as well. With regard to safety culture for human error reduction, several issues that we face these days are described. We expect the ideas of the advanced MMIS proposed in this paper to guide the future direction of related research and ultimately to enhance the safety of NPPs.
Nuclear Power Plant, Human Error, Man-Machine Interface System, Automation, Operation and Maintenance Support, Education and Training, Safety Culture
A report by the Institute of Nuclear Power Operations (INPO) says that about 48% of the total events in nuclear power plants (NPPs) over two years (2010-2011) were caused by human errors. A Korean database, OPIS (Operation Performance Information System), also reports that more than 20% of the total events in Korean NPPs during the past 10 years (2002-2011) were caused by human errors. The most severe NPP accidents in history, such as TMI-2 and Chernobyl, are also strongly related to human errors [3,4]. Human errors come in many types. Based on a survey of recent events at Korean NPPs, human operators make mistakes because of unclear or unfriendly working conditions, unfamiliar human-machine interfaces, fatigue, carelessness, and various other reasons. How, then, can we reduce human errors in existing NPPs? There can be many ways, but five are suggested in this work. The first is to increase automation so that humans do not have to intervene in the systems so often. The second is to improve man-machine interface systems (MMISs), which includes developing operation/maintenance support systems. The third is to improve operating procedures for operators and maintenance manuals for maintenance crews. This improvement means not only making the procedures/manuals correct but also making them human friendly. The fourth is to improve the education and training of operation and maintenance crews. Education and training are needed more for workers from outside than for in-house workers, because workers from outside are usually not educated well enough to work in NPPs. Education and training can be technical or non-technical. The need for technical education and training is obvious, while non-technical education and training to improve skills such as leadership, communication, and team cooperation is equally important. Last but not least is to enhance safety culture. The importance of safety culture cannot be emphasized enough.
Management has a tendency to emphasize the economical generation of electricity rather than safety. For economic reasons, they may, for example, push fellow workers hard to accelerate the work process and shorten the overhaul period. “Safety first” may not be observed adequately in the real world. This kind of insight for reducing human errors in existing NPPs can be applied to the design of future NPPs. Future NPPs that are not only highly automated but also equipped with various operation support systems and many maintenance support systems based on a variety of state-of-the-art information technologies (IT) can be expected to be resistant to human errors. Future NPPs in which the workers are very well educated and trained and have a strong safety culture will be even more resistant to human errors. Such future NPPs can be called intelligent or smart nuclear power plants and can be considered human error resistant.
During 2001-2010, Korea carried out a national project called KNICS (Korea Nuclear Instrumentation and Control Systems) to localize the digital hardware used in NPPs. However, this project was mainly for developing hardware, and only a very small amount of software was developed through it. A great deal of software is needed to operate nuclear power plants economically and safely. Developing automation systems, operator and maintenance support systems, and computerized procedure systems covers only a few of these types of software.
The MMIS in NPPs has evolved. The first generation, which started in the 1950s and 60s, was fully analog. Early computers were used, but only for data logging and for an independent check of the functions of the analog devices, not for control or protection functions. The second generation MMIS, which is still widely used, is partially digitalized, but the digital part is used only for non-safety functions such as monitoring, not for safety or control purposes. The third generation MMIS, which is now being developed and introduced to NPPs, uses digital computers for control and safety functions. However, there are not yet many applications (software) to be used in NPPs; it is just a digital replacement of the existing analog MMIS. The fourth generation MMIS is expected to be the same as the third one in appearance, but it will be well equipped with application software, making it a very intelligent MMIS. We are in the process of moving from the third generation to the fourth, and developing the automation systems, the operation and maintenance support systems, and the computerized procedure systems is part of that process.
Chapter 2 surveys the five possible ways of reducing human errors in more detail. Chapter 3, however, considers only the first three; that is, it focuses on designing the advanced MMIS to reduce human errors and excludes education/training and safety culture. The first way is to increase automation. The second and third ways are to improve the human-machine interfaces, mainly by improving the support systems for operations and for tests, respectively. The second way also covers improving procedures for operations, mainly by developing computerized operation functions. Practically speaking, computerized operating procedure systems are already being developed in many places, including Korea Hydro Nuclear Power Company - Central Research Institute (KHNP-CRI). The operator support system functions suggested in this work can be developed separately or as a part of the computerized procedure systems; this decision will be made later. Therefore, we do not discuss the development of computerized procedure systems in detail in this paper. However, for the case in which the operator support functions are implemented in computerized procedure systems, we discuss a possible way of implementing them. The third way is to develop periodic test support systems by implementing a periodic test procedure platform, computerized periodic test procedures, communication network infrastructure, and services for administrative processes.
This chapter focuses on five issues in developing the fourth generation MMIS. The safety principle of the MMIS should be the same as that of NPPs, which is defense-in-depth. Human errors must be inherently and systematically prevented, and the means to mitigate the consequences of human errors once they occur should be provided in a timely manner. Borrowing the concept of defense-in-depth, the authors attempted to characterize the responsibility of the advanced MMIS in terms of objectives and means, as shown in Table 1.
Inherent prevention of human errors seems impossible, but organizational factors can minimize their likelihood. We consider safety culture a top priority in designing the MMIS as well. Education and training can build a qualified human infrastructure so that all personnel can perform their tasks adequately no matter what situation arises in the field. Of course, all the significant tasks should be clearly written in procedures, and operators or maintenance crews should follow the procedures under supervision for cross checking. Even when good procedures are present, human errors are likely to happen in an emergency condition due to high stress. Based on task analysis, the level of automation should be decided and the proper methods for automation should be provided. Finally, no matter what human errors or equipment failures occur, we should have the opportunity to recover from them. The consequences of human errors may be mitigated by the system’s inherent tolerance, but in most cases the operator’s recovery actions should follow. The intelligent operator support systems aim to provide the means to detect such human errors in a timely manner and to allow personnel to take appropriate actions without mistakes. The detailed issues in implementing the essential means are summarized as follows:
According to the definition given by Wikipedia, automation is the use of machines, control systems, and information technologies to optimize productivity in the production of goods and delivery of services. Automation greatly decreases the need for human sensory and mental requirements while increasing load capacity, speed, and repeatability. The automation of NPPs pursues the reduction of manual tasks for accomplishing power generation and safety functions in NPP operations.
Automation is employed to achieve more reliable and precise control. This is positive from a reliability standpoint, since personnel are considered one of the more unpredictable components in the system. Thus, automation can enhance overall system reliability by removing or reducing the need for human action. Automation is also required for the following reasons:
1) It can ensure the operation with accuracy, reliability, and operating speed that transcends the ability of the human.
2) It can substitute for human operation and work in environments that are harmful to humans or not accessible to workers and operators.
3) By reducing the workload, the stress on an operator or the number of operators and workers can be reduced.
Human errors could be reduced radically by reducing the amount of man-machine interaction through improved and increased automation. However, excessive automation can lead to a loss of operator concentration or to side effects such as a lack of situational awareness. Figure 1 shows work performance as a function of workload. As shown in this figure, excessive automation reduces the work performance of operators. Therefore, an appropriate level of automation must be achieved.
It is important to understand the effects of poor automation design on operations staff, which are summarized below:
1) Automation can add to the overall complexity of the I&C (Instrumentation and Control) systems. If operators do not understand automation, it is difficult for them to properly monitor and supervise its actions.
2) When functions and tasks are performed by automation rather than by operators, it is usually difficult for operators to remain aware of the function status.
3) Automation shifts operator workload from that associated with direct control to that associated with monitoring. When functions and tasks are automated, the ability of operators and workers to perform them skillfully can be impaired.
4) Since automation works very well most of the time, operators come to trust it. They can sometimes become complacent and fail to monitor its performance effectively.
Thus, in addition to designing the automation itself, it is crucial to design the human-system interaction as well. An understanding of the impact on human roles and responsibilities can help to design more effective automation that avoids many of these negative effects. It is notable that as more sophisticated digital automation techniques are developed along with improved automation platforms, the level of automation could be increased.
The major features of the fourth generation MMIS are summarized below:
- Advanced computer and software technology
- Advanced digital control and protection technology
- Enhanced automation of operation and maintenance
- Improved sensors
All these characteristics are associated with NPP automation. The third and fourth features are directly related to NPP automation.
Figure 3 shows the typical activities for achieving functions and tasks. Operation staff (in manual control) or automatic systems must continuously monitor the plant through sensors to detect the conditions under which functions and tasks are required to be performed. They must assess the situation and plan a response. Once the response plan is made, they implement the plan by sending control signals to actuators.
The four generic activities are addressed to pursue the automation of NPPs:
- Automation of monitoring and detection
- Automation of situation assessment
- Automation of response planning
- Automation of plant control
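As a rough illustration of how the four generic activities chain together, the following minimal Python sketch runs one cycle of monitoring, situation assessment, response planning, and plant control. All names, the single high limit, and the relief-valve action are hypothetical assumptions made only for this sketch; they are not taken from any plant design. The `operator_approves` hook is likewise an assumption, included to show where a human could stay in the loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# All names and limits below are hypothetical and purely illustrative.


@dataclass
class Plan:
    actuator: str
    command: str


def monitor(sensors: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Monitoring and detection: sample every sensor once."""
    return {name: read() for name, read in sensors.items()}


def assess(readings: Dict[str, float], high_limit: float) -> Optional[str]:
    """Situation assessment: return the first variable above the limit."""
    for name, value in readings.items():
        if value > high_limit:
            return name
    return None


def plan_response(condition: Optional[str]) -> Optional[Plan]:
    """Response planning: map a detected condition to a predefined action."""
    if condition is None:
        return None
    return Plan(actuator=f"{condition}_relief_valve", command="open")


def control(plan: Plan, actuators: Dict[str, str]) -> None:
    """Plant control: send the command to the actuator."""
    actuators[plan.actuator] = plan.command


def run_cycle(
    sensors: Dict[str, Callable[[], float]],
    actuators: Dict[str, str],
    high_limit: float = 100.0,
    operator_approves: Callable[[Plan], bool] = lambda plan: True,
) -> Optional[Plan]:
    """Run one monitor/assess/plan/control cycle and return the plan, if any."""
    readings = monitor(sensors)
    condition = assess(readings, high_limit)
    plan = plan_response(condition)
    # The approval hook keeps the operator in a supervisory role.
    if plan is not None and operator_approves(plan):
        control(plan, actuators)
    return plan
```

Passing an `operator_approves` function that returns `False` leaves the actuators untouched while still returning the proposed plan, which is one simple way to represent an intermediate level of automation.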
1) Automation of monitoring and detection
Current safety and control variable measurements depend on conventional process and radiation sensors, including:
- Temperature (resistance temperature detectors, thermocouples)
- Pressure (diaphragm, piezoelectric)
- Flowrate (pressure difference through flow restrictor)
- Neutron flux (fission chamber, ion chamber)
There are also a relatively small number of smart sensors equipped with some signal processing techniques. However, the newest instrumentation systems incorporate many pieces of information at the lowest level, leading ultimately to highly processed information through the integration and interpretation of this data. Greater intelligence is being built into individual sensors, allowing more self-checking and self-calibration and also providing the capability for multiple variables to be measured and combined to improve condition monitoring. In addition, optical communication and optical sensors will be extensively applied in the future.
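One simple form of the self-checking described above is a consistency check across redundant channels. The sketch below is only an illustration under stated assumptions: the function name, the median-based reference, and the fixed tolerance are hypothetical choices for this example and are not the method of any particular instrumentation system.

```python
from statistics import median
from typing import Dict, List, Tuple


def validate_channels(
    readings: Dict[str, float], tolerance: float
) -> Tuple[float, List[str]]:
    """Cross-check redundant channels that measure the same variable.

    Returns (validated value, sorted list of suspect channel names).
    A channel is suspect when it deviates from the median of all
    channels by more than `tolerance` (an assumed, illustrative rule).
    """
    if not readings:
        raise ValueError("at least one reading is required")
    reference = median(readings.values())
    suspects = sorted(
        name for name, value in readings.items()
        if abs(value - reference) > tolerance
    )
    good = [value for name, value in readings.items() if name not in suspects]
    if not good:
        # No channel agrees with the median; fall back to the median itself.
        return reference, suspects
    return sum(good) / len(good), suspects
```

For example, with four temperature channels reading 300.1, 299.9, 312.0, and 300.0 and a tolerance of 2.0, the third channel would be flagged and the validated value would be the mean of the remaining three.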
2) Automation of situation assessment
Situation assessment will be supported with automated diagnostics and prognostics capabilities. Diagnostics is typically used to identify and determine the causes of symptoms and to determine mitigations for problems. Prognostics is focused on predicting the time at which a system or a component will no longer perform its intended function. Fast computing power, smart sensors, and data communications will be developed, and novel diagnostics and prognostics technologies will then be implemented in NPPs in the future.
Through development in diagnostics and prognostics, operators will be able to predict future states of the plant and take action proactively, as opposed to monitoring the current state and reacting to changes or fault indications. Extensive computational capabilities will include the capability to run models and simulations faster than real time and will allow personnel to play “what-if” scenarios, or to replay events with tests of various hypotheses as part of diagnosis and response planning.
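To make the idea of prognostics concrete, the following sketch estimates the time remaining until a monitored degradation indicator reaches a failure threshold by extrapolating a least-squares linear trend. The linear degradation model, the function name, and the threshold are assumptions chosen for illustration; practical prognostics methods are generally more elaborate.

```python
from typing import Optional, Sequence


def remaining_useful_life(
    times: Sequence[float],
    values: Sequence[float],
    failure_threshold: float,
) -> Optional[float]:
    """Estimate the time left until a rising indicator reaches the threshold.

    Fits value = intercept + slope * time by ordinary least squares and
    extrapolates. Returns None when no upward trend is found.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    if slope <= 0:
        return None  # indicator is flat or improving; no failure predicted
    t_fail = (failure_threshold - intercept) / slope
    # Time is measured from the last sample; never report a negative value.
    return max(0.0, t_fail - times[-1])
```

A result of `None` means the fitted trend does not approach the threshold, which an operator aid could present as "no failure predicted from the current trend" rather than as a guarantee of health.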
3) Automation of response planning
Human response to both normal and emergency situations is governed by plant operation procedures. Currently, the proper human responses to a situation are determined in advance and operators perform a proper response according to paper-based procedures. However, the new computer-based procedures (CBPs) enable many aspects of the response, including data gathering, assessing procedure step logic, determining action, and taking action, to be automated.
4) Automation of plant control
At present, NPP I&C systems are not greatly different from their very old basic design. They still maintain a primarily single-input, single-output classical control structure to automate individual control loops.
A nuclear reactor is a complex system that requires highly sophisticated controllers to ensure that the desired performance and safety can be achieved and maintained during its operation. More demanding operational requirements such as reliability, lower environmental impacts, and improved performance under adverse conditions in nuclear power plants, coupled with the complexity and uncertainty of the models, necessitate an increased level of automation in the control methods.
Two different kinds of tasks must be covered by a human operator: normal operation and emergency operation. Each kind of task includes various control actions. Generally, control functions are assigned to (1) personnel, (2) automatic control, or (3) combinations of personnel and automatic control. Because unexpected circumstances can arise that fully automated control is not suited to handle, a human operator needs to retain a role in plant control. Thus, in many cases, a human operator needs to operate NPPs with some aid from an information system. The operator’s roles are supervisor, manual controller, and backup of automation. In the supervisory role, the operator monitors the plant to verify that the safety functions are accomplished. As manual controller, the operator carries out the manual tasks that he or she is expected to perform. As backup of automation, the operator backs up automatic or machine control. A classical information-processing model of human operators is represented by different stages at which information is transformed: (1) sensory processing, (2) perception of information, (3) situation awareness, (4) response planning, and (5) action execution. There is, of course, feedback to check the result of action implementation. Each stage of information processing requires different information.
Working memory and attention are limited resources of human operators in the information-processing model. Information can be lost due to the loss of attention resources to keep it active, the overload of working memory, or the interference from other information in working memory. The limitation of working memory is closely related to the attention resources. A human-friendly design of MMIS will reduce the adverse effect of information overflow.
Because of this complexity, which is intrinsically imposed on operators whose resources are limited, the importance of the MMIS is emphasized. In an advanced MMIS, thanks to the high performance of computer systems, a strengthened operator support system can be provided in addition to some automation of control functions. Unlike conventional ones, an advanced MMIS will support the higher-level functions of the human operator. That is, the advanced MMIS will provide well-processed information and sophisticated operator aids, in contrast to the conventional MMI, which focuses on the effective placement of hard-wired process parameters and control switches. This is one of the biggest advantages of introducing digital technology in NPPs. The MMIS will annunciate any abnormality in the plant to the operator, display more information, tailor the displayed information to the operator's current needs, support the operator's decision-making process, and help the operator to recover from an incorrect action if there is any. This issue is closely related to the level of automation described in the section above.
First of all, clear situation awareness on the part of the operator needs to be supported by the MMIS. During an accident, plant operators and technical support staff need access to the plant status and need to monitor the plant response to the actions taken. The central safety problem in the design of an NPP is to assure that radioactive fission products remain safely confined at all times during the operation of the NPP, the refueling of the reactor, and the preparation and shipping of spent fuel under possible accident conditions. The safety of an NPP will be in danger only if there are multiple failures of safety-related equipment, serious human errors, or some combination of the two. That is, NPP operators will benefit greatly if they are supported by advanced plant status information systems.
They also need to infer the likely plant response to possible mitigative actions as well as the near-term plant behavior in the absence of mitigative actions. The use of a simulator will provide the proper means to satisfy these needs. The main task of a predictive simulator is to inform the staff about what will happen in the near future. Another task is to allow the operator to test mitigation strategies before they are actually carried out on the plant. To be of any use, such a predictive simulator must run faster than real time, and in fact much faster. This high-speed requirement may favor the use of response-surface models rather than physics-based models.
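The "what-if" use of a predictive simulator can be sketched with a deliberately simple model. The example below integrates a single lumped heat balance, dT/dt = (heat - cooling) / C, with explicit Euler steps and compares the predicted temperature for several candidate actions. The model, the parameter values, and the function names are hypothetical assumptions for illustration only; it is neither a response-surface model nor a plant model, just a stand-in that runs far faster than real time.

```python
from typing import Dict


def predict_temperature(
    temp_c: float,
    heat_kw: float,
    cooling_kw: float,
    horizon_s: float,
    dt_s: float = 1.0,
    heat_capacity_kj_per_c: float = 500.0,
) -> float:
    """Predict the temperature after `horizon_s` seconds.

    Explicit Euler integration of dT/dt = (heat - cooling) / C with
    constant heat input and constant cooling (illustrative assumptions).
    """
    steps = int(round(horizon_s / dt_s))
    for _ in range(steps):
        temp_c += (heat_kw - cooling_kw) / heat_capacity_kj_per_c * dt_s
    return temp_c


def compare_strategies(
    temp_c: float,
    heat_kw: float,
    strategies: Dict[str, float],
    horizon_s: float,
) -> Dict[str, float]:
    """Run the same prediction once per candidate cooling strategy."""
    return {
        name: predict_temperature(temp_c, heat_kw, cooling_kw, horizon_s)
        for name, cooling_kw in strategies.items()
    }
```

Calling `compare_strategies` with an entry for "no action" (zero cooling) alongside the candidate mitigations gives both the near-term behavior without intervention and the predicted effect of each option, which mirrors the two tasks of the predictive simulator described above.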
In summary, the successful management of this complicated accident situation, which must be performed by both human operators and computer systems, requires successful collaboration between them. The MMIS should support the operators in diagnosing the occurrence of an accident, determining the extent of the challenge to plant safety, monitoring the performance of automatic systems, selecting strategies to prevent or mitigate the safety challenge, implementing the strategies, and monitoring their effectiveness.
On the other hand, a malfunction of this MMIS causes a serious safety issue, because if a safety system fails during an emergency, the plant will not be secured and catastrophic consequences might occur. Therefore, when designing safety systems, one often strives to achieve the lowest unavailability possible. Computerized operator support systems potentially have negative consequences if the strategies are not sufficiently considered and, furthermore, may become a new burden on the operators. An experimental study shows that the effects of an advanced MMIS on operator performance may differ with the level of operator expertise. Computerized operator support systems need to be consistent in both content and format with the cognitive strategies and mental models employed by the operator.
Procedures are recognized as one of the effective techniques to reduce human errors in NPPs as well as in other industries. A procedure normally prescribes the techniques or tasks that personnel should follow in various situations. Procedures assist operators in making decisions when diagnosing the possibility of different kinds of system failure and can be helpful in avoiding the biases often encountered in human information processing. Procedures can also reduce the omission of required actions and the intervention of unnecessary actions by plant personnel.
In a broad sense, the procedures used in NPPs include administrative procedures, operating procedures (normal, alarm response, abnormal, and emergency procedures), maintenance and technical support procedures, and testing and surveillance procedures. Administrative procedures describe how administrative aspects of NPP activities are carried out, such as the review and approval of documents, the training and qualification of NPP personnel, and the maintenance and retention of plant records. Operating procedures provide operators with instructions that predefine the actions to be taken in different operating conditions. Maintenance and technical support procedures relate to activities such as the conduct of preventive and corrective maintenance, radiation protection, and chemistry control. Testing and surveillance procedures relate to activities such as functional tests of safety systems, post-maintenance test procedures, and post-modification procedures. Since the procedures form a principal interface between NPP personnel and the plant, they should be technically correct, comprehensive, explicit, and easy to use. In order to develop and maintain such procedures, NPPs are expected to apply (1) measures for document control, (2) generic technical guidelines, and (3) accepted human factors principles and practices for the design of procedures. The measures for document control assure that procedures are reviewed for adequacy, approved for release by authorized personnel, and distributed to and used at the location where the prescribed activity is performed. The generic technical guidelines, which are usually provided by vendors, include the technical bases for developing procedures, such as plant design bases, technical requirements, and specifications. Human factors principles and practices for the design of procedures are applied to ensure that the procedures are comprehensive, understandable, and easy for operators to use.
Those principles and practices for the development and revision of procedures are normally established in a procedure writer’s guide. The procedure writer’s guide contains instructions on structure, organization, content format, and the use of acronyms and abbreviations. Finally, for any procedure to be usable, it is important that it be verified and validated in an adequate way.
Procedures can be categorized into paper-based procedures (PBPs) and computer-based procedures (CBPs) according to the means of implementation. PBPs are more common in existing analog-type NPPs. PBPs are a printed form of instructions, available in a place easily accessible to the personnel who carry out the task. However, PBPs have characteristics that limit the manner in which information is presented, and they impose tasks upon operators that are not directly related to controlling the plant. Operator performance issues related to PBPs have been well summarized in the literature. Some of those specific to PBPs are:
- Difficulty in managing multiple procedures simultaneously
- Lack of context-dependent highlighting and navigation
- Separation from other information sources such as safety parameter display systems
- Potential for skipping a step and missing a procedure transition
With the application of IT, CBPs are becoming a part of the control room in new NPPs as well as in modernized NPPs. CBPs also implement features that address, to varying degrees, some of the problems associated with PBPs. Some of the positive effects of CBPs on operator performance are reductions in task completion time, workload, and operator errors in transitions between procedures. CBPs can be designed to provide different levels of functionality, including varying levels of automation. Three different categories of CBPs can be defined according to the functionality provided. The Type 1 CBP presents procedures on a computer-driven display in text or graphical form with little additional functionality. This type is essentially a replica of PBPs. The Type 2 CBP incorporates additional functionality not found in PBPs, such as automatic retrieval and display of information, automatic processing of step logic and display of the results to support operator decision making, and access to soft controls through links to a human-system interface system. This type of CBP does not include the ability to send control commands. The Type 3 CBP is the most automated. This type can include procedure-based automation, in which the CBP can automatically carry out multiple procedure steps when directed by the operator. A more detailed description of the functionality of each type of CBP can be found in the literature.
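The difference between the three CBP types can be sketched in a few lines of Python. The data model below (`Step`, `CbpType`, `process_step`) is a hypothetical illustration of the distinctions described above, not the design of any actual CBP system: Type 1 only displays the step, Type 2 additionally evaluates the step logic from plant data, and Type 3 may also issue the command when the operator directs it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class CbpType(Enum):
    TYPE_1 = 1  # electronic replica of the paper-based procedure
    TYPE_2 = 2  # adds automatic data retrieval and step-logic evaluation
    TYPE_3 = 3  # may also carry out steps when directed by the operator


@dataclass
class Step:
    text: str
    condition: Callable[[Dict[str, float]], bool]  # step logic (hypothetical)
    command: str                                   # control command (hypothetical)


def process_step(
    step: Step,
    plant_data: Dict[str, float],
    cbp_type: CbpType,
    sent_commands: List[str],
    operator_directs: bool = False,
) -> Dict[str, object]:
    """Process one procedure step according to the CBP type."""
    result: Dict[str, object] = {
        "text": step.text,
        "logic_satisfied": None,  # unknown unless the CBP evaluates it
        "command_sent": False,
    }
    if cbp_type is CbpType.TYPE_1:
        return result  # display only; the operator checks the plant data
    result["logic_satisfied"] = step.condition(plant_data)
    if (
        cbp_type is CbpType.TYPE_3
        and operator_directs
        and result["logic_satisfied"]
    ):
        sent_commands.append(step.command)
        result["command_sent"] = True
    return result
```

In this sketch a command is issued only when the CBP is Type 3, the operator has directed execution, and the step logic is satisfied, so the operator's direction remains a precondition for any automatic action.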
Although CBPs address many problems of PBPs, several studies have raised human performance issues related to CBPs. One issue is the possibility that communication between operators may be reduced. Unlike with PBPs, most of the information required for following a procedure can be made available in CBPs. Thus, the operator may not feel the need to communicate with the other operators to obtain the information. The potential for isolating a CBP user from the other operators may undermine team performance in emergencies. Another issue is that operators may not maintain situation awareness with highly automated CBPs. For example, with the Type 3 CBP, procedural steps can be executed by the system without an operator’s intervention. The operators may be out of the control loop and thereby lose good situation awareness of the plant. This is a typical human factors issue of highly automated systems. The third issue is the keyhole effect, which may become significant when operators are working with multiple procedures. If operators are too focused on the portion of the procedure that can be observed in the small space of the display, they may lose their sense of orientation within the total set of active procedures. The fourth issue is the transition to the backup PBP in the case of CBP failure. If the CBP in use fails for any reason, operators need to move to the backup PBP. This transition should not be complicated, and sufficient information for continuing operation with the backup PBP, e.g., the operation history up to that moment, should be provided to the operators. The means through which operators can recognize system failures, e.g., an alarm or a heartbeat display, need to be provided on the display system. More human factors issues and design guidelines for CBPs are given by the NRC and the EPRI (Electric Power Research Institute) [23,24].
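A heartbeat display of the kind mentioned above can be driven by a simple watchdog: the CBP reports periodically, and the display declares a failure when no report has arrived within a timeout. The sketch below is a minimal illustration with hypothetical names and an assumed timeout; it also shows how the operation history could be turned into a short hand-over note for continuing with the backup PBP.

```python
from typing import List, Optional


class HeartbeatMonitor:
    """Declare the CBP failed when its heartbeat is older than a timeout."""

    def __init__(self, timeout_s: float) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout must be positive")
        self.timeout_s = timeout_s
        self.last_beat_s: Optional[float] = None

    def beat(self, now_s: float) -> None:
        """Record a heartbeat received from the CBP at time `now_s`."""
        self.last_beat_s = now_s

    def is_alive(self, now_s: float) -> bool:
        """Return True if a heartbeat arrived within the last `timeout_s`."""
        if self.last_beat_s is None:
            return False  # nothing received yet; treat as not alive
        return (now_s - self.last_beat_s) <= self.timeout_s


def handover_note(completed_steps: List[int]) -> str:
    """Summarize the operation history for the switch to the backup PBP."""
    if not completed_steps:
        return "CBP failed before any step was completed; start the backup PBP at step 1."
    return (
        f"CBP failed after step {completed_steps[-1]}; "
        "resume the backup PBP from the next step."
    )
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch easy to test; a real display would presumably supply them from a monotonic time source.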
Procedures need to be integrated into a coordinated effort with the other approaches for reducing human errors. First of all, procedures should be consistent with the automatic systems and the human-system interface. Procedures should include the interaction between automatic systems and operators. In this light, the procedures should define the operator’s tasks for monitoring automatic systems and the tasks to be performed when an automatic system is malfunctioning. As for the human-system interface, the information and control devices required to carry out the tasks in the procedures should be presented on the interface in a proper manner. Second, a training program also needs to consider the inputs from procedure development to maximize its effectiveness. The inputs relate to determining which procedures require training, who needs to be trained, and what the learning objectives of the training are. For instance, tasks that were identified as problematic in developing procedures (e.g., procedural steps that underwent extensive revision) need to be included in the training program. In addition, the training program needs to emphasize the procedures for handling situations that operators can hardly ever experience because of their low probability, but that have the potential to pose a significant threat to safety. Examples of such procedures are emergency operating procedures and severe accident management guidelines.
The education and training of qualified and capable engineers should be one of the most significant issues during the entire life cycle of NPPs, which is also commonly shared in all countries.
The purpose of the education and training is to foster engineers’ fundamental capability to successfully perform their tasks without faults or errors by learning key knowledge and repeatedly practicing the procedures. However, we have to agree that procedures cannot cover all tasks which are necessary for normal as well as emergency conditions no matter how carefully the procedures are prepared and written as shown in the left of Figure 4. Additionally, it is not easy to guarantee that all engineers have a full knowledge of the procedures even though the education and training programs are provided. Therefore, we have to say that education and training may not deal with the real world as expected. Yet from another viewpoint, we can expect some positive potential. If engineers get to know more fundamental principles which are not explicitly presented in the procedures and their application, it is likely for them to wisely manage unforeseen cases which are not described in the procedures as shown on the right side of Figure 4.
During a normal condition, most tasks are routinely performed in such a way that the education and training for these tasks emphasizes procedure follow-up without mistakes. Even for conditions other than a normal situation, the procedures or guidelines are provided and operators exercise them repeatedly. However, there must be exceptional conditions which are not addressed in the procedures due to epistemic uncertainty or the lack of plant information. In this case, the procedures are of no use. For instance, a few white papers on the Fukushima accident in 2011 indicated the operators’ mistake not to take relevant actions in a timely manner because there was no corresponding direction in their procedures [25~27]. In conclusion, the point of the education and training can be summarized by reducing human errors during prescribed conditions and improving human decision-making during exceptional conditions.
There are a lot of case studies and lessons learned published for better education and training [28~32]. However, a standardized approach to education and training does not seem to exist because of differences in country, generation, and culture, which sets this issue apart from other technical issues. Moreover, since the outcome of education and training can be recognized only after a long investment, it is not easy for companies to keep continuous attention on this issue. Nevertheless, it is obvious that the ultimate responsibility for nuclear safety belongs to humans, not machines, so it is essential to secure a system for continuous quality improvement of education and training. Considering a few factors that we are facing, for instance, the growing concerns over nuclear safety, the active development of information and communication technologies, and specific circumstances in Korea, several issues are summarized below:
1) Education and training for subcontractors
For better business management, the tasks belonging to utility are getting transferred to subcontractors. Utility or main-contractors can have programs to foster the capability of engineers employed in the company, and engineers can also gain in-depth professional knowledge in the company through which career stability is relatively guaranteed. On the other hand, subcontractors are not likely to have enough essential education and/or training program participation. Their turnover is higher, so keeping qualified capability is not easy. Furthermore, engineers’ attachment to their job is normally different depending on their career status. It should be noted that main-contractors have to keep a minimum capability to deal with any abnormal situations even though there is no help from subcontractors in the event that they are not available. Considering this, main-contractors and subcontractors should be balanced in terms of securing qualified capability.
2) Education and training for tacit knowledge
Though there is a difference among countries, the current trend is that private inclination is becoming stronger than organizational devotion. As organizational devotion is stronger, the possibility of transferring tacit knowledge is more likely, which is sometimes more useful than explicit knowledge written in manuals or procedures. The ability to operate and maintain complex plant equipment requires a kind of knowledge that is not always known explicitly. The transfer of tacit knowledge is tapering off. This tendency is accelerated when junior members are not hired steadily. Frequent position circulation can be another reason. The primary remedy to overcome this situation should be to systematically transfer explicit knowledge. However the tacit knowledge seems to be indispensable. Education and training must take care of explicit and tacit knowledge in a balanced manner.
3) Education and training for lesson-learned from wrong cases
We can learn lessons from either good cases or wrong cases. Learning lessons from experience is a significant and effective method of education and training, but it is not easy to learn from wrong cases in particular since they are not clearly reported in practice. Unless there is an obligatory rule, nobody wants to report wrong cases to the public. It is still fine if such wrong cases are circulated and referred to by all members within a company, but even this can be difficult when individual plants, sites, or branches compete with each other for business purposes. The causes associated with human or organizational errors such as inexpressibly private circumstances, team discord, and excessive workload are hard to document. Case reports usually describe accident scenarios and measures taken, but causes, including technical and non-technical issues, should be covered.
4) Education and training for non-technical factors
The non-technical factor mentioned in this section is related to ethical or psychological aspects of plant employees. An investigation of unexpected shutdowns in Korea showed that many of the events were caused not by a lack of technical capability but rather by psychological exhaustion. For instance, no matter how perfect a procedure is, a maintenance crew member may skip a few steps at his or her discretion. While the quality of procedures is a prerequisite, advanced systematic interlocks to prevent such mistakes should support plant personnel. More emphasis on non-technical factors is also necessary in collaboration with social science.
The term safety culture was coined to describe the causes of the Chernobyl nuclear accident. Safety culture is an organizational culture in which everybody places safety first in every activity. Culture is to an organization what personality is to an individual. Safety culture is invisible and intangible. All the processes and decisions, however, are influenced by safety culture. Many people ask what safety culture is and how safety culture is fostered. It looks difficult to devise a process to increase safety culture because safety culture does not stand alone. Safety culture is like an adverb rather than a verb. An adverb always comes with a verb. If safety culture is reflected in the statement ‘close a valve’, the statement becomes ‘safely close a valve’. A similar concept is found in the audit and quality assurance system. If individuals are armed with safety culture, the adverb ‘safely’ can be omitted to get rid of the written burden. Safety culture is sometimes misused to hide the responsibility of a human or organization, or the actual root cause of events. Safety culture sounds ambiguous, like political terminology.
Culture in principle has evolved while organization is adapting to social environment. Because organization consists of individuals, can culture be calculated by adding the sum of individuals' personalities? The answer is that culture forms a different domain from the sum of personalities. For example, the western culture has evolved respecting personal rights; whereas each person in the oriental culture has been trained to follow the culture shaped by leaders. In both cases, even weak human beings can build a strong safety culture by desiring ideal organization.
The IAEA (International Atomic Energy Agency) published guidelines on safety culture. The NRC, however, had been reluctant to introduce safety culture until corrosion was found at the Davis-Besse reactor head in 2002. The U.S. had believed that safety culture could be intimidated in a closed society such as Eastern Europe in the past. The NRC published the final safety culture statement in 2012 after intensive discussion. The Korean nuclear power industry, however, has not hesitated to introduce new concepts from both the IAEA and NRC.
Since the Fukushima accident, Korean society has kept a close watch on the operation of NPPs. Korean NPPs have invested a large budget to mitigate the consequences of natural disasters beyond design basis events. It seemed that Korean NPPs had been well prepared until the station blackout at Kori 1. However, public acceptance of NPPs dropped significantly after Kori 1 concealed its 12-minute station blackout in 2012. Both events have changed the Korean social environment. Residents and Korean scientists consider NPPs a potential hazard to society. They try to eliminate latent defects in NPPs.
Safety culture can be observed in behaviors or appear in artifacts such as workflows, procedures, and policies. According to Schein, safety culture has three layers. The top layer is artifacts. When the artifacts are adhered to as written, the level of safety culture can be increased.
Figure 5 shows the hierarchy from the safety culture layer to a real plant. Nuclear plants are supported by many processes. Safety culture resides below the process layer. Therefore, safety culture is not directly linked to the real plant. Both the human layer and process layer are heavily influenced by the safety layer. The NRC regards safety culture roughly as human performance, safety conscious work environment, and problem identification and resolution in the reactor oversight process. These elements are core elements for all processes, and called cross-cutting area in the reactor oversight process.
The safety culture layer in Korean nuclear power plants is further structured within the safety culture model. The model consists of the safety culture definition, principles, and action statements. KHNP defines safety culture as the belief and behavior of organizational members that place safety as the first priority. Safety culture principles are listed as follows. They are similar to those of other organizations because safety culture is a universal concept [35~38].
• Everybody is responsible for safety
• Leaders demonstrate a commitment to safety
• Everybody has a questioning attitude
• The working environment is safety conscious
• Continuously learning and improving organization is embraced
• Processes to foster safety are implemented
Traditionally, Korean society was governed by a strong Confucian culture. This culture discouraged Koreans from expressing their thoughts to their seniors. After the collapse of the Chosun dynasty, Korea adopted capitalism and became a society full of dynamics. Many students have been trained abroad but have failed to learn the western culture.
Chapter 3 proposes specific technical approaches to address the issues mentioned in Chapter 2. The proposed approaches are selected on the basis of the statistics from OPIS. Table 3 shows the prioritized causes of human errors in Korean NPPs between 1978 and 2008.
From Table 3, while the 2nd and 3rd groups belong to design faults, the others are associated with operational faults, and these exceed 2/3 of all causes. Therefore, three points can be drawn for reducing the majority of human errors during the operational phase: (1) crews should have enough time to execute procedures, otherwise tasks need to be automated, (2) all information for interactive tasks should be shared without loss, and (3) the supervisory steps to check the initial and end states of tasks should be reinforced. Fortunately, these points can be practically implemented as long as advanced IT is merged with the current MMIS. Based on these observations, this chapter addresses the overview, methodologies, and expected benefits of advanced automation, operator support systems, and test support systems as the methods of materializing the lessons learned.
Automation was described in a broad sense in Section 2.1. Now, two approaches of automation to be applied to commercial NPPs will be described in a narrow sense directly related to operation control. The first approach is to increase the level of automation in a full range of NPP operations to significantly reduce the human error. The second one is to improve the operation performance by applying advanced control algorithms to a variety of control systems of NPPs.
1) Increasing the level of automation in a full range of operations
Since NPPs take charge of the base load in supplying electricity, NPPs are operated at high power most of the time. Therefore, they are completely automated at high power. Table 4 shows the level of automation and the number of plant shutdowns due to human errors in each operation mode. The heat-up/cool-down operation and low power operation are very short in duration, but their share of the number of plant shutdowns is significant. Therefore, the level of automation needs to be increased in the operation modes including heat-up/cool-down and low power operations to reduce human errors in these modes. Figure 6 shows the typical operation modes and automation levels in PWR (Pressurized Water Reactor) plants. Human errors can be significantly reduced by mitigating operators’ workload and minimizing operator intervention through automated operation in these operation modes.
Even if it is difficult to design completely automated control systems for a full range of operations including heat-up, low power, high power, and cool-down operations, it is necessary to reduce human errors by greatly increasing the current levels of automation. Actually, it is not difficult to automate the startup operation if no disturbances occur. However, a simple procedure cannot respond to the transients or abnormalities that may occur during startup. Looking into research concerning the NPP startup operation, Berkan et al. have proposed an automated startup control system for the Experimental Breeder Reactor-II (EBR-II). Due to the presence of plant nonlinearities over the startup period, they used intelligent techniques such as reconstructive inverse dynamics, fuzzy logic, and neural networks to develop the controllers. In Korea, Kim et al. have presented a preliminary study for the heat-up operation. At present, Ohi units 3 and 4 in Japan perform automated operation during plant heat-up and cool-down. It is reported that, since automation was applied to the heat-up and cool-down operations in Ohi units 3 and 4, manipulation frequencies and monitoring frequencies have been reduced by 60% and 75%, respectively, compared with conventional manual operation. Operators’ burden will be reduced as the operating frequencies are reduced, which can reduce human errors. The reduction of human errors can contribute to increasing public acceptance by preventing inadvertent plant shutdowns.
2) Advanced control algorithms
Most control systems of current commercial NPPs use conventional PID controllers and lead/lag compensators. To improve the plant performance, it is required to design control systems equipped with advanced control algorithms and also suggested to replace the conventional control systems with the advent of the computing platform of highly efficient distributed control systems. Therefore, it is necessary to actively apply and implement advanced control algorithms being used widely in general industries.
For this reason, more advanced control methods and algorithms have been studied by many researchers [42-49] in nuclear fields and realistically applied to a great degree in other industries. Further advances in control methods will be made and NPPs are expected to take advantage of them in order to help meet their overall operational performance and safety objectives. The advanced control methods include optimal control, robust control, model predictive control, fuzzy logic control, neural network-based control, adaptive control, and so forth. Multiple combined control methods will also be applied to assist autonomous operation of NPPs to achieve optimum performance.
The following are representative control topics to which these advanced control algorithms can be applied and whose control performance can be improved:
• Water level control of steam generators
• Reactor power control
• Load-following operation
It is desired to establish more automated operations for a full range of operation by using the advanced controllers for detailed control problems.
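As a point of reference for the conventional controllers that this section takes as the baseline, the following is a minimal sketch of a discrete PID controller. The first-order process in `simulate`, its time constant, and the gains are illustrative assumptions only. They do not represent a steam generator, a reactor, or any real plant tuning.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float                  # proportional gain
    ki: float                  # integral gain
    kd: float                  # derivative gain
    integral: float = 0.0      # accumulated error
    prev_error: float = 0.0    # error at the previous sample

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        # Standard positional PID law on the error signal.
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200, dt: float = 0.1) -> float:
    """Drive a toy first-order process dy/dt = (u - y) / tau toward a setpoint."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0)      # assumed gains for the toy process
    y, tau, setpoint = 0.0, 1.0, 1.0
    for _ in range(steps):
        u = pid.update(setpoint, y, dt)
        y += dt * (u - y) / tau            # explicit Euler step of the process
    return y


final_value = simulate()   # settles close to the setpoint of 1.0
```

The advanced methods listed above (model predictive control, fuzzy logic control, and so forth) would replace the `update` law while keeping the same measure-compute-actuate loop.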
As mentioned in Section 2.2, successful accident management can be expected through the collaboration of human operators and computer systems. The functions which must be performed can be categorized into four tasks: information acquisition, situation identification, adoption of control decisions, and implementation of decisions. The effective support of these tasks by using advanced MMI would lead to better performance of accident management. In this sense, in addition to well-organized display of information, the operator support systems explained in this section are expected to be deployed in the near future to improve the safety of NPPs.
1) Plant status information system
Plant status must be accessible to the human operators both in normal operation and in an emergency situation. The effectiveness of plant status recognition is one of the keys to enhancing the mitigation success probability. If the controls of all safety components are not automated, the operators will be under serious time pressure to cope with a sudden accident. It is well known that a change in the available time considerably affects the human error probability and results in a big difference in the actuation success probability of safety components.
The plant status deviation from its normal condition can be detected by a computer system even before the process parameters reach alarm set points. This will give an earlier warning for the operators and more time to determine proper counter measures and implement safety actions. The primary function of a plant status monitoring system is to provide this kind of early alarm of plant status deviation. Its secondary function would be to provide a cue to determine the specific status of a plant. This will enhance the human operator’s capability.
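The early-warning idea can be sketched as follows, assuming a single process parameter, an early-warning band derived from the scatter of normal readings, and a conventional alarm setpoint further out. The readings, the setpoint of 108.0, and the four-standard-deviation band are all hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def first_index_above(values, limit):
    """Return the index of the first sample exceeding limit, or None."""
    for i, v in enumerate(values):
        if v > limit:
            return i
    return None


# Hypothetical readings of one process parameter (arbitrary units):
# steady operation followed by a slow upward drift.
normal = [100.0, 100.4, 99.7, 100.2, 99.9, 100.1, 99.8, 100.3]
drift = [100.5 + 0.5 * k for k in range(20)]
readings = normal + drift

ALARM_SETPOINT = 108.0                        # assumed conventional alarm limit
baseline = mean(normal)
early_limit = baseline + 4 * stdev(normal)    # assumed early-warning band

early_idx = first_index_above(readings, early_limit)
alarm_idx = first_index_above(readings, ALARM_SETPOINT)
lead_samples = alarm_idx - early_idx          # samples gained before the alarm
```

The difference `lead_samples` corresponds to the additional time that operators would have to determine countermeasures. A tighter band gives more lead time at the cost of more false warnings.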
This secondary function could be interpreted as a diagnostic function. The system will either support the human operator in diagnosing the root cause of a plant deviation or provide a list of suspected causes by using an automated diagnostics algorithm. There is a big difference between these two functions. It is notable that the system may mislead the operators into a wrong conclusion and cause the loss of time that is critical for coping with the deviation. The level of automation in diagnostics must be decided in consideration of the possibility or probability that the automated system fails to trace correct root causes.
On the other hand, this plant status information can be utilized by plant maintenance personnel for checking the plant status and for ensuring the planned maintenance action will not cause a serious disturbance of plant operation.
2) Response prediction system
When an accident occurs, the plant response to possible operator actions needs to be identified before the actions are actually implemented. The operators also need to know the result of omitting required actions. Even very knowledgeable and skillful plant operators might have difficulty in performing this kind of prediction since they would be under stress caused by time pressure. The plant response prediction system will help the operators by reducing the burden of the prediction task and will allow the suitability of planned actions to be checked in a more flexible manner.
The direct use of a very fast simulator would be one of the possible approaches. As mentioned in Chapter 2, the high speed requirement may induce the use of response-surface models rather than physics-based models, whose computer simulations are not fast enough at this time. Real physical-law-based simulations such as computational fluid dynamics models are far slower than real time.
The relationships among safety actions, some critical process parameters, and plant response variables will be the basis of response-surface models. Statistical models or correlations among variables would be used for this purpose. However, it should be noted that this model is only an approximation. Thus, if we do not have enough knowledge to develop it to the extent that it can be used during accidents, the response prediction system will mislead the operators to an incorrect conclusion. The coverage of a response prediction system should be identified before we deploy it.
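A minimal sketch of such a response-surface model is given below, assuming a single input variable and a linear fit to results computed offline by a slower simulator. The variable names and data points are hypothetical. The point of the sketch is the coverage check: the surrogate refuses to predict outside the range it was fitted on, which is one simple way to keep an approximation from misleading the operator.

```python
from dataclasses import dataclass


@dataclass
class ResponseSurface:
    intercept: float
    slope: float
    x_min: float   # lower edge of the fitted (covered) range
    x_max: float   # upper edge of the fitted (covered) range

    def predict(self, x: float) -> float:
        # Refuse to extrapolate: outside the fitted range the
        # approximation has not been validated.
        if not (self.x_min <= x <= self.x_max):
            raise ValueError(f"input {x} is outside the covered range")
        return self.intercept + self.slope * x


def fit_linear(xs, ys) -> ResponseSurface:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ResponseSurface(my - slope * mx, slope, min(xs), max(xs))


# Hypothetical offline simulator results: injection flow (x) versus
# peak temperature (y), both in arbitrary units.
flows = [10.0, 20.0, 30.0, 40.0]
peaks = [900.0, 800.0, 700.0, 600.0]
surface = fit_linear(flows, peaks)

predicted = surface.predict(25.0)      # inside the covered range
try:
    surface.predict(60.0)              # outside the covered range
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```

A practical model would involve several inputs and a validated functional form, but the same principle of declaring and enforcing the coverage would apply.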
3) Recovery support system
One of the significant merits of human-computer collaboration is the enhanced possibility of recovery of faulty actions. The human operators tend to make various kinds of mistakes. The computer system could check the adequacy of a human action before it is transmitted to field controllers.
Primitive recovery support can be achieved by just asking the operator what he/she actually intended to do. If the operator reads this verification message, there will be a greater chance of recovery from faulty manipulation. Some advanced features can be added to this. A rule-based checking algorithm is one of the feasible techniques for validating an operator’s actions. For a given plant status or an emergency operation procedure, expected operator actions can be treated in the form of a rule-based model. The relationships or sequence of actions can also be covered.
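The rule-based check described above might look like the following sketch. It assumes that each rule names an operator action, the plant-status conditions that must hold, and the actions that must already have been completed. The rule contents and identifiers such as `open_valve_B` are invented for illustration and do not come from any actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str            # operator action the rule applies to
    requires: frozenset    # plant-status flags that must be true
    after: frozenset       # actions that must already be completed


def check_action(action, status, completed, rules):
    """Return the reasons to ask the operator for confirmation (empty if none)."""
    reasons = []
    for rule in rules:
        if rule.action != action:
            continue
        for flag in sorted(rule.requires):
            if not status.get(flag, False):
                reasons.append(f"condition not met: {flag}")
        for prior in sorted(rule.after):
            if prior not in completed:
                reasons.append(f"step not done yet: {prior}")
    return reasons


# Hypothetical rule; real rules would be derived from plant procedures.
RULES = [
    Rule("open_valve_B", frozenset({"pump_A_running"}), frozenset({"close_valve_A"})),
]

status = {"pump_A_running": True}
blocked = check_action("open_valve_B", status, completed=set(), rules=RULES)
allowed = check_action("open_valve_B", status, completed={"close_valve_A"}, rules=RULES)
```

A non-empty result would trigger the verification message before the command is transmitted to field controllers. An empty result lets the action pass without interruption.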
The combined use of a plant response prediction system and a computerized procedure system will be a more sophisticated way of validating operator actions. The most advanced recovery support system would recognize faulty actions and provide the possible recovery actions that are available at that moment in consideration of plant status.
4) Maintenance information system
Maintenance of safety-critical equipment may cause unavailability of safety functions. Thus this would be one important piece of information that the plant operators need to know in addition to the process parameters. A maintenance information system covers two different factors: the recognition of safety functions’ availability and the reduction of their unavailable time.
The first factor, the recognition of safety functions’ availability, can be addressed by gathering component status information. Effective display of this information will reduce the chance of human errors which might be caused by control room operators or field maintenance personnel. The provision of a well-established maintenance procedure would be one possible way to address the second factor, the reduction of safety functions’ unavailable time. After maintenance, we also need to ensure that the safety equipment is ready to be operated and the safety function is available.
Risk information also needs to be incorporated into this system to enhance plant safety and economy. This maintenance information system facilitates more effective planning of maintenance and minimizes possible human errors.
Among maintenance tasks performed in NPPs, the periodic tests should be a key factor for keeping nuclear safety. Under the Korean nuclear regulatory framework, the technical specifications in the final safety analysis report explicitly specify the list of tests and inspections to satisfy the limits for operating conditions. The particular procedures for such tests and inspections are provided by the periodic test procedures. Since the periodic tests cover the examination and investigation for safety-critical structure, system, and components, a single mistake in performing the tests can threaten the conditions of a power station.
It goes without saying that there must be enough capable engineers to promote the reliability of the periodic tests. Aside from the aspect of human resources, we can recommend a few ideas from previous experience: (1) all the information during the periodic tests should be correctly shared among crews, (2) each step should be proactively verified, properly performed, and validated on-line, and (3) all administrative processes, including the pass/fail decision for tests, history traceability, and automatic interagency reporting, should be supported to reduce excessive burden. Figure 7 illustrates how technologies can support these ideas in an NPP.
Technical components required are as follows:
1) Procedure platform for periodic tests
The CBPs for operation have been set up for APR-1400, so their platform has also been completed. However, test procedures are less standardized and formatted than operation procedures. A platform to develop, execute, and maintain test procedures should be developed as a preliminary step.
2) Computerized periodic test procedures
The terminal for the periodic test procedures should provide the wired and wireless communication that is allowed in nuclear applications. Obviously, EMI (Electromagnetic Interference) / RFI (Radio Frequency Interference) should be verified not to cause any unexpected malfunction of other instrumentation and controls. To successfully achieve all tasks, any possible environmental condition in a testing area, such as radiation exposure, temperature, humidity, and any impact during usage, should be taken into account.
Aside from these hardware related requirements, the functionality for software should be specified such as the function to verify testing tasks, identify initial and final conditions, and check supplementary operational information. A remote coaching function is necessary to discuss unresolved issues with advisory experts outside through wired and/or wireless vision and/or voice connection, which is useful to prevent tester’s mistakes and protect the testing crew’s safety.
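The function of identifying initial and final conditions can be sketched as a step wrapper that checks a precondition, performs the task, and then checks the expected result. The step content, the state keys, and the names `ProcedureStep` and `run_step` are hypothetical and serve only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict

PlantState = Dict[str, float]


@dataclass
class ProcedureStep:
    name: str
    initial_ok: Callable[[PlantState], bool]   # must hold before the step
    final_ok: Callable[[PlantState], bool]     # must hold after the step


def run_step(step, state, perform):
    """Check the initial condition, perform the task, then check the result."""
    if not step.initial_ok(state):
        return "blocked: initial condition not met"
    perform(state)
    if not step.final_ok(state):
        return "failed: final condition not met"
    return "verified"


# Hypothetical step: a channel must be bypassed before a test signal is injected.
step = ProcedureStep(
    name="inject test signal",
    initial_ok=lambda s: s["channel_bypassed"] == 1.0,
    final_ok=lambda s: s["test_signal"] > 0.0,
)


def inject(state):
    # Stand-in for the field action taken by the tester.
    state["test_signal"] = 5.0


not_ready = run_step(step, {"channel_bypassed": 0.0, "test_signal": 0.0}, inject)
ready = run_step(step, {"channel_bypassed": 1.0, "test_signal": 0.0}, inject)
```

Blocking a step whose initial condition is not met is one way to keep a tester from skipping steps at his or her own discretion.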
3) Communication network infrastructure
Due to the nature of the periodic tests, the computerized procedure should be mobile, so appropriate communication infrastructure should be supported. There have been numerous examples of using wired/wireless communication technology for voice messaging, data transfer, radiation monitoring, security surveillance, and process or equipment condition monitoring in NPPs. Nevertheless, communication networks within a nuclear island should be used with caution because of potential problems caused by EMI/RFI. Furthermore, since cyber security is a present concern and is not fully guaranteed by the regulatory body, computerized periodic test procedures should be developed using wired or short-range wireless infrastructure, as long as they are prototypes, while wireless applications remain open.
4) Services for administrative processes
The work orders of the periodic tests in Korean NPPs are issued individually by the ERP (Enterprise Resource Planning) system operated by the utility. However, the configuration and scheduling of the tests as a whole are not effectively performed. For instance, two tests that must not be performed at the same time are not kept under control. Traceability of test records is not fully secured. The computerized periodic test procedure needs to provide all necessary administrative processes involved in the entire work order.
Services for administrative processes can be summarized as follows:
• Scheduling and management of entire periodic tests
• Indication of periodic tests currently performed
• Warning and prevention for exclusive tests (a test should not be performed with the other test at the same time)
• Prioritization for periodic tests
• Management of track records (test logs and all their revisions)
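The warning for exclusive tests listed above can be sketched as an overlap check over a schedule. The sketch assumes that each test has a start and end time and that exclusivity is declared pairwise. The test identifiers and times are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ScheduledTest:
    test_id: str
    start: float   # hours from the start of the schedule
    end: float


def overlaps(a, b):
    # Two half-open intervals [start, end) overlap when each one
    # starts before the other ends.
    return a.start < b.end and b.start < a.end


def exclusive_conflicts(schedule, exclusive_pairs):
    """Return pairs of test IDs that are mutually exclusive yet overlap in time."""
    conflicts = []
    for a, b in combinations(schedule, 2):
        pair = frozenset((a.test_id, b.test_id))
        if pair in exclusive_pairs and overlaps(a, b):
            conflicts.append((a.test_id, b.test_id))
    return conflicts


# Hypothetical test IDs and exclusivity rule.
EXCLUSIVE = {frozenset(("train_A_pump_test", "train_B_pump_test"))}
schedule = [
    ScheduledTest("train_A_pump_test", 0.0, 2.0),
    ScheduledTest("train_B_pump_test", 1.0, 3.0),
    ScheduledTest("valve_stroke_test", 1.5, 2.5),
]
conflicts = exclusive_conflicts(schedule, EXCLUSIVE)
```

Running the same check whenever a work order is issued would turn the warning into prevention.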
It is well known that events in NPPs are mainly caused by human errors. In order to make NPPs safe and reliable, therefore, we have to do our best to reduce human errors. Many things can be done to reduce human errors in NPPs, and we suggested five possible ways in this work. These are to (1) automate NPP operations and maintenance as much as possible, (2) improve MMIs for both operation and maintenance to be as fault free and human friendly as possible, (3) improve operation and maintenance procedures, (4) thoroughly train and educate operators and maintenance people, and (5) improve safety culture. We first reviewed each of these five methods. After that, we chose only the ways in which we can achieve the goal of reducing human errors by applying existing ITs or by using advanced MMIS. The answer is the automation of NPPs and the improvement of MMIs and procedures. We first tried to find how to reduce human errors through the automation of NPPs. For increasing the level of automation in a full range of operations, we showed how much human errors can be reduced based on existing data. We also commented on various advanced control algorithms which were already developed and demonstrated by many researchers. Second, we tried to find how we can reduce human errors by developing operator support systems. We can develop systems which support the human operations of diagnosis, decision support, and execution validation. This integrated operation support system can be a stand-alone system or incorporated into the computerized operation procedure systems as advanced functions. Finally, the test support systems can be developed to reduce human errors. Maintenance people make many mistakes while doing periodic testing work. These mistakes can be due to improper working procedures or incomplete communication between field test people and people inside the control room. They can also arise from routine services, such as incorrect logging.
In order to reduce human errors during the test, developing test support systems is strongly recommended in this work. In addition to these efforts, other things should be done simultaneously. One is the training and education of operators and maintenance people, not only in technical skills but also non-technical skills such as communication, leadership, and cooperation. There is also the matter of improving safety culture, which is also very important to reduce human errors. Not keeping the working procedure faithfully, hiding some important issues related to NPP safety, and putting the economic importance before the safety importance are all well-known examples of improper safety culture of an organization. The Chernobyl accident was a good example of improper safety culture. In this work, we did not look into these training and safety culture problems in detail. We only considered the ways to improve NPP safety by reducing human errors utilizing existing information technologies. We hope we can reduce human errors substantially by developing the methods we suggested in this work.
[Table 1.] Defense-in-Depth Concept for Reducing Human Errors
[Fig. 1.] Personnel Performance versus Workload
[Fig. 2.] Automation Dimension Used by US NRC (Nuclear Regulatory Commission)
[Table 2.] Allocation Roles for Endsley's Levels of Automation
[Fig. 3.] Generic Activities for Achieving Functions and Tasks
[Fig. 4.] Comparison of Limitation (Left) and Potential (Right) of Education and Training
[Fig. 5.] Relationship between Safety Culture and Plant
[Table 3.] Analysis of Human Errors in NPPs between 1978~2008
[Table 4.] Automation Level and the Number of Plant Shutdown due to Human Errors in Each Operation Mode (Inadvertent Shutdown Cases from 2002 to 2011 of OPIS)
[Fig. 6.] Typical Operation Mode and Automation Level in PWR
[Fig. 7.] Illustration of a Periodic Test Support System