Safety Instrumented Systems are used to monitor plant values and parameters against their operational limits and, when a hazardous condition occurs, they must trigger alarms and bring the plant to a safe condition, or even to a shutdown condition.
Safety conditions should always be followed and adopted by plants, and applying the best operating and installation practices is a duty of both employers and employees. It is important to remember that the first principle of safety legislation is to ensure that all systems are installed and operated in a safe way, and the second is that the instruments and alarms involved in safety operate with reliability and efficiency.
Safety Instrumented Systems (SIS) are the systems responsible for operational safety, ensuring the emergency shutdown within limits considered safe whenever the operation exceeds those limits. The main objective is to avoid accidents inside and outside the plant, such as fires, explosions, and equipment damage, to protect production and property and, above all, to avoid risk to life, harm to personal health, and catastrophic impacts on the community. It should be clear that no system is completely immune to failures and that, even in the case of a failure, it should still provide a safe condition.
For several years, safety systems were designed according to the German standards DIN V VDE 0801 and DIN V 19250, which were well accepted by the global safety community and motivated the effort to create a global standard, IEC 61508. This standard now serves as the basis for all operational safety involving electrical, electronic, and programmable electronic systems in any kind of industry; it covers all safety systems of an electronic nature.
Products certified according to IEC 61508 should basically cover 3 types of failures:
IEC 61508 is divided into 7 parts, of which the first 4 are mandatory and the other 3 act as guidelines:
Part 1: General requirements
Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems
Part 3: Software requirements
Part 4: Definitions and abbreviations
Part 5: Examples of methods for the determination of safety integrity levels
Part 6: Guidelines on the application of Parts 2 and 3
Part 7: Overview of techniques and measures
The standard systematically covers all activities of the SIS (Safety Instrumented System) life cycle and focuses on the performance required from a system; that is, once the desired SIL (Safety Integrity Level) is reached, the redundancy level and the test interval are at the discretion of whoever specified the system.
IEC 61508 aims to leverage the improvements brought by PES (Programmable Electronic Systems, which include PLCs, microprocessor-based systems, distributed control systems, sensors, intelligent actuators, etc.) and to standardize the concepts involved.
Several standards covering SIS development, design, and maintenance have been prepared recently, such as the already mentioned IEC 61508 (industry in general); it is also important to mention IEC 61511, focused on the process industries (continuous processes, liquids, and gases).
In practice, in several applications one sees equipment with SIL certification being specified for use in control systems with no safety function. There also appears to be a market of misinformation that leads to the purchase of more expensive equipment developed for safety functions which, in practice, will be used in process control functions, where the SIL certification does not bring the expected benefits and may even make the use and operation of the equipment more difficult.
In addition, such misinformation leads users to believe that they have a certified safe control system, when what they actually have is a controller with certified safety functions.
With the increasing use of digital equipment and instruments, it is extremely important that the professionals involved in projects or in day-to-day instrumentation are qualified, know how to determine the performance required from safety systems, and have command of the tools used to calculate failure probabilities and keep risk rates within acceptable limits.
In addition, it is necessary to:
The simple use of modern, sophisticated, or even certified equipment does not by itself ensure any improvement in reliability and operational safety when compared with traditional technologies, unless the system is deployed with criteria and with knowledge of the advantages and limitations inherent to each type of technology available. In addition, the entire SIS life cycle should be kept in mind.
It is common to see accidents related to safety devices being bypassed by operations or during maintenance. It is certainly very difficult to prevent, at the design stage, one of these devices from being bypassed in the future, but a solid design that better satisfies the operational needs of the safety system user can considerably reduce, or even eliminate, the number of unauthorized bypasses.
By using and applying techniques based on fixed or programmable logic circuits, fault-tolerant and/or fail-safe designs, microcomputers, and software concepts, it is possible today to design efficient and safe systems at costs suitable for this function.
The SIS complexity level depends heavily on the process involved. Heaters, reactors, cracking columns, boilers, and furnaces are typical examples of equipment requiring carefully designed and implemented safety interlock systems.
The appropriate operation of a SIS requires better performance and diagnostic conditions than conventional systems. Safe operation in a SIS relies on sensors, logic solvers/processors, and final elements designed to cause a shutdown whenever safe limits are exceeded (for example, when process variables such as pressure and temperature go over their very-high alarm limits) or even to prevent operation under conditions unfavorable to safe operation.
Typical examples of safety systems:
We saw in the previous article, the first part of this series, some details on the Safety Life Cycle and Risk Analysis. Now, in this second part, we will see a little about Reliability Engineering.
The reliability of measurement systems may be quantified as the mean time between failures occurring in the system. In this context, a failure means the occurrence of an unexpected condition that causes an incorrect value at the output.
The reliability of a measurement system is defined as the ability of the system to execute its function, within the specified operating limits and conditions, during a defined period of time. Unfortunately, factors such as manufacturing tolerances and varying operating conditions make this determination difficult and, in practice, the best we can do is express reliability statistically, as the probability of failure occurring within a period of time.
In practice, we face a great difficulty: determining what constitutes a failure. An incorrect output from a system is much harder to detect and interpret than the total loss of the measurement output.
As seen before, reliability is essentially probabilistic in nature, but it can be quantified in quasi-absolute terms by the mean time between failures (MTBF) and the mean time to failure (MTTF). It is important to note that these two times are usually mean values calculated over a number of identical instruments; therefore, for any particular instrument, the values may differ from the average.
The MTBF is a parameter expressing the mean time between failures occurring in an instrument, calculated over a specific period of time. For highly reliable equipment it is difficult, in practice, to count enough failure occurrences, so an imprecise MTBF value may result; in this case, using the manufacturer's value is recommended.
The MTTF is an alternative way to quantify reliability. It is normally used for devices such as thermocouples, which are discarded when they fail. MTTF expresses the mean time before a failure occurs, calculated over a number of identical devices.
The last reliability-related quantity of importance to the measurement system is the mean time to repair (MTTR), that is, the mean time needed to repair or even replace an instrument.
The combination of MTBF and MTTR shows the availability:
Availability = MTBF/ (MTBF+MTTR)
The availability measures the proportion of time in which the instrument works without failures.
The objective with measurement systems is to maximize the MTBF and minimize the MTTR and, consequently, maximize the Availability.
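As a quick illustration of these definitions, the short Python sketch below computes the availability from MTBF and MTTR; the numerical values are hypothetical, chosen only for the example.

```python
# Availability from MTBF and MTTR (hypothetical values, for illustration only).
mtbf_hours = 50_000.0   # mean time between failures
mttr_hours = 8.0        # mean time to repair (or replace)

availability = mtbf_hours / (mtbf_hours + mttr_hours)
print(f"Availability = {availability:.5f}")   # ~0.99984
```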
The failure pattern of a device may change throughout its life cycle: the failure rate may remain constant, decrease, or even increase.
In electronic devices, it is common to observe the behavior shown in figure 1, also known as the bathtub curve.
Figure 1 - Typical Curve of reliability variation of an electronic component
Manufacturers usually apply burn-in tests to eliminate the early-failure phase (up to T1) before products are placed on the market.
Mechanical components, on the other hand, show a higher failure rate at the end of their life cycle, as shown in figure 2.
Figure 2 - Typical Curve of reliability variation of a mechanical component
In practice, where systems combine electronic and mechanical components, the failure models are complex. The more components there are, the higher the incidence and probability of failures. A real measurement system will usually have several components, arranged in series or in parallel.
The reliability of components in series should take into consideration the probability of individual failures over a period of time. For a measurement system with n components in series, the reliability Rs is the product of the individual reliabilities: Rs = R1 x R2 x ... x Rn.
Imagine a measurement system composed of a sensor, a conversion element, and a signal-processing circuit, with reliabilities of 0.9, 0.95, and 0.99, respectively. In this case, the system reliability will be:
0.9 x 0.95 x 0.99 ≈ 0.85.
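A minimal Python sketch of this series calculation, reproducing the example above:

```python
from math import prod

# Series system: every component must work, so the overall reliability
# is the product of the individual reliabilities.
reliabilities = [0.9, 0.95, 0.99]   # sensor, conversion element, signal processing
r_series = prod(reliabilities)
print(f"Series reliability = {r_series:.4f}")   # 0.8465, i.e. about 0.85
```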
Reliability can be increased by placing components in parallel, which means that the system fails only if all components fail. In this case, the reliability Rs is given by:
Rs = 1 – Fs, where Fs is the unreliability of the system.
The unreliability is Fs = F1 x F2 x ... x Fn.
For example, consider a safety measurement system with three identical instruments in parallel. The reliability of each one is 0.95, so the system reliability is:
Rs = 1 – [ (1-0.95)x(1-0.95)x(1-0.95)] = 0.999875
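The same idea in code for the parallel (redundant) arrangement, reproducing the 0.999875 result:

```python
# Parallel (redundant) system: it only fails if all components fail,
# so the system unreliability is the product of the individual unreliabilities.
reliabilities = [0.95, 0.95, 0.95]   # three identical instruments

f_parallel = 1.0
for r in reliabilities:
    f_parallel *= (1.0 - r)          # Fs = F1 x F2 x ... x Fn

r_parallel = 1.0 - f_parallel        # Rs = 1 - Fs
print(f"Parallel reliability = {r_parallel:.6f}")   # 0.999875
```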
What we look for, in practice, is to minimize the failure rate. An important requirement is to know and act before T2 (see figures 1 and 2), when the statistical frequency of failures increases. The ideal is to make T (the period of use, or life cycle) equal to T2, thus maximizing the period without failures.
There are several ways to increase the reliability of a measurement system:
Reliability is a metric developed to determine the probability of success of an operation in a specified period of time.
When the failure rate (λ) is very low, the non-reliability function F(t), or Probability of Failure (PF), can be approximated by: PF(t) ≈ λt
Figure 3 - Reliability R(t)
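The sketch below illustrates the λt approximation above, assuming the usual constant-failure-rate (exponential) model, R(t) = e^(-λt), from which that approximation comes; the failure rate and mission time are hypothetical example values.

```python
from math import exp

lam = 0.002   # failure rate in failures per year (hypothetical)
t = 1.0       # mission time in years (hypothetical)

reliability = exp(-lam * t)     # R(t) = e^(-lambda * t)
pf_exact = 1.0 - reliability    # F(t) = 1 - R(t)
pf_approx = lam * t             # PF(t) ~ lambda * t when lambda * t << 1

print(f"R(t)        = {reliability:.6f}")
print(f"F(t) exact  = {pf_exact:.6f}")    # ~0.001998
print(f"F(t) approx = {pf_approx:.6f}")   # 0.002000
```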
Measuring reliability requires that a system operate successfully during a time interval. In this context the MTTR metric appears: the time between the detection of a failure and its repair (or the re-establishment of operational success).
The rate of re-establishment of operational success (the repair rate) is given by: µ = 1/MTTR
In practice, it is not simple to estimate this rate, especially when periodic inspection activities take place, since a failure may occur just after an inspection.
The MTBF is a basic measure of the reliability of repairable items of equipment. It may be expressed in hours or years. It is commonly used in system reliability and maintainability analyses and can be calculated by the following formula:
MTBF = MTTR + MTTF
Where:
MTTR = Mean Time To Repair
MTTF = Mean Time To Failure
As the MTTR is usually very small in practice, it is common to assume MTBF ≈ MTTF.
Another very useful metric is availability. It is defined as the probability that a device is available (without failures) when, at a time t, it is required to operate within the operating conditions for which it was designed.
Unavailability is given by: U(t) = 1 – A(t)
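Putting these relationships together, the sketch below (again with hypothetical numbers) shows that the MTBF is essentially the MTTF when the MTTR is small, and derives the availability and unavailability following the formulas used in this article.

```python
# MTBF = MTTR + MTTF; when MTTR << MTTF, MTBF is essentially equal to MTTF.
mttf_hours = 100_000.0   # hypothetical mean time to failure
mttr_hours = 4.0         # hypothetical mean time to repair

mtbf_hours = mttr_hours + mttf_hours
availability = mtbf_hours / (mtbf_hours + mttr_hours)   # A = MTBF / (MTBF + MTTR)
unavailability = 1.0 - availability                     # U(t) = 1 - A(t)

print(f"MTBF = {mtbf_hours:.0f} h (MTTF = {mttf_hours:.0f} h)")
print(f"A = {availability:.6f}, U = {unavailability:.6f}")
```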
Availability is not only a function of reliability, but also of maintainability. Table 1 below shows the relationship between reliability, maintenance, and availability. Note in this table that an increase in maintainability implies a decrease in the time needed to carry out maintenance actions.
Table 1 - Relationship between Reliability, Maintenance and Availability
Figure 4 - Reliability, Availability and Costs
PFDavg is the average probability that a safety system fails to perform its protective function when a demand occurs. The SIL level is related to this probability of failure on demand and to the risk reduction factor (how much the risk must be reduced to remain acceptable when a demand event occurs).
The PFD is the reliability indicator appropriate for safety systems.
If the system is not tested, its probability of failure tends toward 1.0 over time. Periodic tests keep the probability of failure within the desired limit.
Figure 5 - Voting, PFD and Architecture
Figure 5 shows the architecture details versus voting and PFD, and figure 6 shows the correlation between PFDavg and the Risk Reduction Factor. We will discuss this in more detail in the articles that complement this series.
Figure 6 - Correlation between PFDavg and Factor of Risk Reduction
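As a preview of that correlation, here is a sketch of the usual low-demand-mode relationship, with the Risk Reduction Factor taken as the inverse of PFDavg and the commonly quoted IEC 61508 low-demand SIL bands; the PFDavg value used is hypothetical.

```python
def risk_reduction_factor(pfd_avg: float) -> float:
    """Risk Reduction Factor for low-demand mode: RRF = 1 / PFDavg."""
    return 1.0 / pfd_avg

def sil_band(pfd_avg: float) -> str:
    """Map a PFDavg value to the IEC 61508 low-demand SIL bands."""
    if 1e-5 <= pfd_avg < 1e-4:
        return "SIL 4"
    if 1e-4 <= pfd_avg < 1e-3:
        return "SIL 3"
    if 1e-3 <= pfd_avg < 1e-2:
        return "SIL 2"
    if 1e-2 <= pfd_avg < 1e-1:
        return "SIL 1"
    return "outside the SIL 1-4 bands"

pfd = 0.005   # hypothetical average probability of failure on demand
print(f"PFDavg = {pfd} -> RRF = {risk_reduction_factor(pfd):.0f} -> {sil_band(pfd)}")
# PFDavg = 0.005 -> RRF = 200 -> SIL 2
```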
The Failure Probability may be calculated using the following equation:
PFDavg = (Cpt x λ x TI/2) + ((1 – Cpt) x λ x LT/2), where:
Cpt = proof test coverage (the fraction of failures detected by the periodic test)
λ = failure rate
TI = test interval
LT = lifetime (period of use) of the device
Let’s look at an example. Suppose a valve used in a safety instrumented system has an annual failure rate of 0.002. A verification and inspection test is conducted every year, and it is estimated that 70% of failures are detected by these tests. The valve will be used for 25 years, and the demand on it is estimated at once every 100 years. What is its average probability of failure on demand?
Using the previous equation we have:
PFDavg = 0.7 x 0.002 x 1/2 + (1 – 0.7) x 0.002 x 25/2 = 0.0007 + 0.0075 = 0.0082
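A minimal sketch that reproduces this calculation (with TI taken as 1 year, matching the annual proof test):

```python
def pfd_avg(coverage: float, failure_rate: float,
            test_interval: float, lifetime: float) -> float:
    """Average probability of failure on demand for a single device,
    using the simplified formula from the text:
    PFDavg = Cpt*lambda*TI/2 + (1 - Cpt)*lambda*LT/2."""
    return (coverage * failure_rate * test_interval / 2.0
            + (1.0 - coverage) * failure_rate * lifetime / 2.0)

# Valve example: lambda = 0.002 failures/year, annual proof test (TI = 1 year),
# 70 % test coverage, 25-year service life.
result = pfd_avg(coverage=0.7, failure_rate=0.002, test_interval=1.0, lifetime=25.0)
print(f"PFDavg = {result:.4f}")   # 0.0082
```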
In practical terms, the aim is to reduce failures and, consequently, shutdowns and operational risks. The purpose is to increase operational availability and also, in terms of process, to minimize variability, with a direct consequence of increased profitability.