What is Reliability? Phases, Advantages, Techniques

  • Post last modified: 17 August 2023

What is Reliability?

Reliability refers to the consistency and dependability of a system, process, or entity to perform its intended function or deliver its expected outcomes under specific conditions. In other words, it’s the extent to which something can be trusted to work correctly and consistently over time.

Reliability is an important dimension of quality: it indicates whether a product can perform under given conditions over a long period. It is also an important measure of how consistently a product or service performs over a given period of time. A sudden breakdown or failure of a component or system in a machine can disrupt the work process and reduce productivity.

Reliability problems may therefore cause severe disruption of services, which can in turn lead to heavy losses for companies. In general terms, reliability can be defined as the probability that a product will perform its intended function. The two most notable definitions of reliability, which can be applied under all conditions, are:

Reliability is the probability that a system, equipment or component will function as intended over a given period under specified conditions.

Reliability is the probability that a system, equipment or component will function as intended on demand.
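The time-based definition can be made concrete with a common textbook model. The sketch below assumes a constant failure rate λ (the exponential model, which fits the chance-failure period of a product's life), so that R(t) = exp(-λt). The failure rate used is hypothetical.

```python
import math


def reliability_exponential(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of operating without failure for `hours`, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical component with a constant failure rate of 1 failure per 10,000 hours.
lam = 1e-4
for mission_hours in (100, 1_000, 10_000):
    print(f"R({mission_hours} h) = {reliability_exponential(lam, mission_hours):.4f}")
# R(100 h) = 0.9900
# R(1000 h) = 0.9048
# R(10000 h) = 0.3679
```

Under this model a unit has only about a 37% chance of surviving to its mean time between failures (t = 1/λ), so that figure should not be read as a guaranteed failure-free period.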

Thus, the acceptance of a product or process depends on whether it meets a certain set of reliability requirements. Reliability is built into a product or equipment by designing and implementing a well-controlled reliability programme. Such a programme is centred on a reliability plan that covers all the activities to be carried out during the different phases of product development, from design through production to maintenance.

Phases of Reliability Programme

Design and Development Phase

Reliability can be built into both component-level design and system-level design. In component-level design, reliability is achieved by understanding and assessing the failure mechanisms and developing appropriate design provisions, such as increasing the safety factor. In system-level design, reliability can be built in through various methods, such as functional independence, the use of reliable components and fault tolerance.

Functional independence means that a system must be designed to perform its intended function regardless of the operating state of other systems or equipment. Similarly, selecting appropriate components based on the operating environment of a system helps achieve the desired reliability. Another way of making a system reliable is to build in fault tolerance. In this method, the health of the system is checked regularly to detect faults; when a fault is detected, the system isolates it while continuing to perform its intended function.
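The fault-tolerance idea (periodic health checks, isolation of a faulty unit, continued operation) can be sketched in a few lines. This is an illustrative model only: the `Channel` and `FaultTolerantSystem` classes are hypothetical, and real systems detect faults through built-in tests, watchdogs or voting logic rather than a simple flag.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Channel:
    """One of several redundant units, each able to perform the system's function (hypothetical)."""
    name: str
    healthy: bool = True  # in a real system this would come from a built-in self-test


@dataclass
class FaultTolerantSystem:
    channels: list[Channel]
    isolated: set[str] = field(default_factory=set)

    def health_check(self) -> None:
        """Periodic health check: isolate every channel that reports a fault."""
        for ch in self.channels:
            if not ch.healthy and ch.name not in self.isolated:
                self.isolated.add(ch.name)
                print(f"fault detected in {ch.name}: channel isolated")

    def perform_function(self) -> str:
        """Carry out the intended function on the first channel that has not been isolated."""
        for ch in self.channels:
            if ch.name not in self.isolated:
                return f"function performed by {ch.name}"
        raise RuntimeError("all channels isolated: system failure")


system = FaultTolerantSystem([Channel("A"), Channel("B")])
system.channels[0].healthy = False   # simulate a fault in channel A
system.health_check()                # prints: fault detected in A: channel isolated
print(system.perform_function())     # prints: function performed by B
```

A channel that fails between two checks is still used until the next check runs, which is why the check interval matters in real fault-tolerant designs.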

Production Phase

In the production phase, reliability can be built in by selecting adequate machines and tools and adhering strictly to the production process. Moreover, processes must be standardised, the condition of machines and tools must be controlled, and the employees operating them must be trained. At every production stage, the product needs to be tested through inspection, unit-level tests and integration tests. If failures are detected, corrective action must be taken immediately.

Operation and Maintenance Phase

The reliability of an equipment or system in operation depends on adequate use and maintenance. Therefore, a well-defined operation and maintenance (O&M) programme must be employed, with trained O&M personnel, and an appropriate maintenance strategy should be selected.

Commonly Used Reliability Indices

| Reliability index | Explanation |
|---|---|
| Mean time between failures (MTBF) | Average time between successive failures of a repairable product |
| Failure rate | Number of failures per unit time |
| Mean time to failure (MTTF) | Average time to failure of a non-repairable product |
| Mean life | Average value of product life |
| Longevity | A product's wear-out time |
| System effectiveness | Degree to which a product conforms to customer requirements |
| B10 life | Time by which 10% of the units in a population have failed |
| B50 life | Time by which 50% of the units in a population have failed |
| Repairs/100 | Number of repairs performed per 100 operating hours |
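Several of the indices listed above can be estimated directly from test or field data. The sketch below uses simple point estimates on hypothetical data: MTTF as the mean of observed failure times, failure rate as failures per operating hour, B10/B50 life as empirical percentiles, and MTBF and repairs per 100 hours for a repairable machine. Real analyses usually fit a life distribution (for example a Weibull) and account for units that have not yet failed, which this sketch ignores.

```python
from __future__ import annotations


def mttf(failure_times: list[float]) -> float:
    """Mean time to failure: average of the observed failure times of non-repairable units."""
    return sum(failure_times) / len(failure_times)


def failure_rate(failures: int, operating_hours: float) -> float:
    """Failures per operating hour."""
    return failures / operating_hours


def b_life(failure_times: list[float], percent: int) -> float:
    """Empirical B-life: time by which `percent`% of the tested units had failed."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    times = sorted(failure_times)
    k = -(-percent * len(times) // 100)  # ceiling of percent% of the sample size
    return times[k - 1]


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures of a repairable item."""
    return operating_hours / failures


def repairs_per_100_hours(repairs: int, operating_hours: float) -> float:
    """Number of repairs per 100 operating hours."""
    return 100 * repairs / operating_hours


# Hypothetical life test: 10 non-repairable units run until every one failed (hours).
times = [120, 340, 560, 610, 780, 910, 1040, 1200, 1500, 1940]
print(mttf(times))                                    # 900.0
print(f"{failure_rate(len(times), sum(times)):.5f}")  # 0.00111 failures per hour
print(b_life(times, 10), b_life(times, 50))           # 120 780

# Hypothetical repairable machine: 8 repaired failures in 4,000 operating hours.
print(mtbf(4_000, 8), repairs_per_100_hours(8, 4_000))  # 500.0 0.2
```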

Advantages of Reliability Programme

Reliability programmes offer manufacturers various benefits by assuring that the product performs as required under specific conditions and as demanded by customers. Some of the important benefits of incorporating a reliability programme are as follows:

  • Reduction of inventory costs through accurate prediction of spare-part requirements.

  • Production of goods that work under the environmental conditions imposed by customers, thereby increasing customer satisfaction.

  • Anticipation of failures, so that preventive measures can be implemented in time.

  • Adoption of the right maintenance strategies, which greatly reduces operating expenses.

  • Fewer failures and optimised maintenance mean that fewer spare parts are needed in the logistics system, which reduces the cost of transporting, handling and storing spare parts.

  • Reliable products operate as expected without failure, so fewer are returned for repair. This in turn enhances the reputation of the product and the manufacturer in the eyes of users.

  • Reliability programmes also ensure that safety provisions are built into products during manufacture, making them safe for customers to use.
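The spare-parts benefit is often quantified with a Poisson model. The sketch below assumes each fielded unit has a constant failure rate and that failed parts are replaced from stock; the fleet size, failure rate, resupply period and 95% service level are all hypothetical.

```python
import math


def spares_needed(fleet_size: int, failure_rate_per_hour: float,
                  period_hours: float, service_level: float) -> int:
    """Smallest stock s such that P(failures in the period <= s) >= service_level.

    Assumes failures across the fleet follow a Poisson process (constant failure rate).
    """
    if not 0 < service_level < 1:
        raise ValueError("service_level must be strictly between 0 and 1")
    mean = fleet_size * failure_rate_per_hour * period_hours  # expected failures
    s = 0
    term = math.exp(-mean)        # P(N = 0)
    cumulative = term
    while cumulative < service_level:
        s += 1
        term *= mean / s          # P(N = s) computed from P(N = s - 1)
        cumulative += term
    return s


# Hypothetical fleet: 50 units, 1e-4 failures per hour each, 2,000-hour resupply period.
print(spares_needed(50, 1e-4, 2_000, 0.95))  # 15
```

Stocking only the expected number of failures (10 here) would cover demand only a little over half the time, because actual failures scatter around their mean. For very large fleets, where exp(-mean) underflows, a statistics library's Poisson quantile function is a safer choice.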

Types of Failures

Various types of failures can occur at any stage of a product's life cycle, and a reliability programme may need to address them while products are designed, manufactured and operated.

These failures are discussed as follows:

Early Failures

Early failures manifest early in the life of a component. Examples include failures due to poor welds, solder joints and connections, contamination on surfaces or in materials, cracks and thin spots in insulation, and incorrect positioning of parts. Such failures often show up during process or final tests, process audits and environmental tests. They can be prevented by a debugging process (also known as burn-in), in which equipment is operated under simulated conditions for a definite period. During this process, sub-standard components can be detected and replaced with good components before the equipment is put into actual use.
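Debugging (burn-in) pays off when the hazard rate falls with age, as it does in the early-failure period. A common way to model this is a Weibull life distribution with a shape parameter below 1. The sketch below compares the chance of surviving a mission with and without a burn-in period; the shape, scale and durations are hypothetical, and it assumes burn-in does not itself damage good units.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def reliability_after_burn_in(mission: float, burn_in: float, shape: float, scale: float) -> float:
    """Probability that a unit which survived burn-in also survives the mission.

    Conditional reliability: R(burn_in + mission) / R(burn_in).
    """
    return weibull_reliability(burn_in + mission, shape, scale) / weibull_reliability(burn_in, shape, scale)


# Hypothetical population with infant-mortality behaviour (shape < 1).
shape, scale = 0.5, 10_000.0
print(f"no burn-in:       {weibull_reliability(1_000, shape, scale):.3f}")              # 0.729
print(f"100 h of burn-in: {reliability_after_burn_in(1_000, 100, shape, scale):.3f}")   # 0.793
```

With shape = 1 (constant hazard) the two figures would be identical, and with shape > 1 burn-in would only consume useful life, so the technique targets early failures specifically.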

Wear-out Failures

Wear-out failures result from the deterioration of parts and components: their design strength degrades through exposure to the environment. The deterioration may be caused by various chemical and physical phenomena, such as corrosion, insulation breakdown, frictional wear, and shrinkage or cracking of plastics. In most cases, wear-out failures can be prevented by two methods:

  • Firstly, by replacing accessible parts that are prone to wear-out at regular intervals, keeping these intervals short.

  • Secondly, by employing one-shot equipment, which is built for one-time use.
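The first method, replacing wear-prone parts at short regular intervals, can be quantified when the part's life follows a Weibull distribution with a shape parameter above 1 (an increasing hazard, typical of wear-out). The sketch assumes each replacement restores the part to as-new condition and takes no time; all numbers are hypothetical.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def mission_reliability(mission: float, interval: float, shape: float, scale: float) -> float:
    """Probability the part position survives `mission` hours when the part is renewed every `interval` hours."""
    full_intervals = int(mission // interval)
    remainder = mission - full_intervals * interval
    # Each renewed part must survive one full interval; the last one covers the remainder.
    return weibull_reliability(interval, shape, scale) ** full_intervals * weibull_reliability(remainder, shape, scale)


# Hypothetical wear-out part: shape 3, characteristic life 1,000 h, 2,000 h mission.
print(f"never replaced:       {mission_reliability(2_000, 2_000, 3.0, 1_000.0):.6f}")
print(f"replaced every 500 h: {mission_reliability(2_000, 500, 3.0, 1_000.0):.6f}")
```

Here replacement raises mission reliability from about 0.0003 to about 0.61. With shape = 1 the two results coincide (exp(-2) in both cases for this scale), which is consistent with scheduled replacement addressing wear-out rather than chance failures.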

Chance Failures

Chance failures are caused by sudden accumulations of stress that exceed the design strength of a component. They occur randomly and unexpectedly, which makes them difficult to predict. They are also termed catastrophic failures, although early failures and wear-out failures can also have disastrous effects. Chance failures cannot be eliminated, but applying reliability techniques can greatly reduce their occurrence.
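The three failure types correspond to the three regions of the familiar "bathtub curve", and the shape parameter of a fitted Weibull distribution is a common way to tell them apart: below 1 the hazard rate falls with age (early failures), at 1 it is constant (chance failures), and above 1 it rises (wear-out). The sketch below computes the Weibull hazard and classifies a shape value; the tolerance used to treat a shape as "about 1" is a hypothetical choice, not a standard.

```python
import math


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def failure_type(shape: float, tolerance: float = 0.1) -> str:
    """Map a fitted Weibull shape parameter to the failure type it usually indicates."""
    if math.isclose(shape, 1.0, abs_tol=tolerance):  # hypothetical tolerance band
        return "chance failures (roughly constant hazard)"
    if shape < 1.0:
        return "early failures (decreasing hazard)"
    return "wear-out failures (increasing hazard)"


for shape in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, shape, 1_000.0) for t in (100.0, 500.0, 1_000.0)]
    print(failure_type(shape), [f"{r:.2e}" for r in rates])
```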

Quality and Reliability

Quality and reliability have long been closely linked. A product of poor quality cannot be reliable; for a product to be highly reliable, it must also be of high quality.

Quality is defined as the degree of conformance to specified standards and is not concerned with time or environment. It is mainly associated with the manufacturing of the product. Reliability, by contrast, deals with the design of a product and can be defined as a unit's ability to maintain its quality under stated conditions for a given period of time.

The main aspects of reliability are the intended function of a product and the environment in which the product operates. The “intended function” of the product or service is associated with its quality. For instance, the intended function of an electric generator could be to supply 5 kW of power at 220 V under specified conditions.

Any deviation in the power supplied, or a stoppage of the generator, could result in lost productivity and quality in a factory or organisation. Similarly, environmental conditions such as temperature, humidity and vibration have a great impact on the quality of a product or service and may cause equipment or services in operation to fail. Operating conditions are therefore an essential consideration in building quality and reliability into products and services.

Creating a ‘good’ quality system from ‘poor’ quality components is impossible, because the quality of a product cannot be improved once it has been manufactured. Improving it would require modifying the manufacturing process and running the production cycle again.

Techniques for Improving Design Reliability

When designing reliability into products, the main emphasis must be on preventing failures at later stages of manufacturing. Designers can adopt various techniques to improve design reliability.

Some of the general techniques that are applied by designers are as follows:

  • Building redundancy of critical parts into the design, so that the system has more than one means of completing a task.

  • Reviewing and checking every component before it is used in a system.

  • Controlling environmental conditions to reduce failure rates, for example by providing adequate heating or cooling systems.

  • Drawing up preventive maintenance schedules so that unreliable parts are removed and replaced before they wear out.

  • Conducting ongoing research and development to achieve the required improvements in the design and performance of a product and to enhance its quality.
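The effect of the first technique, redundancy, can be estimated with standard reliability block diagram arithmetic: blocks in series multiply their reliabilities, while redundant (parallel) blocks fail only if all of them fail. The sketch below assumes component failures are independent (common-cause failures, such as a shared power supply, would erode the benefit), and the component reliabilities are hypothetical.

```python
import math


def series(reliabilities) -> float:
    """Every block must work: R = R1 * R2 * ... * Rn."""
    return math.prod(reliabilities)


def parallel(reliabilities) -> float:
    """Redundant blocks: the task fails only if every block fails."""
    return 1 - math.prod(1 - r for r in reliabilities)


# Hypothetical critical valve (R = 0.90 over the mission) in series with a pump (R = 0.95).
single_valve = series([0.95, 0.90])
duplicated_valve = series([0.95, parallel([0.90, 0.90])])
print(f"{single_valve:.4f} -> {duplicated_valve:.4f}")  # 0.8550 -> 0.9405
```

Duplicating the weakest block gives the largest gain; after that, the pump becomes the limiting element.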

In addition, two of the most important techniques for improving design reliability are fault tree analysis (FTA) and failure mode, effects and criticality analysis (FMECA). These two techniques are discussed in the following sections.

Fault Tree Analysis

Fault tree analysis is a reliability technique that performs deductive failure analysis, using Boolean logic to combine sequences of low-level events. It was developed at Bell Telephone Laboratories for the U.S. Air Force in 1962, and the technique was later adopted and further improved by the Boeing Company.

In fault tree analysis, a logic diagram (the fault tree) is created to show the relationship between a potential critical failure in a system and its causes. The causes can be human errors, environmental conditions or component failures. A fault tree analysis can be qualitative or quantitative, depending on the objective of the analysis.

To conduct a fault tree analysis, there are certain steps that can be followed.

  • Define the problem: In this step, two aspects need to be defined: the critical event and the boundary conditions. The critical event, which is the event analysed in the fault tree, is also known as the top event.

    Further, to get a clear analysis, the boundary conditions should be established. They include physical boundaries, initial conditions, external stresses and resolution level.
    • Physical boundaries describe which parts of the system are included in or excluded from the analysis.

    • Initial conditions define the operational state of the system when the top event occurs.

    • External stresses include stresses due to war, sabotage, lightning, etc.

    • Resolution level defines the level of detail to which the analysis is carried out.

  • Develop an understanding of the system: Once the problem is defined in detail, the next step is to study all the causes that could lead to the top event, together with their probabilities. System analysts can help here by providing a clear understanding of how the system functions. All the causes are then arranged in sequence for constructing the fault tree in the next step.

  • Construct the fault tree: In this step, the fault tree is constructed starting from the top event. The immediate causes are connected to the top event through logic gates at the first level, also known as the top structure. The complete fault tree is then built level by level, down to the resolution level defined in the first step.

  • Analyse the fault tree: In this step, the fault tree is analysed to find ways of improvement, and all probable hazards and risks are identified. To get a clear picture of the hazards, both quantitative and qualitative evaluations must be carried out.

  • Control the risks identified: In this last step of fault tree analysis, all possible measures are devised and put into action to control the hazards and risks identified in the previous step.
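For the quantitative part of the analysis step, a fault tree can be evaluated bottom-up: an AND gate's output probability is the product of its input probabilities, and an OR gate's output is one minus the product of the inputs' non-occurrence probabilities. These rules hold only if the basic events are independent and each appears once in the tree; trees with repeated events need minimal cut set methods instead. The tree, event names and probabilities below are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class BasicEvent:
    name: str
    probability: float  # probability of this low-level event over the period studied


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    inputs: list[BasicEvent | Gate]


def probability(node: BasicEvent | Gate) -> float:
    """Evaluate a fault tree bottom-up, assuming independent, non-repeated basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return math.prod(child_probs)
    if node.kind == "OR":
        return 1 - math.prod(1 - p for p in child_probs)
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical top event: loss of cooling.
top = Gate("Loss of cooling", "OR", [
    BasicEvent("Power supply fails", 1e-3),
    Gate("Both pumps fail", "AND", [
        BasicEvent("Pump A fails", 1e-2),
        BasicEvent("Pump B fails", 1e-2),
    ]),
    BasicEvent("Operator error", 5e-4),
])
print(f"P(top event) = {probability(top):.2e}")  # 1.60e-03
```

Comparing the three inputs to the top gate shows that the power-supply branch dominates, so that is where the risk-control step would focus first.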

Application of Fault Tree Analysis by Honda in Deployment of Airbags

Honda’s airbag development team was reorganised around a new aim at the end of 1982. Its goal was to improve the reliability of the airbag system so that it could be manufactured as an SRS (supplemental restraint system), meaning that it would supplement the role of seatbelts. The team was reduced from its initial ten members to four, with Saburo Kobayashi acting as the third project manager.

“We had acquired the basic technologies and established the necessary functions,” recalled Kiyoshi Kawashima, the CEO of Honda Motor Company. “We simply needed to ensure productivity and reliability. This time we were determined to make it into a product.”

Creating a reliable system proved more difficult than developing the mechanism itself, because the airbag system was a one-shot device: since each unit could be used only once, it was not possible to test every system to ensure that it would work correctly.

The Honda team therefore sought help from NASA (the U.S. National Aeronautics and Space Administration), whose development programmes are renowned for their reliability, to study the techniques used there. In September 1983, the team visited the McDonnell Douglas Aerospace Center (MDAC), where they studied various techniques and ideas directly related to reliability.

Fault tree analysis (FTA) was one method they found suitable for their needs. The crux of FTA is to define a goal and identify the factors that prevent it from being achieved. The probability of failure is then calculated for each factor, along with the total probability of failure.

For example, the Honda team would start by determining the role of the airbag, which is to protect the passenger by inflating properly in a car crash. They would then identify the problems that could prevent the airbag from fulfilling this role, such as improper inflation.

Next, the factors causing each such problem would be identified. These might include a defective sensor system or a malfunction of the module containing the bag, and each problem would be traced down to a single part. The full analytical process would then be repeated and the failure probability of each phenomenon calculated. Based on the results, the team would develop preventive measures for the phenomena showing lower reliability, until the total failure probability met the target.

An important point in conducting an FTA is to begin with a general premise before going into the airbag’s specific failures; in other words, to imagine what can happen to the driver when a car is involved in an accident. Starting from a general concept such as this highlights the importance of thinking about the other party involved in the accident. Moreover, an airbag can fail for several different reasons, so defining the failure conditions and classifying them into relevant groups can lead to suitable solutions.

Failure Mode, Effects and Criticality Analysis

Another important technique for analysing failures is failure mode, effects and criticality analysis (FMECA). It is the study of potential critical failures to determine their effects on the performance of products. FMECA helps identify the features of product design, manufacturing, operation and distribution that need attention in order to reduce failures.

Conducting FMECA requires the experience and expertise of all departments, such as marketing, design, purchasing, technology, production and operations, to define the criticality or importance level of potential problems and to suggest the steps needed to reduce them.

The main elements of FMECA are:

  • Failure mode: The most likely failure modes are identified in terms of their location and mechanism, studied against the background of the operating conditions.

  • Failure effect: To determine the effects of failures on the performance of the product, process or service, all the undesirable events that could occur in the system are studied.

  • Failure criticality: The probable failures in the various parts and components of the system are studied to determine the severity of each failure in terms of safety risks, loss of function, etc.

FMECA can be applied at almost every stage, be it design, development, production or operation. However, because its aim is to prevent failures, FMECA is most valuable at the design stage, where errors can be identified and removed. FMECA follows a series of steps, which are usually recorded on standard FMECA pro formas.

The following steps should be performed to conduct the FMECA:

  • The components of the product or system to be analysed are defined.

  • All possible failure modes of each component are listed.

  • The effects of all the failure modes on the performance of the product or system are determined.

  • A list of all the probable causes of each failure mode is created.

  • All the failure modes are then rated numerically on a scale of 1 to 10, using experience and reliability data to determine the values for:
    • P: probability of occurrence of the failure mode

    • S: seriousness or criticality of the failure

    • D: difficulty of detecting the failure before the product is used by the customer
  • Based on these values, the criticality index or risk priority number is calculated using the formula C = P × S × D, which ranges from 1 to 1000.

  • Finally, a plan is developed indicating the corrective actions to be taken, the person or department responsible for implementing them and the expected completion date.
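The rating and ranking steps translate directly into a small worksheet calculation. The sketch below computes C = P × S × D for each failure mode and sorts the modes so that corrective action can start at the top of the list; the components, failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    p: int  # probability of occurrence, 1-10
    s: int  # seriousness (criticality) of the failure, 1-10
    d: int  # difficulty of detection before the product reaches the customer, 1-10

    def criticality(self) -> int:
        """Criticality index / risk priority number C = P x S x D (range 1 to 1000)."""
        for rating in (self.p, self.s, self.d):
            if not 1 <= rating <= 10:
                raise ValueError(f"rating {rating} is outside the 1-10 scale")
        return self.p * self.s * self.d


worksheet = [
    FailureMode("Connector", "corrosion", p=5, s=6, d=4),
    FailureMode("Housing", "crack", p=2, s=4, d=2),
    FailureMode("Sensor", "no output signal", p=3, s=9, d=6),
]
for fm in sorted(worksheet, key=FailureMode.criticality, reverse=True):
    print(f"{fm.criticality():4d}  {fm.component}: {fm.mode}")
#  162  Sensor: no output signal
#  120  Connector: corrosion
#   16  Housing: crack
```

Because C multiplies three ratings, a mode with a very high S can still rank low; many teams therefore also review high-severity modes on their own, whatever their C value.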
