Showing posts with label Methodologies. Show all posts

Sunday, 3 July 2011

Why start maintenance analysis in the middle?

There has been a rash of methods and practitioners out there actively recommending approaches that start in the middle.

By this I mean starting with the existing maintenance regimes and strategies and working their way backwards, then forwards from there, somehow optimizing the whole thing.

My experience with existing maintenance programs is that this is an inherently flawed approach.

Tuesday, 8 June 2010

Fundamental elements of any defect elimination program

We tend to burrow more than most managerial disciplines for some reason. We often get so buried into our methodologies that we cannot see the world around us.

When we implement defect elimination the rooms explode with heated debate over techniques, styles, semantic arguments about terminology and so on. All very valid, and often debated by people with more grasp of it than I have, but it tends to miss the point.

Most, if not all, RCA approaches are designed and honed by people who have been there and done it. They are pretty good, let's face it. And most will probably deliver different paths to a similar result.

All well and good.

The real heated issue, the one that is often forgotten, is that of implementation. How do you set up your organization so that you are able to rapidly find, analyse, approve and implement the results?

First, set up a screen. A production accounting system, capable of tracking the lost production opportunities and reporting on their dollar value.

Second, act on it - do the analysis. (Maintenance first, resist the urge to redesign automatically, etc.)

Third, implement it. Increasingly efficient and rigorous implementation pathways. (Ops, Design, Maint, Procurement/Supply)

The first of these elements, setting up a screening process or processes, is possibly the most vital. It will decide whether your initiative is proactive, dealing with issues before they become chronic, or reactive, waiting for the next ambulance to whizz past.

The goal is simple. Set daily production targets, record the reasons for variance, and then quantify them in $/ton sales or gross profit terms. (Whatever you prefer)

Act every day on every incident and trend them weekly to see what is happening over the medium term. Biggest number wins in terms of targeting resources, and the work needs to be well publicized when completed. 
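As a rough illustration of the screen described above, the sketch below records daily variance against target, converts each shortfall to dollars, and ranks the recorded reasons so the biggest number comes first. The record fields, the sample week and the $40/ton margin are all hypothetical; a real production accounting system would carry far more detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One day's production against target (illustrative fields only)."""
    day: str            # ISO date
    target_tons: float
    actual_tons: float
    reason: str         # recorded reason for the variance


def lost_value(record: DailyRecord, margin_per_ton: float) -> float:
    """Dollar value of the lost production opportunity for one day."""
    shortfall = max(record.target_tons - record.actual_tons, 0.0)
    return shortfall * margin_per_ton


def rank_reasons(records, margin_per_ton):
    """Total the lost value per reason; largest total first."""
    totals = defaultdict(float)
    for record in records:
        totals[record.reason] += lost_value(record, margin_per_ton)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical week of data and a hypothetical gross margin of $40/ton.
week = [
    DailyRecord("2010-06-01", 1000, 900, "conveyor belt trip"),
    DailyRecord("2010-06-02", 1000, 1000, "none"),
    DailyRecord("2010-06-03", 1000, 700, "crusher liner failure"),
    DailyRecord("2010-06-04", 1000, 950, "conveyor belt trip"),
]

ranking = rank_reasons(week, margin_per_ton=40.0)
for reason, dollars in ranking:
    print(f"{reason}: ${dollars:,.0f}")
```

Running the same ranking over a rolling week or month gives the medium-term trend, and the top entry is where the analysis resources go first.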

Saturday, 4 July 2009

When is criticality analysis useful?


When used correctly criticality analysis can provide companies with a very powerful tool for ranking their assets, prioritizing their workloads, and for managing their capital spending. 

Unfortunately, in their drive to achieve these sorts of results, many practitioners have regularly misapplied criticality analysis. In fact, one could say there is a cult of criticality out there, trying incorrectly to use some form of matrix approach to solve every part of their maintenance problems.

In some cases the results are relatively harmless, and the only negative impact is the tremendous waste of time. However, on other occasions misapplication of criticality analysis can produce results that are counterproductive, dangerous and provide asset owners with a false sense of security.

I can't tackle all of the reasons why criticality can lead to these sorts of problems here; that would take a full chapter of a book. But there are some clear guides that may help avoid this in the future.

1. Always and only at the level of the failure mode. 

It is not uncommon to see "practitioners" applying criticality analysis at the level of the equipment, assembly, or even at the level of the "principal functions". (Whatever they are)

This practice is not only uninformed, it is extremely dangerous. 

You cannot know the relative importance of an asset unless you know what happens when it fails. 

This means understanding all of the functions, all of the functional failures and failure modes, and all of their consequences. 

Any criticality that is done without going to this level is destined to produce results that are lightweight, inaccurate, and potentially misleading. 

Some great examples of Criticality analysis at work.

1. Prioritization of corrective work orders (Works arising from..)

2. The criticality matrix in RBI, risk based inspection, is always at the failure mode level. 

3. Criticality analyses prior to performing a Safety Instrumented Systems project. This is relatively easy to do. Most safety instrumented systems have only one function, therefore the failure modes are relatively straightforward.

2. Never sum the answers

Comparing operational risks to safety risks is always where this sort of thinking comes unstuck. There seems to be a belief that we always go for the next highest criticality action or activity, when this is actually not true. 

It is also impossible to produce anything (and I have seen a heck of a lot of these now) that truly gives you the capability to compare operational / economic and safety / environmental risks. 

The tactic that is often used (erroneously) is to quantify the scores in every area of criticality, and then sum up all the criticality scores, so that we are able to choose the highest, the next highest and so on. Sounds logical, right? In fact, it has always been a very intoxicating argument.

But it is wrong. The results are often that low safety risks get treated before high safety risks because they also carry high operational costs, which catapults them to the front of the line.

The result? High safety risks being left untreated.

The option...

a) Only score the highest one. The first one you come to.

As with an RCM analysis, if you decide that the failure mode has an intolerable level of risk of a safety event, then that is how it needs to be managed. Its other consequences in environment or operations do not matter. Safety wins, every time.

b) Treat each failure mode according to its consequences.

So what do I do? I have an intolerable risk of a safety incident, and a failure mode with $10,000,000 attached to its failure. Which do I manage first?

Always the intolerable safety items. Then the intolerable environmental integrity elements. No need to debate, compare or work through a cost/benefits calculation. 

Safety wins, get it to the tolerable levels. Then environment, get it to a tolerable level also. Then deal with the economic issues. Do not over complicate things. 

Even the HSE out of Great Britain has come out against this practice. 
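A small sketch may make the difference concrete. It compares ranking by summed scores with ranking by consequence category (intolerable safety first, then intolerable environment, then economics). The 1-5 scale, the "4 or above is intolerable" threshold and the three failure modes are invented for illustration only.

```python
from dataclasses import dataclass

# Hypothetical scale: scores run 1-5, and 4 or above counts as intolerable.
INTOLERABLE = 4


@dataclass
class FailureMode:
    name: str
    safety: int
    environment: int
    operations: int


def rank_by_sum(modes):
    """The flawed tactic: add every score and sort by the total."""
    return sorted(
        modes,
        key=lambda m: m.safety + m.environment + m.operations,
        reverse=True,
    )


def rank_by_hierarchy(modes):
    """Intolerable safety first, then intolerable environment, then economics."""
    def key(m):
        if m.safety >= INTOLERABLE:
            return (0, -m.safety)
        if m.environment >= INTOLERABLE:
            return (1, -m.environment)
        return (2, -m.operations)
    return sorted(modes, key=key)


# Invented failure modes and scores.
modes = [
    FailureMode("relief valve stuck shut", safety=5, environment=1, operations=1),
    FailureMode("gearbox bearing seizure", safety=2, environment=1, operations=5),
    FailureMode("bund drain valve left open", safety=1, environment=4, operations=1),
]

summed = [m.name for m in rank_by_sum(modes)]
hierarchical = [m.name for m in rank_by_hierarchy(modes)]
print("By sum:      ", summed)
print("By hierarchy:", hierarchical)
```

With these made-up numbers the summed ranking puts the gearbox (low safety score, high operational score) ahead of the relief valve, which is exactly the failure described above. The hierarchical ranking deals with the relief valve first, then the bund drain valve, and only then the gearbox.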

3. Never as a filter!!! (Ever)

I have seen this applied two or three times now. Once was in the infrastructure industry of the UK, a second time in the electricity industry of North America, and the third was an application of software in the mining industry.

The thinking goes something like this.....

Now that I have all my strategies (from RCM) and all my functional tests (from SIS) and all my replacement options (from, say, Availability Modeling), I now want to reduce all of the activities to only those that are critical and require our further attention.

This is idiot engineering at its best. Don't fall for it.

The methodologies and approaches explained above will, for the assets they are working on, produce a safe minimum level of maintenance interventions. There is no further room for another layer of "optimization".

These types of approaches are usually developed and applied by people with only a scarce understanding of what asset management is about, and they are fundamentally dangerous. In fact, they are more likely to cause safety related incidents than an approach that does not use this foolish application of criticality analysis. 

4. Prioritize wherever you can.

I have ranted on this many times. But essentially it is unwise to use criticality analysis to determine which assets should be analysed, or which capital should be spent. Where you can, it is far, far better to use prioritization methods such as bad actors and AHP. (Which is fantastic, by the way)

Good luck.

Sunday, 22 February 2009

What's in a name?

Apparently - a lot!

Like me you have probably been reading a lot about Asset Performance Management (APM) and asking yourself what this is and where it came from. Well I have anyway.

When I started out in the mid eighties it was all about maintenance and reliability. Then Moubray either coined or popularized the term Physical Asset Management, complete with a heavy dose of attention to the fact that we operated in a managerial discipline, not just the job of fixing stuff.

By the time I landed in the UK we had progressed through Enterprise Asset Management, which later came to refer to systems and information management issues (strangely enough), and we had arrived firmly on Asset Management.

I have been fine with all of these terms, but I particularly liked the term Asset Management. It was a huge tent under which you could park everything from taking the reading and greasing the bearing - right through to the economics of 5 year CAPEX forecasts and how to use reliability to influence the bond markets.

In one of the larger studies I was involved in related to Asset Management I was astounded to find out that I was the only person in the room with any physical asset management knowledge - the rest of them were high level economics gurus. (And they seemed like real gurus to me)

Then I started to hear the term Asset Performance Management. From what I understand (and it is my understanding only, so correct me if I'm wrong), it was coined by the people at Meridium. (The guys who brought you Enterprise Reliability Management System (ERMS))

It seems to have had an almighty impact because the term is now being used everywhere - in the recent Aberdeen report, by Oniqua and Ivara. (Even IBM offers it, but in the small print, not embracing the term fully yet)

For the life of me I have not yet worked out whether this is a software solution only, or whether there is a more holistic and global dimension to it.

From what I have seen - I suspect it is a combination of creating marketing distance from competitors (which has worked exceptionally well) and definitely a software based solution. (As EAM evolved into within the administration and efficiency space)

There doesn't yet seem to be too much real detailed focus on long term planning, high confidence decisions impacting on NPV, nor does there seem to be any real tie into optimizing CAPEX. So either it is in evolution still, or it is another narrow slice of the entire Asset Management picture.

Fascinating time to be observing all of this from the outside. I am very curious to see where it all ends up in four to five years' time.

Wednesday, 18 February 2009

RCM Hits the mainstream media

CNNMoney.com carries a news feed today from the recent, and much quoted, Aberdeen report on Asset Performance Management. A new buzzword (presumably because physical asset management wasn't good enough?) that seems to be sweeping the industry and capturing a lot of attention.

The headline is "Best-in-Class Realize 22% Reduced Unscheduled Asset Downtime as Compared to Laggard Companies". It goes on to quote that leading practices include "invest in Reliability Centered Maintenance (RCM) and Condition Based Maintenance (CBM) to proactively monitor and manage asset performance."

For many years I have been annoyed with consulting firms who are quick to trot out the statement "World Class". For that reason I really like independent analysis like this: independent researchers who, by themselves, are able to provide indications of leading practices without any ulterior motives.

That's the goal of consultants, and that is the goal of this blog - to create, find or spread leading practices. Not to make them up to suit commercial goals at the time. Also, you won't read any reference here to "World Class". The only use for that term is as a tool for separating clients and their money.

You can access the report here. I have used Aberdeen a lot over the years and I recommend them to you as a trusted source of research. (By me anyway)


Sunday, 30 September 2007

Reliability-centred Maintenance and HAZOP – Is there a need for both?

Our guest blogger is Mr Stephen Young, a Founding Director of The Asset Partnership.

He holds a Bachelor of Engineering (Electro-Mechanical) and Graduate Diploma in Asset Management. 

He is a Chartered Professional Engineer. Stephen has spent more than twenty-five years in the field of maintenance and asset management with particular experience in Power Generation and Distribution, Water Utilities, Petrochemicals, Food Processing, Brewing, Wine and Spirits, Mining, Defence, Manufacturing, and Printing.

Hazard and operability (HAZOP) analysis has a well-deserved reputation for systematic and thorough evaluation of industrial hazards with the safety, environmental, and economic benefits far outweighing the cost.

Developed by ICI in the 1960s as a form of “what-if” analysis, HAZOP has undoubtedly made the industrial world a safer place by identifying credible incident scenarios that had, or would have had, a significant impact on safety and operational capability.

HAZOP review teams apply an agreed set of ‘guide’ words to identify possible deviations. Typical guide words include NONE, MORE, LESS, AS WELL AS, REVERSE, BEFORE, AFTER, EARLY, LATE, etc. Possible causes and consequences of the deviation are identified, along with any safeguards that may exist. Actions that are required (such as design changes, need for procedures, etc.) are documented and assigned for action.
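The mechanics of the guide-word step can be sketched in a few lines. The example below crosses some of the guide words listed above with a set of process parameters to produce candidate deviations, then builds an empty worksheet row for each one. The parameter names, the node name and the worksheet layout are assumptions made for illustration, not a prescribed HAZOP format.

```python
from itertools import product

# A subset of the guide words listed in the text.
GUIDE_WORDS = ["NONE", "MORE", "LESS", "AS WELL AS", "REVERSE"]

# Hypothetical process parameters for a single study node.
PARAMETERS = ["flow", "pressure", "temperature"]


def candidate_deviations(guide_words, parameters):
    """Cross every parameter with every guide word, e.g. 'MORE pressure'."""
    return [f"{word} {param}" for param, word in product(parameters, guide_words)]


def blank_worksheet(node, deviations):
    """One empty row per deviation for the review team to fill in."""
    return [
        {
            "node": node,
            "deviation": deviation,
            "causes": [],
            "consequences": [],
            "safeguards": [],
            "actions": [],
        }
        for deviation in deviations
    ]


deviations = candidate_deviations(GUIDE_WORDS, PARAMETERS)
worksheet = blank_worksheet("feed pump discharge line", deviations)  # hypothetical node
print(len(worksheet), "candidate deviations, starting with:", deviations[:3])
```

Not every combination is physically meaningful (REVERSE temperature, for instance), so in practice the review team discards the ones that do not apply before working through causes, consequences and safeguards.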

The HAZOP review team typically includes operators, designers, technical specialists, and maintainers in addition to the HAZOP facilitator and needs (or must develop) a detailed understanding of the system under analysis.

For most HAZOP studies, 50-60% of the recommendations address product quality or plant operability issues and not safety or environmental concerns. The driving benefit is for engineering design teams to identify potential problems on paper during design rather than in the field during start-up.

Risk management through a HAZOP analysis is usually achieved by adding further technology, such as protective devices that initiate some action to avert the consequences, but the process does not necessarily address the possibility of the installed protective devices failing to operate as intended. There is no methodology within a HAZOP analysis to develop a rigorous and defensible strategy for managing the reliability of protective devices, or indeed the reliability of the process equipment at all.

The leading maintenance strategy development tool is SAE JA 1011 compliant Reliability Centred Maintenance (RCM), which applies a structured decision logic to the outputs of a Failure Modes and Effects Analysis (FMEA). This technique has been demonstrated to significantly improve plant safety, reliability, and maintenance cost-effectiveness.

RCM was developed in the aviation industry in the late 1960s to address the inability of traditional maintenance programs to effectively manage aircraft reliability. The process is now applied throughout industry.

Today’s RCM uses a multidisciplinary team of people including a facilitator, operators, craftsmen, and other specialists as required to review a defined system.

The team:
  • Defines the operating context of the system, including a description of what the system does and a list of the equipment within the system
  • Defines all the functions of the system including primary functions, secondary functions (e.g., containment, contamination prevention, protection, economy, efficiency, support, appearance, environment), and protective functions (e.g., alarms, interlocks, devices for relieving abnormal conditions)
  • Lists all the failure modes and effects for each function
  • Uses a decision diagram to guide decisions on how to best maintain the function of the equipment to minimize the risk of equipment failure or process malfunction.
  • Develops appropriate strategies to minimize the impact of failures which cannot be prevented.
As a result, the risk management strategies depend not only on the failure characteristics of the maintainable item, but also on the consequences of the failure in terms of operational performance (measured in cost, product quality and customer service), safety and environmental impact.

The RCM process is unique in the manner in which it both recognizes and manages hidden failures. Many components, particularly protective devices, can fail in such a way that no one knows that the item has failed. These failures, known as hidden failures, have no consequence until some other failure also occurs which requires the device to operate. A high-high level switch is an example: the process normally never reaches the high-high level, so there is no way to tell if the switch works without testing it.

With the inclusion of appropriate HAZOP guide words into the Functional Failure assessment of an FMEA/RCM analysis, the aims of both analysis processes can be satisfied with the minimum of effort.

SAE JA 1011 compliant RCM rigorously assesses the possibility of protective devices failing to operate as intended and develops functional checks of devices based on a proven algorithm considering the probability and consequences of failure. Further, RCM uses a rigorous, defensible and auditable process for developing the most appropriate strategy for managing the reliability of assets.
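The post does not say which algorithm it has in mind. As an assumption, the sketch below uses one widely quoted approximation from the RCM literature for a single protective device: the failure-finding interval is twice the tolerable unavailability of the device multiplied by its mean time between failures, where the tolerable unavailability is the mean time between demands divided by the tolerable mean time between multiple failures. The function name and the input figures are hypothetical, and the linear approximation only holds when the required unavailability is small (a few percent at most).

```python
import math


def failure_finding_interval(mtbf_protective, mtbf_protected, tolerable_mtbmf):
    """Approximate failure-finding interval for one protective device.

    mtbf_protective: mean time between failures of the protective device
    mtbf_protected:  mean time between demands on the device
    tolerable_mtbmf: tolerable mean time between multiple failures
    All three must be in the same time unit; the result is in that unit.
    Assumed formula: FFI = 2 * unavailability * mtbf_protective.
    """
    if min(mtbf_protective, mtbf_protected, tolerable_mtbmf) <= 0:
        raise ValueError("all inputs must be positive")
    required_unavailability = mtbf_protected / tolerable_mtbmf
    return 2.0 * required_unavailability * mtbf_protective


# Hypothetical figures, in years: a high-high level switch with an MTBF of 8,
# a demand once every 50, and a tolerable multiple failure once in 10,000.
ffi_years = failure_finding_interval(8.0, 50.0, 10_000.0)
print(f"Test the switch roughly every {ffi_years * 365:.0f} days")
```

The point of the sketch is the shape of the trade-off: the worse the consequences of the multiple failure (a longer tolerable interval between them), the more often the hidden function has to be checked.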

FMEAs can be either component or system based, and the modern evolution of RCM as developed by John Moubray, uses a process based functional FMEA analysis. John Moubray’s RCM 2 clearly identifies the process functions and the failures which can affect the performance of that process. Identified failures include equipment malfunction, equipment degradation, human error and inappropriate or incorrectly designed or installed plant. The RCM 2 process then seeks to find the most appropriate method to manage each one of those risks.

There is a great similarity between the HAZOP and the preliminary stages of an SAE JA1011 compliant RCM analysis. The sequential application of HAZOP and RCM analyses within an organisation therefore wastes precious resources for no benefit. With a very subtle modification, a robust SAE JA1011 RCM can satisfy the requirements of HAZOP, but a HAZOP in isolation is unable to generate the same outputs as an RCM analysis.
The Asset Partnership
The Asset Partnership is one of Australia’s leading Asset Management consulting organisations with offices in Sydney, Auckland and Perth. We specialise in partnering with clients to make efficient and effective use of their investments in physical assets in the most demanding of environments by:

  • Maximising the sustainable capability of existing assets
  • Reducing asset ownership costs and risks
  • Optimising capital outlay


Friday, 21 September 2007

The Strategic View by Ron Doucet

Guest columnist Ron Doucet is a reliability professional with a long career of driving companies towards operational excellence. He is an Aladon trained RCM practitioner, frequent conference speaker, and currently holds a senior asset management position in the mining industry of North America.

Welcome to the inaugural launch of The Strategic View column. This monthly column will explore the topic of asset management improvement by examining causes of unreliability, and the realities of addressing the underlying issues in an operating environment that most of us experience each and every day.

The causes of unreliability are almost endless, starting at the procurement/engineering stages right on through to how maintenance is performed, how the equipment is operated and eventually disposed of. Some issues are organizational, behavioural and cultural.

With the safety performance and profitability of so many companies being dependent on the performance of their assets, it may be surprising to some that asset management principles for the most part have not been anchored into companies’ governance policies. For those that have adopted clear asset management principles, the realities of making the transition are more often than not misunderstood, thus making the implementation of good asset management strategies less likely to succeed and the associated benefits less likely to be realized.

Making successful improvements in an operating site requires good change management and practices. This applies to all changes, including changes to the maintenance tasks, operating practices, roles, SOPs, mission statements, procurement policies, organizational structures, spares policies, etc.

Early in my asset management career, the late John Moubray once told me that the secret to change management is “to change the way people think and then apply the changed thinking to doing something different”. Circumnavigating the world is unfathomable if the world is thought to be flat. Likewise, implementing maintenance improvements at all levels will be unsuccessful if the accepted truth is that the function of maintenance is to “fix things”.

We procure assets for what they do, not what they are. Yet too often, right after the acquisition, maintenance is focused on maintaining what the asset is as opposed to maintaining what the asset does.

Maintain means to “cause to continue” or “to keep going”. As it pertains to assets, the primary function of maintenance is to cause any asset to continue to do what its users want it to do. In other words, the function of maintenance is to preserve the function of the asset, not to “fix” the asset.

It is this changed thinking from preserving the asset to preserving the function of the asset that will allow real bottom up and top down maintenance improvements to take hold. How to get everyone to understand this will be the subject of another article.

As mentioned earlier, this column will focus on the implementation of sustainable asset management improvements and I will refer often to the concept of maintaining the function of assets and the need for this to be a common understanding at the level at which the maintenance improvement is sought.

With your feedback, comments, suggestions and questions I hope to be able to address some of the main issues affecting equipment performance and generate some real world solutions.

Thursday, 20 September 2007

Early indications of leading practices

After only a few short months running online surveys we have already been able to get some pretty interesting interim results on the environment of modern asset management, its business drivers and some of the leading practices that have helped companies to become Leading Performers in the area.

Our second survey on Asset Performance Management can be found here and is scheduled for completion in late October. All respondents will receive a copy of the survey results.

So far we have around 160 respondents representing 93 different companies.

Geographically these come mainly from the USA, Australia, the UK and the Middle East. By sector they range from petroleum refiners and mining companies to transport companies and rail and water utilities.

Overwhelmingly, all Leading Performers have indicated that the primary driver for their asset performance management initiatives is increased profitability. And the strategies they employ show this to be the case.

Interestingly, the second driver among all of those completing the survey is the need for regulatory compliance in asset management. Reasons included financially regulated markets (such as the UK utilities industry), contractual constraints (such as in large outsourced maintenance companies) and changes in water and electricity regulations in the USA.

The third most influential driver for businesses throughout the world has been that of reducing the risk of safety incidents from asset management. Key reasons for this driver were stated as including recent explosions at the BP refinery and the Buncefield refinery in the UK (40%), changes to laws around the world (30%) and other high profile cases such as Westray in Canada and the Hatfield Train Disaster in the UK (27%).

One of the criteria we have identified for the Leading Performers is a level of asset availability exceeding 95%. Our studies so far have shown that, of all initiatives, those of RCM and Planning and Scheduling stand out.

As can be seen 40% of the Leading Performers consider their companies to have fully integrated the RCM approach, while 33% of the Lagging Performers (<85%) are not even planning to implement RCM.
This is further supported by Mid Performance companies, all of whom are either planning to implement, in the process of implementing, or in the early stages of managing the process of RCM.

Other results are equally compelling with Leading Performers demonstrating consistent downward trends in operational budgets in direct correlation to their maturity with planning and scheduling processes along with other useful insights into what leading performers are doing to control risk, maximize profitability and sustain momentum in their change programs.

There are also some interesting trends in terms of technology providers among our survey group.

If you are interested in participating, the Asset Performance Management survey can be found here and is scheduled for completion in late October. All respondents will receive a copy of the survey results.

During October we will be running a series of Deep Methodology Surveys for RCM, RBI and RCA in particular.

Sunday, 19 August 2007

Myths of RCM: Myth 5 RCM requires large amounts of data before it can commence

This is a paradigm that is slowly starting to fade away. However, I still regularly come across maintenance managers and directors who make statements such as “We are not ready for RCM yet because we do not have the data to support it…”, or “we are not in a position to get the benefits from RCM because our data is not up to scratch…” The underlying belief is that RCM is not possible without a good base of data to build from.

Like many other myths, this one is not only false and misleading, but also dangerous.

If RCM needs data, then we cannot start until that data is available. The problem here is that before we can have failure data with which to take decisions we need failures!

Obviously it is an unethical practice to “crash a few more” assets just so we can work out how to manage them; particularly when the consequences of failure could be significant financial costs, environmental damages or even risks to life and safety.

In fact, Resnikoff, the mathematician who wrote “Mathematical Aspects of Reliability-centered Maintenance”, stated that:

“The Reliability-centered maintenance program extends these philosophical views to engineering by elevating the unobtainability of information to a positive principle.”


Meaning that through the rigorous nature of RCM, the rules contained within the decision diagrams and the underlying concepts that have come about in SAE JA1011, we are able to take decisions regarding asset management with an absence of data.


The figure above shows my experiences in starting asset reliability programs of all kinds throughout the world. In general, we can expect to be taking decisions based on only 30% data as a maximum.

For those of you familiar with trying to perform reliability analysis or predict component failure, you will recognize that this is actually a generous figure! I have started many analyses, RCM and otherwise, with close to zero data available for analysis!

Therefore, Reliability Centered Maintenance can commence with an absence of data and it is actually our duty ethically to try to deal with equipment failure proactively rather than waiting for undesirable events to occur.

In fact, one of the side effects of a properly targeted and implemented RCM program will be a reduction of failure data to analyze, because we are reducing the number of unplanned failures. However, as we will review in Myth 6, an RCM program will help to get better data through failure management strategies.

Sunday, 3 June 2007

RBI Case Study from the 1990's

I thought this may be of interest. It is a small section from my new book "Asset Resource Planning", which I am expecting to be published within the next few months. It looks briefly at some of the benefits gained from the implementation of RBI into the Altona plant in Victoria, Australia.

The remarkable thing about this case study is that it occurred at the same time that the American Petroleum Institute was also developing the API580 and 581 standards for RBI.

In 1995 the Boiler and Pressure Vessel Regulations, which mandated inspection of pressure equipment at set intervals by a government inspector, were repealed, and the performance-based Plant Safety regulations were introduced under the existing Occupational Health and Safety Act. This presented an opportunity to optimize the inspection turnaround program.
The new regulations required owners, manufacturers and designers to identify hazards, assess risk and mitigate risk where required. Inspection intervals are no longer mandated.

Coincidentally, the American Petroleum Institute (API) initiated a sponsored program in 1993 to develop a risk-based inspection process. The Altona project, in Victoria, Australia, was developed entirely separately from the API program. The two processes contain a number of similarities, but in key areas are quite different.

After taking a decision to focus solely on vessels during the initial stages of the project, an approach was used to identify failure modes with a “wear out” characteristic based on ASME CRTD-Vol.20-1.

A semi-quantitative approach was finally agreed upon; the failure distributions were plotted and the frequencies calculated. Inspection plans were then optimized using cost analysis techniques.

In the project documentation there was an admission that the process was not useful for managing those failures that were known to have no relation to time, or random failure modes. Refractory spalling in process heaters, for example, was delegated to methodologies such as RCM which are better at managing such issues.
Key results from the project are as follows:
  • The project provides a valid methodology for directly relating risk reduction to inspection interval.
  • Increase in inspection intervals on all processing units, resulting in fewer turnarounds.
  • Reduction in turnaround scopes (measured in number of vessels opened) by 30-50% over previous statutory requirements
  • Reduction in the costs of turnaround maintenance and lost production by approximately AUD 2.5MM/year (US$1.8MM/yr.)
  • The process improves the understanding of risks associated with vulnerabilities and therefore better targets inspections.
  • The process increases focus on on-stream monitoring and inspection in order to decrease the uncertainty in time to failure and reduce turnaround scope.

The Victorian regulatory authority has accepted the process as a valid method of identifying hazards, assessing risk and managing those risks in accordance with the Plant Safety regulations.

This is a very comprehensive case study that shows the ability of two or more methodologies to be applied together. It also lists a range of tangible and non-tangible benefits, which when taken together provide a sound basis for this approach to ARP.

This particular application of Risk Based Inspection shows a strong blend of developing effective maintenance and inspection regimes through the application of risk, and then optimizing this to produce the most efficient means of executing the strategy.

Sunday, 27 May 2007

The RCM Standard – 8 Years on…

For those of you involved in the field of Reliability-centered Maintenance you will no doubt be familiar with the publication in 1999 of the RCM standard by the SAE.

The Standard is part of the ongoing evolution of the asset management discipline and provides asset managers and maintainers throughout the world with an instrument, created by an internationally recognized standards producing body that can be used to determine what Reliability-Centered Maintenance is and what it is not.

This capability is as important today as it was when the standard was originally produced in 1999.

For those of you unfamiliar with the RCM standard, it does not provide companies with a process to follow. What it provides is a set of minimum criteria that must be met in order for a process to be labeled as RCM.

So it is able to be implemented in any fashion that is required, and with any number of resources that are required. It is a common falsehood to state that standard compliant RCM is resource intensive to implement. Implementation is not even covered within the standard!!

This is just one of the many untruths that are expounded by those with commercial interests in the standard not being adopted. Fortunately for physical asset managers the world over these companies are few and far between these days.

Since the Nowlan & Heap report was published, a great many processes have emerged that claim to be RCM. Many of them bear little or no resemblance to the process described by Nowlan & Heap.

This became a cause of grave concern to many organizations. In particular, the US Naval Air Command (Navair), which was one of the sponsors of the original N&H report, found that some vendors were using all sorts of weird and wonderful processes which they described as "RCM" to develop maintenance programs for equipment that they were selling to Navair. (The history of RCM in the US military has been ably described by Dana Netherton, chairman of the SAE RCM committee, in articles that appeared in maintenance journals in Australia, the USA and the UK.)

Although the RCM standard has been widely adopted in the USA and Europe, there are still companies, consultancies and software vendors that continue to label their products and services RCM even though they are not true to either the intentions or the practices outlined within the original RCM report.

In practice this often means that maintenance strategy formulation methods are being sold as RCM when they may produce results that are at times counterproductive, at times even dangerous, and nothing to do with reliability centered maintenance at all.

This is obviously not in the best interests of the companies purchasing such software and services, it is not in the best interests of the professionals faithfully being trained in such processes, and it is not in the best interests of the discipline of asset management as a whole.

For any of you out there actively involved in purchasing, implementing, or working with RCM software, services or other products I urge you to review a copy of the standard and the guide (available at http://www.sae.org/), study it, and then apply it to your processes in place.

The checks below cover some common deviations that I have witnessed first hand over the past ten years or so. When a process fails one of them it often sounds the alarm, for me personally anyway, that things are not as they should be. I hope they are of use to you:
  • Is the operating Context Defined?
  • All primary / Secondary functions defined?
  • Writing of Function Statements correct?
  • Performance Standard defined?
  • All functional failures defined?
  • Separate Hidden from Evident?
  • All scheduled tasks comply with technical feasibility and worth doing criteria?
  • All formulae logically robust and available for approval?
  • Detective maintenance tasks (For hidden failures) take into account the need to reduce the probability of the multiple failure of the associated protected system to a level that is tolerable to the owner or user of the asset?
RCM remains the best way to develop the minimum levels of maintenance for a given level of performance and risk. There are many strategy formulation methods that take a long way around trying to do something similar, but to date none that I have seen can be compared to RCM for speed of implementation, rigor of approach, management of risk and cost effectiveness.

If your company wants to get the benefits of RCM, and you believe you have bought a system or process to do so, then I suggest that you check whether or not it really is RCM. (Or better yet stipulate it within the RFP when it goes out)

All the best,

Friday, 18 May 2007

Myth 4. RCM does not support whole-of-life asset management

This posting continues the 10 part series on the 9 deadly Myths of RCM, which questions commonly held beliefs throughout the maintenance engineering community regarding the application, implementation and evergreen processes involved in managing physical assets. Next Posting - Myth 5: RCM is Qualitative not Quantitative.

One of the first things taught by RCM practitioners in training courses around the world is the nature of failure, and the assignment of routine maintenance tasks. From the original RCM report we are provided with four basic routine maintenance tasks.
  • Predictive Maintenance (PTIVE) – A task aimed at detecting the onset of failure or the potential failure. Often referred to as CBM or On-condition Maintenance the goal is to ensure that the occurrence of failure modes that have undesirable consequences are predicted so that they can be mitigated through planned activities. Within RCM PTIVE tasks are the preferred option.
  • Preventive Restoration (PRES) – A task to restore a machine's original resistance to failure based on some measure of hard time (such as calendar hours, hours run, or liters pumped). This task is generally applied to failure modes that can be restored without the need to replace the asset. Examples in this area include re-machining, cleaning, flushing, sharpening, re-positioning, tightening and adjusting. PRES tasks can often include calibration where this is done on a hard time basis. Within RCM PRES tasks are the second preferred option.
  • Preventive Replacement (PREP) – A task to replace a physical asset in order to restore its resistance to failure. As with PRES tasks these are also hard time tasks. Common examples of PREP tasks include greasing bearings, changing oil filters and oil (if done on a time basis), and routine light bulb replacement (often but not always). Of the standard routine tasks PREP is the least preferred within an RCM framework.
  • Detective Maintenance (DTIVE) – These are tasks that are done to detect whether an item has already failed so that action can be taken. These tasks are only used with items that have hidden functions, for example protective devices such as circuit breakers, stand by pumps, lanyard switches on conveyor systems and High-high level switches. DTIVE tasks are only used within the four categories on the Hidden side of the RCM decision diagram and are not referred to in the four categories on the evident side at all. DTIVE tasks include proof testing of critical instrumentation and the occasional running of stand by pumps. Although often associated with safety related failures this is not always the case. Within RCM it provides the last line of defense for routine maintenance when a failure mode cannot be predicted or prevented.
The four routine tasks within RCM are generally well known and most of us with any exposure to this area of activity will have at least some understanding of them. This does leave out age exploration as this is a task aimed at increasing our level of knowledge rather than predicting, preventing or detecting a failure. It is relevant in terms of failure management policies but only so that we can decide the frequency of one of the tasks above.

But how does this help us when we start to look at whole-of-life asset management?

RCM provides the framework to define not only the 4 routine tasks, but also to define the three additional corrective tasks and calculate their expected frequencies.
For example, in predictive maintenance the PTIVE task is the task that we apply at a given frequency in order to detect the onset of failure. However, there is also a corrective task. Once we have predicted that a component or asset is going to fail we need to plan, resource and execute a task to correct this situation. This is called the Predicted Task or PTED. (See Figure 1)

Figure 1 - Tasks Involved in Predictive Maintenance

Within the hard time tasks there is only one task, that of Preventive Restoration (PRES) or that of Preventive replacement (PREP). However, in Detective Maintenance tasks (DTIVE) there are also corrective actions. Once we have determined that a detective maintenance task is required, RCM enables us to derive a frequency based on managing the risk of a multiple failure to a tolerable level. The Detective task (DTIVE) is then performed on a routine basis to detect whether an asset has failed or whether it is still working.
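
As a rough illustration of how such a frequency can be derived, the sketch below uses a commonly cited approximation rather than anything taken from this post: for a single protective device tested at a fixed interval, the average unavailability is roughly the test interval divided by twice the device's mean time between failures, and the multiple failure rate is roughly the demand rate multiplied by that unavailability. The function name, parameter names and all figures are assumptions made for the example, and the approximation only holds when the resulting unavailability is small.

```python
def failure_finding_interval(mtbf_device_years,
                             mtbf_demand_years,
                             tolerable_multiple_failure_years):
    """Approximate detective (DTIVE) task interval, in years.

    Assumed approximation (not from the original post):
        unavailability ~= interval / (2 * mtbf_device)
        multiple failure rate ~= demand rate * unavailability
    Rearranged to give the interval for a tolerable multiple failure rate.
    """
    values = (mtbf_device_years, mtbf_demand_years,
              tolerable_multiple_failure_years)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")

    # Unavailability the owner can tolerate for the protective device.
    tolerable_unavailability = (mtbf_demand_years
                                / tolerable_multiple_failure_years)
    return 2.0 * tolerable_unavailability * mtbf_device_years


# Hypothetical figures: the switch fails about once in 10 years, the
# protected function is demanded about once in 5 years, and the owner
# tolerates one multiple failure in 1,000 years.
interval = failure_finding_interval(10.0, 5.0, 1000.0)
print(f"Test roughly every {interval * 12:.1f} months")
```

With these invented figures the interval comes out at roughly a tenth of a year. The point of the sketch is that the interval is driven by the tolerable risk of the multiple failure, not by the reliability of the protective device alone.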

Regardless of whether the asset is a switch, a circuit breaker, a sensor or a stand by pump, at some point we will detect that the asset has failed. This means that at some point there will be a corrective task, the Detected Maintenance task (DTED), which will normally be a replacement or repair of the failed asset. As with the Predictive Maintenance task (PTIVE) we have allowed this to happen because it is the best failure management policy available to us and we are able to manage the consequences of the corrective task.
Figure 2 – Tasks involved in Detective Maintenance

The last of the corrective tasks that we can derive from a standard RCM analysis is that of run-to-failure. In this failure management policy we have eliminated the likelihood of either safety or environmental consequences and have determined that the most cost effective strategy is to allow the asset to fail. Any other action would cost more to carry out than to maintain the asset itself. In this case the only task that we need to consider is the Run-to-Failure task itself which is obviously a corrective action.
Figure 3 - Tasks involved in a Run-to-Failure Strategy

So once a comprehensive RCM Analysis is completed for an asset-system or an asset, it can include up to 7 planned tasks. 4 are routine tasks, 3 are corrective tasks, but all are proactive tasks. All are the result of careful decision making regarding maintenance policy and strategy. This allows us to build what is known as a Proactive Whole-of-Life Model.
  • Predictive Maintenance (PTIVE) - Routine
  • Predicted Maintenance (PTED) - Corrective
  • Preventive Restoration (PRES) - Routine
  • Preventive Replacement (PREP) - Routine
  • Detective Maintenance (DTIVE) - Routine
  • Detected Maintenance (DTED) - Corrective
  • Run-to-Failure (RTF) – Corrective
The whole of life model is produced by calculating the resource burden of each individual task, then multiplying this by the frequency of the task until the end of life event or threshold time period. In the case of the routine tasks we can be pretty sure that our estimates are correct; in the case of the corrective tasks, however, these are often estimates based on manufacturer’s data, our own maintenance history records and the experience of the people involved in the analysis.
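
The arithmetic described above can be sketched in a few lines. This is a minimal illustration only: the `Task` structure, the cost figures and the frequencies are all invented for the example, and a real model would also need to handle things like discounting and changing failure rates over time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str               # e.g. "PTIVE", "PTED", "DTIVE", "DTED"
    kind: str               # "routine" or "corrective"
    cost_per_event: float   # labour plus materials per execution (assumed)
    events_per_year: float  # task frequency; an estimate for corrective tasks


def whole_of_life_cost(tasks, years):
    """Sum each task's cost per event times its frequency over the period."""
    return sum(t.cost_per_event * t.events_per_year * years for t in tasks)


# Hypothetical task list for one asset (all figures are made up).
tasks = [
    Task("PTIVE", "routine", 150.0, 12.0),      # monthly condition check
    Task("PTED", "corrective", 4000.0, 0.25),   # predicted repair, 1 in 4 years
    Task("DTIVE", "routine", 80.0, 4.0),        # quarterly proof test
    Task("DTED", "corrective", 1200.0, 0.125),  # detected failure, 1 in 8 years
]

total = whole_of_life_cost(tasks, years=20)
corrective = whole_of_life_cost(
    [t for t in tasks if t.kind == "corrective"], years=20)
print(f"20-year burden: {total:,.0f} (of which corrective: {corrective:,.0f})")
```

Keeping routine and corrective tasks in the same list makes it easy to see how much of the total rests on estimated frequencies, which is the part of the model that improves as maintenance history accumulates.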

As time goes on we begin to collect data that will enable us to become more accurate in our predictions.

For those familiar with attempts to calculate whole-of-life costs for assets there has always been a level of doubt regarding how to manage corrective tasks. This often results in some form of average of historical costs with an arbitrary cost reduction thrown in for improvement.

When practiced rigorously, RCM and the Proactive WoL models that it produces enable accurate lifecycle cost management for a given level of performance and risk.

Tuesday, 24 April 2007

Myth 3. RCM is only for rotating equipment not for static equipment

This post is part of a ten-part series aimed at increasing the awareness of RCM as a fundamental element of asset management. We are driving to stop poor practice being thought of as leading practice!


No matter where you travel throughout the world you hear this statement. The result of failed implementations and smart marketing, this statement has allowed several splinter methodologies to spring up all over the world. Risk Based Inspection, Safety Instrumented Systems, and more recently Risk Based Maintenance are all derivative methodologies that have sprung up, each with an extremely narrow focus.


For example, RBI is focused only on the containment function whereas SIS focuses purely on the Hidden safety functions and so on. The appearance of these splinter methodologies has also given rise to entire industries, each one protecting their commercial interests by promoting very restricting barriers to entry.


First, why is it a problem? Regardless of the technical integrity of each of these methods, there are some substantial problems caused by the implementation of splinter methods.


For instance; when implementing RCM the analysts need to focus very closely on the expertise and experience of a range of methodologies and disciplines. This immediately allows for a fully cross functional approach to be developed to the management of the physical asset base. Operations, instrumentation technicians, maintenance mechanics, maintenance electricians, equipment inspectors, safety and management are just some of the managerial disciplines that could become involved in an RCM analysis.


There are many benefits of a cross functional approach, but the principal advantage is that all stakeholders are able to be involved in defining the functions and failure management strategies for a given physical asset. The secondary advantage is that management disciplines begin to work across the artificial barriers created within many companies, contributing greatly to the wider task of asset management.


This ensures a focus on all of the functions, and on their impacts on other areas of the asset, rather than focusing on the asset from a one dimensional point of view. Within RBI, for example, a lot of effort is rightly spent on analyzing the degradation rates of metals, generally using corrosion and operations expertise. Driven by inspectors, with a lot of input from closely related disciplines, the output is often predictable.


The output is an inspection plan that does not identify itself as maintenance and is often managed, scheduled and executed via different resources than maintenance plans, reinforcing artificial barriers within the workforce regarding the management of the physical assets. If you look at it on a larger scale you get inspectors permanently separating themselves from maintainers, certification processes that only certify inspectors rather than including others who have been trained in reliability techniques, and it all becomes an exercise in self preservation and protection.

Yet this is only a small part of RCM.

From the very beginning RCM has included a focus on static equipment; in fact chapter 9 of the original report was titled “RCM Analysis of Structures”. It recognized that with structural items almost all functional failures will have an impact on safety, meaning that most of the failure modes fell directly into the Safety consequence category of the decision diagram. However, it did not restrict itself to the containment function only, recognizing that structural and fixed assets often have more functions than are readily evident.

It also recognized that failure in structural items meant only one of two possible routine maintenance outcomes;
  • On-condition inspections for all items, and
  • Preventive replacement for safe-life elements
The focus of RCM in structures is on managing the “fatigue life” of the asset, incorporating all of the possible causes or accelerants of fatigue such as load changes and variance and corrosion, focusing directly on areas that are often described as not suitable for an RCM analysis.


In summary, RCM as it was originally intended is a more than adequate tool for the analysis and management of structural and fixed assets such as vessels, piping, civil structures and supports. It allows these assets to continue to be analyzed within the multi functional framework provided by RCM and allows for the creation of detailed and concise inspection plans as part of the larger asset maintenance plan.

Monday, 2 April 2007

Myth 2. RCM should only be applied to critical assets

The advice to apply RCM to only critical assets, offered by many consultancies, is an attempt to counter the effects of Myth 1. This goes to the heart of why these myths can be counterproductive: instead of dealing with the issue of inefficient implementation, the focus often turns to tampering with the method itself.

There are two issues here that need to be dealt with; first what exactly is criticality and second what are the benefits of RCM over a larger asset base.

Criticality is probably one of the most overused terms within the field of asset maintenance. At its core it is an attempt to determine the relative importance of assets within a company, plant or process.

The reasons are many and include the sequencing of assets for analysis and other improvement efforts, the prioritization of corrective works in progress, and the determination of capital spending priorities.

However, with a lack of an agreed upon definition for criticality we see a range of methods, used in a range of ways, all of which cloud the issue of criticality and relative importance assessments.

So what is more important? An operational pump with no back up that will take out processes at a rate of $10,000 per hour once it fails; or a hydraulic pump that closes a gate on the entrance to the plant? And… there is an additional entrance to the plant.

It appears to be an obvious question; on the face of things a loss of $10,000 per hour wins hands down.

But how about if the failure of the relief valve on the hydraulic pump, under fault conditions, caused the gate to close with fatal pressure, creating a potential hazard to life?

Now the entire dynamic changes. After detailed analysis the operational pump may still be more critical, or maybe not. The point is that you cannot determine true criticality without understanding all of the reasonably likely failure modes of the asset under consideration, and their impacts.
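
One way to picture this is to score each failure mode rather than each asset, and let the worst failure mode set the asset's ranking. The sketch below is illustrative only: the 0 to 5 scales, the scores and the failure mode descriptions are all invented for the example and are not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    safety: int       # 0 (none) to 5 (potential fatality); hypothetical scale
    operational: int  # 0 (none) to 5 (major production loss); hypothetical scale


def asset_criticality(modes):
    """Worst-case score across every failure mode and consequence category."""
    return max(max(m.safety, m.operational) for m in modes)


# The operational pump with no back up (scores are made up).
process_pump = [
    FailureMode("bearing seizes, process stops", safety=0, operational=4),
]

# The hydraulic pump on the plant gate (scores are made up).
gate_pump = [
    FailureMode("motor burns out, gate stays open", safety=0, operational=1),
    FailureMode("relief valve fails, gate closes with fatal pressure",
                safety=5, operational=1),
]

print(asset_criticality(process_pump), asset_criticality(gate_pump))
```

Judged on its primary function alone the gate pump would score a 1; it is only once the relief valve failure mode is listed that its ranking overtakes the process pump.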

This is not an isolated example; many analyses of “non-critical” assets have produced critical failure modes.

In a recent evaluation of pumping stations for a European utility many of them were classed as high criticality due to the populations and volumes that they managed. On review it was found that, due to the gases that were present at some of the lower criticality sites, they actually had consequences of failure far higher than some of their “more important” counterparts.
It is obviously inefficient to analyze all assets to a failure mode level in order to determine which should be included in an RCM analysis. But when criticality is assessed at an asset level, rather than at a failure mode level, it is often a prioritization of the primary function of the asset, or it is based on guesses and assumptions.

Even so, prioritizing assets based on their primary function and its relation to corporate objectives is a perfectly valid and justifiable practice; even if it is not based on real criticality.
So should RCM be applied to only high importance assets?

There are many documented and reported benefits of a rigorous implementation of Reliability-centered Maintenance. However, one that everybody seems to agree on is that where there is an existing maintenance schedule in place, it will provide a dramatic reduction in routine tasks. (Moubray quoted this as up to 70% in some cases)

If we restrict the application of RCM to assets of high importance only, then we are ruling out a reduction of routine maintenance over the remainder of the asset base.
As this series has already illustrated, the impact of many assets cannot be established until their failure modes and their consequences have been identified. So by restricting the RCM process we are also leaving unanalyzed assets that may have large scale failure consequences in the areas of safety or operations.
The benefits of RCM are well established and there is no need to go through all of them here.
In brief, they include;
  • reduction of levels of risk of failure,
  • increases in cost effectiveness,
  • increases in effective labor utilization,
  • turnaround scope planning, and
  • feeding sophisticated techniques on whole-of-life asset management and risk-distributed budgeting
If RCM can be implemented in a rapid and beneficial manner, then any approach that restricts the scope of the analysis project also restricts the value we are able to get out of the physical asset base. Does this mean RCM should be applied to every asset? Probably not; but it should definitely be applied to the majority if it is cost effective to do so.

If this article was of interest to you I hope you would recommend it to others within your operations. Please feel free to send me an email regarding this or any other themes on this site.

Sunday, 1 April 2007

The 9 Deadly myths of Reliability-centered Maintenance

This article is part of a ten part series of posts looking to expose some of the myths surrounding RCM in modern asset management. Unfortunately as the method becomes more and more popular the list of detractors gets longer and longer, often with the result of making poor practice common practice.

Since it was popularized at the beginning of the 1990’s, Reliability-centered Maintenance (RCM) has had its fair share of detractors as well as supporters. While some objections have been justified, many are the result of misunderstandings, misinformation, or misapplication of the concepts and techniques.

In the past many approaches with little or nothing to do with the original report by Stanley Nowlan and Howard Heap were often promoted as RCM processes. This contributed greatly to some of the confusion around today. However, with the publication in 1999 of the RCM Standard, SAE JA1011, companies now have the means of determining whether a process is or is not an RCM process prior to implementing it.

Despite the publication and widespread adoption of this standard the “noise” regarding RCM has continued. Often this has been the result of bad prior implementation experiences, but in some cases it is a deliberate effort by those with commercial interests in the area.

This has had three, mainly negative, impacts globally;
  • Many companies who could have benefited greatly from implementing RCM have been discouraged from doing so.
  • In an attempt to continue obtaining some of the benefits offered by implementing rigorous RCM, many streamlined, or cut down versions have appeared. While some of these do achieve some of the benefits of RCM, some of these methods are actively counterproductive and even dangerous.
  • Lastly, many “new” techniques have appeared in areas where RCM could easily have been applied with great effect. This has served to reinforce artificial work barriers, as well as to create even further disconnection in asset management approaches.
Many of these “myths” have created the unfortunate situation where inefficient and sometimes dangerous approaches are accepted as leading practice.

The intention of this paper is to scrutinize some of the myths and legends that have sprung up around RCM.

Myth 1. RCM requires a lot of resources to implement and maintain

This is by far the most common of the statements made regarding RCM. Unfortunately it has a basis in fact.

Since it was popularized in the early 1990’s it has been sold to the asset maintenance community as a method that could only be properly implemented via a facilitated group. This required many of the leading people to be off line, sometimes for weeks at a time, running through the seven questions that make up the method.

In fact, when people talk today about “classical” RCM they are often referring to this mode of implementation, rather than any specific methodology. This paradigm alone is responsible for many of the streamlined versions of the method that exist today.

Fortunately, it is totally false.

That RCM requires the input of knowledgeable professionals for the assets under consideration is not in question. It is ludicrous to seriously consider that one person, no matter how expert, could have all of the information to perform an accurate analysis.

But there is no way in the lean companies of today that the most knowledgeable professionals can be taken offline to sit through an analysis, let alone an entire implementation project. In fact trying to continue with this outdated practice usually leads to:
  • Less knowledgeable people being assigned to the team, or no representation at all, watering down the rigorous nature of the analysis.
  • Expertise for the analysis being limited to the professionals within the room, rather than the wider group of professionals existing in the outside world.
  • Long periods of inactivity and boredom for participants when the team is working through areas not related to their expertise.
Of all the above problems with this outdated approach it is the final one that frustrated me the most when implementing RCM in this fashion. Often the functional definitions involve everybody, while failure modes are driven mainly by maintainers and failure effects mainly by operators.

A more practical approach is to use a combination of short duration facilitated workshops and targeted interviews (one on one, one on two, etcetera), and to make full use of the RCM facilitator in more of an analytical role.

Another tool that modern technology has enabled is that of rapid implementation templates, a technique that allows the analyst to maintain the level of rigor while speeding up the process dramatically.

The result is a facilitated process, rather than a facilitated workshop. Experience shows that this approach can often reduce the resource requirement by operational professionals by up to 60%, while maintaining the rigor that any reliability process should have.

This is an entirely different role from that of an RCM facilitator and it requires a different, more precise skill set: one that focuses on investigation and the application of logic, as well as the skills required for rapid and accurate facilitation.

This is the first in a ten-part series of articles looking to expose some of the myths and legends surrounding RCM. If it has been of interest or use to you I hope you would recommend this page to a co-worker or colleague.

Myth 2. RCM should only be applied to critical assets
Myth 3. RCM is only for rotating equipment not for static equipment
Myth 4. RCM does not support whole-of-life asset management
Myth 5. RCM requires large amounts of data before it can commence