At the core of our company’s values is a simple concept: make Cisco customers successful. We have a vigorous process for measuring customer satisfaction and success, and conduct constant analysis of the competitive advantage they can achieve by using our technology solutions and services. This ongoing focus is designed to anticipate our customers’ business needs today, as well as in the future.
Cisco has been working with individual customers on an issue related to memory components manufactured by a single supplier between 2005 and 2010. These components are widely used across the industry and are included in a number of Cisco products. They are known to slowly degrade over time, and in some cases, have caused products to fail after being turned off and on.
The majority of Cisco products using these components are experiencing field failure rates below expected levels. Recently, however, a handful of our customers have experienced a higher number of failures, leading us to change our approach to managing this issue.
Despite many of these products being out of warranty, Cisco has decided to take a charge of $655m related to the expected cost of managing these issues. We are taking this action to support our customers and partners. This charge was excluded from our non-GAAP financials, as we do not believe it is reflective of ongoing business and operating results.
Our goal to become the world’s #1 IT provider is built on providing a superior experience for our customers in every aspect of working with Cisco. We believe our approach to this industry-wide issue is the best course of action for our customers. Despite the cost, it reiterates that we place their success above all else.
As always, we encourage our customers to contact the Cisco Technical Assistance Center (TAC) if they experience a failure in any Cisco product. If you have additional questions about this process, we recommend reviewing the additional updates at www.cisco.com/go/memory.
If I have active smartnet on my core and firewall products can we request a proactive replacement? These components are the center of my network and will cause major disruptions if we don’t properly plan for a maintenance window.
Adam,
Thanks for the question. I understand what you’re asking and why, so would like to offer a few comments in reply.
We changed our approach to managing this issue after seeing a handful of customers with a higher-than-expected number of failures in their networks. However, our data shows that the majority of Cisco products using these components are experiencing field failure rates below expected levels. We’ve used industry-standard MTBF calculations to determine those rates.
As a result, we are addressing currently available products with a fix-on-fail approach through normal support processes. A technology migration option may also be available to you.
Regards,
Curt
I find it somewhat concerning that we might be sitting on a ticking time bomb and advance replacements won’t be offered for core devices
Phil,
With respect, I wouldn’t characterize this issue the same way.
We understand the components in question (manufactured 2005-2010 from a single supplier), the product families they’re in, the potential triggers for a failure (device power cycling), and some related mitigation options. The majority of affected Cisco products have also seen failure rates lower than expected.
The FAQ on the http://www.cisco.com/go/memory site is just the start, and we’re doing what we can to post product specific field notices ASAP. These will provide more information and help you assess the unique considerations for your network.
Others may be approaching this industry wide issue differently, but we feel that our approach is the best for our customers.
Curt
I’ve been trying to go through the FAQ and other information I can find. Does a reboot count as a power cycle, or would this issue only occur if the power switch is flipped and/or power is physically removed?
Trevor,
You’ve raised a good question and one I had to check internally with the team. A reboot does count as a power cycle for some, but not all, of the Cisco products using these components. If you wanted to be conservative, I could suggest treating them as the same for the moment. We are working hard to create individual Field Notices for each product which will provide more details.
Curt
Is a more detailed description of the failure mechanism and symptoms available? I would like to know if this is a hard failure that causes the device to solidly fail POST or is it a soft / transient failure that could temporarily disrupt normal operation but allow it to pass a subsequent POST?
Carl,
We know that the components slowly degrade over time, but they require a power cycle to exhibit a hard failure. What we’ve seen is that within the population of components affected by this problem, some components may fail earlier than anticipated, while others will not exhibit failure within the component’s expected lifetime.
In the case of failure, if the component is subsequently exposed to a temperature bake, recovery could occur, but it will be temporary and failure will occur in subsequent operation.
This is a good question and one we’ll add to the FAQs.
Thank you,
Curt
Curt, long time….
Thanks for keeping us all updated. As always, Cisco support is the tops.
When you say “experiencing field failure rates below expected levels”, is that compared to the advertised product MTBF or to the expected failure rate of the bad memory?
I have issues with using MTBF for this…
But a comparison must be made with the failure rate for the memory. The product can have other, non-memory related, failures..
JC,
This reference relates to a product MTBF calculated using industry standards which primarily takes into account details like the number of components on a device. We changed our approach to managing this issue because a handful of our customers experienced a higher number of failures. So even though the majority of field failure rates are below committed MTBFs, given what we know of the component weakness, the individual system and customer details are also important.
Curt
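To make the comparison discussed in this exchange concrete, here is a minimal sketch of how an observed field failure count can be set against the count implied by a product MTBF. It assumes a constant failure rate (the usual simplification behind MTBF figures), which by construction does not capture a component that degrades with age. The function names, the fleet size, and the MTBF value are hypothetical illustrations, not Cisco data or Cisco's actual method.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def expected_failures(units, mtbf_hours, hours=HOURS_PER_YEAR):
    """Expected failure count for a fleet, assuming a constant failure rate."""
    return units * hours / mtbf_hours


def annualized_failure_rate(mtbf_hours):
    """Probability that one unit fails within a year (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def observed_to_expected_ratio(units, observed, mtbf_hours, hours=HOURS_PER_YEAR):
    """Ratio above 1.0 means the fleet is failing more often than MTBF implies."""
    return observed / expected_failures(units, mtbf_hours, hours)


# Hypothetical fleet: 1000 units, product MTBF of 250,000 hours, 20 failures in a year.
fleet_ratio = observed_to_expected_ratio(units=1000, observed=20, mtbf_hours=250_000)
```

Because a product-level MTBF covers every failure mode, a ratio below 1.0 for a whole fleet says little about a small site whose units share the same memory lot and age, which is the concern raised in the question above.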
3 out of 6 WS-C2960G-24TC-L failed after an IOS upgrade last December, and a 4th one last Wednesday… Replacements for the first 3 have been received, but how can we know if they will behave the same way or if they are built with better hardware???
Should we scrap the 6 and replace them ASAP? Never seen this in 20 years working with Cisco devices…
No issue (yet) with 350 WS-C3560-48PS….
Daniel,
Clearly this is not right. We always recommend contacting the Cisco TAC if you have an issue, but with replacements on the way, I’m confident you’ve done this already. I will also make sure your account team is engaged.
Curt
Hi Curt,
Is there an age at which device failures become more numerous? For example, have the majority of failures been on devices at least 3 years old or similar? I understand that there will be failures across the age brackets, but it would be good to know if there is a figure that gives us an indicator that we may experience failures.
Cheers,
Marty
Marty,
Thanks for the question. These components do slowly degrade over time, so the risk of failure does increase with the age of the device. Although age alone would be a misleading indicator, we’ve seen failures occurring in devices that have been in service for 3-5 years or longer. I’d also reiterate that failure of the component requires a power cycle to expose a hard failure.
Curt
Thanks for your reply Curt,
I understand that a power cycle is required. We are currently looking at upgrading the OS on a large part of our fleet, of which the majority are on the list of affected devices, so we are now hesitant to proceed with the upgrade, although in some cases we are upgrading because the current OS on the devices is going End of Support in the near future. Any updates or guidance will be greatly appreciated.
Marty
Marty,
Thanks for your message. Unfortunately a general response here won’t do your question justice, so I will ask you to please connect with your Cisco account team. We are doing what we can to accelerate the Field Notices to provide more detailed information for each of the impacted product categories listed on the site. This information should help you and other customers move forward with greater confidence in the decisions you need to make specific to your network. As always, the Cisco TAC remains available to help if you experience any unexpected behavior.
Regards,
Curt
Can you clarify if the “memory component” is RAM, flash, NVRAM or boot ROM? For a product like a 3750 where all are soldered onto the logic board it won’t matter, but on a NPE-G2, RAM is field-replaceable while the other components are not. This information would be useful to adjust our stock levels of on-site and depot spares.
Terry,
The affected memory component is DRAM, so both DIMM (which may be field replaceable in certain products if accessible) and downboarded components are contained within the affected population.
Curt
I understand Cisco is currently developing a tool which the customer can use to determine if their equipment is affected. Until that tool is available, are there any commands we could issue to our most critical devices which would give us an indication that they may be affected?
Dennis,
Unfortunately no command exists that provides a guaranteed diagnostic across multiple product families. To your question about a tool, we are still working on developing assessment capabilities for use by internal Cisco support teams.
I understand the Field Notices and other information can’t come soon enough, so I thank you for your patience. Please do check back in at http://www.cisco.com/go/memory as more details become available.
Curt
I am very disappointed by this news. Working in a highly available and highly regulated environment means that IOS upgrades are already sufficiently challenging. Now we have to call TAC every time we do it, just in case. Until all of our affected devices are rotated out that is. Considering the scope of the problem, we’re going to be calling TAC a ton just to do basic maintenance.
I’m not feeling warm and fuzzy Cisco.
I completely understand your frustration Adam. This isn’t welcome news for anyone, but we felt it was better to have people know about this industry wide issue. As we move forward we’ll be offering more information that should help you better understand your own unique situation.
Curt
We would like to assist in the testing/analysis process – to include other related areas at CISCO’s option.
Thank You.
SRG
MS, Johns Hopkins University, Baltimore MD
CSO
Strategic IT Security, LLC
Hi Curt,
Thank-you for all of the information, I appreciate this is an industry wide issue and have found Cisco’s response to have been the most comprehensive so far.
Respecting the fact that we are to engage TAC I just have an initial triage question to help us determine if we’re in scope for this issue: Upon rebooting an affected device, would failure to boot at all be a possible outcome or do the affected systems typically proceed to boot up but exhibit other symptoms?
Regards,
Craig
Thanks for your question Craig. In the event a component failure occurs, this will result in a failure to boot. Other symptoms after boot will be unrelated to this issue.
Curt
This is an alarming issue, and better advanced notice would have helped. This past weekend, we had scheduled maintenance to perform memory reallocation on a set of redundant FWSM’s that are at the heart of a production datacenter. This kind of work required both units in a redundant pair to be rebooted at the same time, so we had scheduled a two hour outage, however both units went down hard due to this issue, and we lost a production datacenter for 10 hours, causing us to miss customer SLAs.
Can you provide a list of serial numbers of the devices with problems?
Roberto,
Thanks for your question about serial numbers. I’ve asked the team to look into this and understand what options may be open to us. Apologies that I don’t have a direct answer for you right now.
Curt
Since it is related to one supplier, is there a chance to identify the suspected products by their serial number?
If yes, when will the lists be available?
Helmuth,
Please see above response to Roberto. Thanks for the question.
Curt
Hi Curt,
According to the FAQ, the faulty DRAMs were first installed in appliances shipped out in 2005 and removed from inventory in 2012.
1. So what Manufacturing Date should we be looking at?
2. So if we don’t have SmartNet we cannot RMA our faulty Cisco appliances?
Leo,
Thanks for your question about manufacturing dates. We know that the components in question were manufactured between 2005-2010, but the product manufacturing dates will differ by individual product line. We understand the desire for more detailed information and are working hard to provide this ASAP.
On the question of SmartNet coverage, we encourage you to call the Cisco TAC if you experience a hard failure of a device included in our impacted product list (see http://www.cisco.com/web/about/doing_business/memory.html#~impacted). As normal SmartNet and warranty entitlement rules remain in place, this may mean that you will work on any additional cases (e.g. out of warranty) with your Cisco account team.
Curt
Hi Curt,
Thanks for the response. So let me summarize your response to my two questions:
1. There is as yet no way for Cisco end-users to identify the affected products (either via Serial Number or Date of Manufacture).
2. If my wireless access points, as an example, no longer have SmartNet (but are on the list) and have a potential memory failure, then “there’s nothing I could do” but potentially waste the time of our Cisco SE/AM?
Leo,
1. That is currently correct. If there’s any change, we’ll make sure this site is updated with the latest.
2. I’m confident that supporting a customer (even for an out-of-contract device impacted by this issue) won’t be thought of as a waste of time by a Cisco SE/AM.
I acknowledge your frustration, and want you to know that our teams are working hard to provide additional detail about the impacted products as soon as possible.
Curt
If that is the case Curt, then I recommend someone needs to revise this statement:
“Despite many of these products being out of warranty, Cisco has decided to take a charge of $655 million related to the expected cost of managing these issues.”
The statement above gives me (and anyone else) the impression that products that are out of warranty are covered.
Dear Curtis,
Is it possible to identify affected devices using the serial numbers from the Installed Base Report?
Chris,
Please see above response to Roberto. Thank you for the question, and apologies for not having a direct answer at the moment.
Curt
Curt,
We had to do some power work in our Data Centre last night that resulted in a 2811 router being moved to a new circuit.
A simple 5 minute job, but unfortunately we were greeted with a non-booting device.
Upon consoling into the device and seeing no output on the screen, we power cycled the router again. Once the router started to boot and check the ECC RAM (one of the very first things it does), it failed with a “Bad RAM” error and “halted”.
I am making an assumption this is the issue described; are you able to confirm? Cheers.
BM,
I apologize for the difficulty you’ve experienced. Given the product type and the symptoms you’ve described, this failure could be a result of the memory component issue. However, without additional information (like component age, etc.), I’m unable to diagnose this with any certainty. Please contact Cisco Support for help with troubleshooting and next steps.
Thanks,
Curt
BM:
I too had to move 2x 2811 recently, due to power work in the DC, and both showed similar symptoms.
One would not load at all, and the other would load, but display a memory-related error.
I am glad that Cisco has IDed the issue, but hope that they can provide serial/date ranges for impacted units; would really like to see proactive replacement of impacted equipment.
Chris
Do you have an internal part number for the bad memory chip?
Rick,
Thanks for your question. The memory components affected by this issue are used in various ways for different products – in line cards or on boards. Although we have a view into how these were used in different Cisco products, we do not have a specific internal part number.
Curt.
I find it somewhat concerning that we might be sitting on a ticking time bomb and advance replacements won’t be offered for core devices
Ali,
Given that the majority of affected Cisco products have seen failure rates lower than expected, our recommended response to this industry wide problem is fix-on-fail.
We’re also very close to releasing more detail through the field notices. We hope that these updates will help you validate our approach as measured and appropriate. It should also allow you to consider the situation with your own network in mind.
Curt
So what happens when my 2811 has failed due to memory issues and, because I do not have SmartNet, TAC has said they are not willing to help?
Am I now stuck with a broken device, or is Cisco going to offer a solution?
Edward,
Thanks for your question. If you have a Cisco device affected by this issue (i.e. known to have these components and shown to have failed as a result), we have different ways we can help.
Currently available and covered products will be addressed through fix-on-fail or technology migration programs, using normal support processes. End of Sale products will be managed through fix-on-fail, return-to-factory, or technology migration programs. End of Support products will be managed through a technology migration program.
Given that you have already raised this through TAC, I would recommend contacting your Cisco account representative.
Regards,
Curt
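The lifecycle rules described in this reply can be summarized as a small lookup table. The sketch below only restates that reply; the enum and function names are hypothetical, and actual eligibility still depends on the entitlement checks and account-team discussions mentioned throughout this thread.

```python
from enum import Enum
from typing import Tuple


class Lifecycle(Enum):
    """Product lifecycle categories named in the reply (names are illustrative)."""
    CURRENT_AND_COVERED = "currently available and covered"
    END_OF_SALE = "end of sale"
    END_OF_SUPPORT = "end of support"


# Remediation programs per category, exactly as listed in the reply above.
REMEDIATION = {
    Lifecycle.CURRENT_AND_COVERED: ("fix-on-fail", "technology migration"),
    Lifecycle.END_OF_SALE: ("fix-on-fail", "return-to-factory", "technology migration"),
    Lifecycle.END_OF_SUPPORT: ("technology migration",),
}


def remediation_options(status: Lifecycle) -> Tuple[str, ...]:
    """Return the programs stated for a product in the given lifecycle category."""
    return REMEDIATION[status]
```

For example, `remediation_options(Lifecycle.END_OF_SUPPORT)` returns only the technology migration program, matching the last sentence of the reply.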
Curt,
As the owner of a dating website I want to be able to offer our customers innovative new features such as: 1. Ways of viewing movies, 2. Streaming video, 3. Live chat, and 4. Increased security of personal information.
We don’t own our server and our service is provided through a third party provider.
My question is, what Cisco products would best facilitate these services?
RP
RP,
Thanks for your note and sorry for the delay in responding. As this is a broader question than component memory, I’m going to refer you to one of Cisco’s excellent partner organizations. They should be able to go deeper into your needs and recommend some solutions. To find a partner in your area, you can visit our partner locator tool: http://tools.cisco.com/WWChannels/LOCATR/openBasicSearch.do
Curt
I understand Cisco is currently developing a tool which the customer can use to determine if their equipment is affected. Until that tool is available, are there any commands we could issue to our most critical devices which would give us an indication that they may be affected?
Alexandr,
Unfortunately no command exists that provides a guaranteed diagnostic across multiple product families. To your question about a tool, we are looking at developing internal assessment capabilities for use by Cisco support teams.
That said, you should also know that today we posted a full list of Focus Products and related PIDs – please see http://www.cisco.com/web/about/doing_business/memory.html#~focus. We’ve also posted all related Field Notices at http://www.cisco.com/web/about/doing_business/memory.html#~field. We continue to recommend fix-on-fail for all affected products, but this new level of detail should help as you make decisions related to your network.
Regards,
Curt
Curt
When will you release product-specific Field Notices?
Rick
Rick,
We’ve posted all related Field Notices at http://www.cisco.com/web/about/doing_business/memory.html#~field. We also posted a full list of Focus Products and related PIDs – please see http://www.cisco.com/web/about/doing_business/memory.html#~focus.
Curt.
Hello Curt,
I work at a Managed Services provider that is very largely a Cisco-only shop. We manage several large hospitals that contain many of the products mentioned in the Field Notices. Am I correct in my understanding that no proactive replacements will be issued? To us, this means that hospital equipment may be down for up to four hours after TAC decides this may be the cause of the problem. I assume you can see why this is of great concern. Please let me know how a better plan can be established for our 24/7 customers that need no downtime to prevent loss of life.
-Josh
Josh,
Thanks for being a Cisco customer. It won’t surprise you that we have many customers who manage 24/7 operations and critical infrastructure. Even with this broad customer base, we maintain that a fix-on-fail approach is appropriate for these products, especially given the actual and expected field failures connected to this issue.
While other companies are taking a different approach to this issue, our preference is for transparency and sharing with our customers the information needed to assess their own situation. That’s why we released additional detail yesterday, including comprehensive Field Notices and a recommended Focus Product list. We hope this allows you to make more informed decisions about your network.
If you think there are additional things that can and should be done to address your unique situation, I’d encourage you to speak directly with your Cisco account representative.
Regards,
Curt
Hi,
I’ve a question… I have a number of AP c1250 units which exhibit this problem (unable to boot and not able to console in).
At the time we had sufficient spares to replace them, and now the faulty ones are sitting in the store room. These are already out of warranty. Can I still get a replacement for these faulty AP c1250 units?
Thanks.
Mic,
I understand that there are some Cisco Aironet 1250 series access points that use the affected memory components. While they are not included on our Focus Product list, we have still posted a Field Notice (see http://www.cisco.com/c/en/us/support/docs/field-notices/637/fn63763.html).
Knowing that I don’t have all the details specific to your situation, the best way to determine replacement eligibility is to call the Cisco TAC and talk through your warranty and maintenance situation. If that doesn’t get you where you need, I’d recommend contacting your Cisco account team.
Regards,
Curt
Hi Curtis,
It is mentioned that The End of Support products will be managed through a technology migration program only.
How do we find out which End Of Support products may have this memory installed?
Thanks
Michael,
Thanks for checking. Right now we don’t have any Cisco products that use the affected memory components and are End of Support. That said, some will be reclassified as End of Support later this year, so you will find that some service contracts won’t be available beyond a certain date. I’d encourage you to work with your account team, as this will become apparent during the normal ordering process.
If you’d like to get into the detail of End of Sale / End of Life products, you can find more information online at: http://www.cisco.com/c/en/us/products/eos-eol-listing.html
Regards,
Curt
Mr. Hill,
Around 2 months ago, one of my 1841 routers stopped booting after this line “Main memory is configured to 64 bit mode with parity disabled”. I thought it was due to a power outage. I opened it and replaced the RAM with a new one and it booted normally.
Is this the issue you are referring to? Or is it something completely different?
Also, I have 11 (2811 routers) manufactured in 2007. Does it mean that all of them will fail one day for certain? Or may they or may they not fail?
Thanks,
Hi Abdullah,
Based only on the brief information you’ve provided, this does not sound like the same issue. Although the memory component failure is exposed by a power cycle, it actually results in a hard failure of the device. And as we’ve noted elsewhere, we are currently seeing failure rates below expected levels for the majority of Cisco products with these components. Although the components are known to slowly degrade over time, not all components will fail.
I hope this helps,
Curt
Hi Curtis, I have a question about the Field Notice on Catalyst 6500 http://www.cisco.com/c/en/us/support/docs/field-notices/637/fn63743.html
On listed Line cards it says: Replace baseboard memory and/or daughter card. Do I always change the baseboard memory and only daughter card if it’s on list?
On the Supervisor720 it says: Replace daughter card but on the daughter card it says: Replace daughter card. The second daughter card is unlisted. What do I replace here?
I tried Tech Support but they will only help me if I suffered an actual failure. I know it’s “Fix-On-Fail-Only” but I might want to pay for the parts myself to avoid prolonged downtime in the future. Who can answer these more detailed questions?
Thanks
/Tomas
Tomas,
Thank you for your patience as I connected with the team to understand how to respond. Based on that conversation, I believe that the Supervisor720 has memory soldered to the impacted PFC daughter cards. Therefore a PFC replacement would remediate the exposure. When the daughter card is exposed and exhibits the failure, it is the daughter card only that would need to be replaced (not a separate card).
Given the specifics of this request, it may be best to involve your account team and ask them to help support your next steps.
Regards,
Curt
Hi Curtis,
Is it necessary to hold an active Cisco service contract to get a replacement part in case of hitting this issue? Or will Cisco replace every failed part that is showing the symptoms mentioned in the field notices (and end of sale of the part isn’t reached…)?
Thank you for your information.
John,
Normal SmartNet and warranty entitlement rules remain in place, so we’d ask you to call the Cisco TAC if you experience a failure in one of the products listed in the Field Notices: http://www.cisco.com/web/about/doing_business/memory.html#~field
In other circumstances (e.g. out of warranty or out of contract), we’d encourage you to raise your concern directly with your Cisco account team.
Regards,
Curt
Hi Curtis,
If I have a spare router or switch with this issue that is not presently in use and is stored powered off on a shelf, does the memory suffer the same deterioration over time or does the memory need to be powered to deteriorate? Trying to evaluate our exposure versus offline sparing levels.
Thanks,
Pete
Hi Pete,
Thanks for your question. Although the impacted memory components are known to slowly degrade over time, they do need to be powered up for this to occur. I hope this helps with your evaluation.
Regards,
Curt
Interesting. My company is right in the middle of a serious outage because of this, power cycle and the blades fail. Just wanted to express my disappointment that Cisco has decided not to be proactive. I consider this decision a money grab. You know full well that enterprises will be forced to be proactive and thereby purchase new gear versus waiting for it to fail. Especially in DC core switching and routing.
Bill,
I’ve asked our team to confirm that a TAC case is active and that you’re getting every support we can offer. We never want to let a customer down, so please accept my apology for your current situation.
While that effort is underway, I would like to respond to your other comments. With our customers in mind, Cisco made a serious commitment of both people and dollars to address this issue. I would also submit that no other technology vendor has been proactive in the same way. We publicly disclosed the issue and have published field notices for every Cisco product that uses the affected memory components.
Our hope is that this transparency helps us work together on managing issues if or when they arise.
Regards,
Curt
One of my customers has already experienced the breakdown of core switches during a scheduled power cycle. It was confirmed during EFA, and service was restored by RMA. Recently one more memory module issue was reported by the same customer, this time in a router. I need to communicate this to my customer. Can you please direct me to any official external communication available that can be shared with customers?
Dhanesh,
Thank you for your question. We have provided a lot of public-facing information on our website at http://www.cisco.com/go/memory. There you will find an overview of the memory component issue, some frequently asked questions, and detailed Field Notices. I hope this is useful information for you and your customer.
Regards,
Curt
Hello again Curt,
In the FAQ I note a statement that the final affected component was removed from Cisco Inventory in 2012. If I have devices listed in the field notices as affected PIDs, but with a manufacturing date later than 2012, would they still be affected? Is there an “after this date” manufacturing date where product is known to be good that we can go by?
Thanks,
Pete
Welcome back Pete!
I wish I was able to say “everything after this date,” but there could always be a difference between manufacturing and product distribution dates. We also disqualified memory components by product as we became aware of the potential impact, so the dates also differ by individual product. The best advice I can offer is to work with your Cisco account team to arrange an assessment of your inventory for products identified in the Field Notices.
This is a good question and one I will ask my team to include in the general FAQs for customers.
Regards,
Curt