Cisco Blogs

Offshore Oil Rig Accident Lessons Spill into IT

- May 26, 2010 - 4 Comments

As the environmental disaster resulting from the British Petroleum (BP) oil spill unfolds in the Gulf of Mexico, I am reminded of common problems in the information security industry. Granted, the scale and potential impact of the oil accident are thankfully hard to duplicate in the world of technology: thousands of barrels of crude spew each day from a hole in the ocean floor a mile down, endangering sensitive coastline ecosystems from Texas to the Florida panhandle. A bad patch release or a data breach can cost a company millions of dollars and engineers their jobs, but in most cases, lives are not at risk. Lack of “circuit breakers” regulating stock exchange trading may lead to the evaporation of close to a trillion dollars in minutes, but trading can be halted and plugs can be pulled. Not so with this oil spill, where despite BP’s best efforts to contain the disaster, extensive and probably long-lasting damage to the Gulf Coast and losses to the fishing and tourism industries appear unavoidable.

Congressional testimony and media analysis of the spill suggest that several mistakes were made that have analogues in the world of technology. They are mistakes that highly trained, intelligent people, who should know better, still make. They should look familiar to us in the high-tech world. Here are five:

1. Just because you can doesn’t mean you should.

In fast-paced, competitive technical fields, engineers may feel pressure to produce novel solutions on tight schedules. The Deepwater Horizon disaster demonstrates the extent to which global demand for oil has pushed energy companies to drill in increasingly dangerous, difficult environments, using ever more advanced and unproven technologies. It is not news that problems in a system increase dramatically with the complexity of the system. Moreover, the inability to fully test a complex system before deploying it, whether because of time, physical, or financial limitations, is probably a recipe for trouble. Occam’s Razor, the principle that the simplest solution is generally the best one, may be a useful rule of thumb when designing solutions, particularly for mission-critical systems.

2. Two is one and one is none.

This is a common-sense saying that a wilderness survival enthusiast once told me. Bring more than one knife, and more water than you think you will need. Know more than one way out of your hotel, and have more than one route to get to work. The oil rig designers are probably now regretting that they did not build more robust redundancy into their blowout prevention systems. IT professionals are unlikely to regret backing up data more frequently than absolutely necessary and storing it in more than one physical location. When creating business resiliency plans, it is common sense to have more than one solution to the most likely problems and, for critical systems, to have key tools on hand and ready to deploy.

3. Short cuts can get you into deep water.

Reports are emerging that suggest safety corners were cut on the oil rig leading up to the accident. The blowout preventer had been modified, emergency cut-off valves had leaky hydraulics, and at least one had a dead battery, according to several reports. Engineers on tight deadlines may be tempted to take short cuts, bend rules, or downplay known problems in order to get the job done. The enormous expense and productivity loss involved in taking down working systems for crisis testing may give planners reason to delay or rationalize. This may be particularly true when the economy is uncertain and workers feel insecure about their jobs. In the case of the oil rig disaster, technicians apparently ignored conflicting pipe pressure test results, which indicated a problem. Peer review, objective oversight, and other time-honored best practices may be helpful in avoiding these traps.

4. Accident containment can keep a bad problem from becoming a disaster.

In complex operations, mistakes will be made and accidents will happen. In fact, there are entire theoretical schools built around so-called system accidents. In the case of the oil rig disaster, engineers had taken basic safety precautions, drawn up disaster plans, and installed backup systems, but when the low-probability, high-impact scenario arrived and the wound started gushing blood, there was no tourniquet on hand. Risk models relied on the blowout preventer functioning effectively. A month later, the well is still gushing oil into the Gulf of Mexico.

5. All those security precautions are there for a reason.

Experience shows that, in sophisticated systems, various elements interact with each other in complex and often unpredictable ways. Sadly, many of the safety devices we see every day (smoke detectors, seat belts, brake lights) were only standardized after hard experience proved their necessity. As a dangerous bubble of methane burped from the ocean bottom toward BP’s oil rig, a series of security devices, including a blowout preventer, cement plugs, and a huge wall of mud, failed one after another to hold it back. In a situation where a cascading system failure brings down an operation, there may be plenty of time to blame regulators for lack of oversight after the fact, but the anvil will fall hardest on the person with his hand on the switch.

Comments


  1. Our entire development system encourages "success-oriented planning", with a small fudge factor (maybe 10%) to allow for problems. The system does not reward people who talk of worst-case scenarios. There seems to be no limit to worst-case scenarios; it's always possible to think of a worse one. Even a turtle cannot make progress unless he sticks his neck out. The analogies in the essay, linking the Deepwater Horizon spill to financial and business disasters, should make managers pause to consider worst-case scenarios, even if not all can be defended against.

  2. The recent events regarding the oil spill remind me of the countless discussions surrounding disaster recovery and continuity planning. In a world as vast as the one we live in, every organization should have established (AND TESTED) policies, processes, and procedures that take everything from the smallest to the worst-case scenarios into account and provide adequate solutions to resolve those scenarios should they ever arise. Today we have audits and hefty sanctions to contend with should certain processes, procedures, and frameworks not be in place, yet much of society seems to take them for granted. That is, until an event such as the oil spill brings us back to reality and reminds us of the importance of these various constructs. If nothing else, these recent events should remind us all of the social responsibility we have not only to our partners, consumers, and shareholders, but also to the world around us. Thanks,

  3. Jean, Thank you for your enjoyable and insightful post; you make many good points about safety. The similarities in circumstances between industries are much greater than is often assumed. The IT industry is, in my humble opinion, among the poorest performers in this area. By contrast, a tremendous amount can be learnt from the aviation industry. Although the subject area is wide and multi-faceted, there are common elements which contribute to performance. For example, it is important that there are accredited personnel in the operational environment who are responsible for adherence to standards. Whether they are identified as safety officers, instructors or in other roles, the essential characteristic is that, irrespective of their employment relationships, they value their standards accreditation more highly than their current job. As another example, it is not sufficient to rely on testing; it is also important to carry out inspection. This is an area where the IT industry in particular is very weak. Thanks again for addressing a hugely important area. "Safety is no accident." John W Lewis

  4. I worked 16 years in the US Air Force Air Materiel Command in Configuration Management of software. We went to great lengths to define users' requirements and then to write them into specifications for software development. We spent millions of dollars and created volumes of documentation. Now that I have retired, I wonder if my work actually did any good. The users frequently failed to specify key functions, and the users' tasks changed due to outside forces, requiring major rewrites and changes leading to cost overruns. High authorities usually required integration of complex systems, and those complex systems were themselves evolving with new requirements and new threats. The result was long development times and big cost overruns. During the Yom Kippur War, a new Russian missile had a tracking beam which we had not known about. We picked up the signals during the battle and relayed the data to our developers at Eglin AFB, Florida. In less than 48 hours a software change was made, tested, and sent to Israel to update electronic jamming pods. I heard that the change was 100% effective in defeating the new missile. I highly doubt that change procedures were followed. When I look at successful software development, it seems to be done by relatively small groups to solve specific problems. These subsystems are integrated with larger systems by rigorous testing. Take Apple's iPhone, for example. The users did not develop the specifications -- the ideas came from people who had imagination and who knew how things should work. Then they set up developers' toolkits and integration specifications for thousands of developers to create a huge new market for apps which did not exist before. The apps are apparently tested rigorously because they seem to work most of the time and don't crash the whole system when they fail. In software, then, imagination and speed of development help put a product on the street before the users' needs change. Flexibility to adapt to changing needs or changing competition is more important than finalizing and controlling details of written specifications. Taking this back to worst-case scenarios: the gulf oil spill occurred because the blowout preventer failed. Apparently it had not been tested adequately, and it even had at least one dead battery. The worst-case scenario -- a blowout -- had been imagined correctly, but the safety device had not been adequately tested. For software development, the solution lies not in more paperwork, but in better imagination of what could go wrong, and then testing the fix -- and imagining what further steps to take if that blowout preventer fails.