Cisco Blogs

It’s a Complicated World Out There – Thoughts on the Amazon S3 Outage

- March 3, 2017 - 8 Comments

It’s a complicated world out there, especially when it comes to cloud.  Customers are trying to figure out the right cloud strategy, and it isn’t easy. The Amazon S3 outage is an example of just how complex the situation is. While that interruption only lasted four hours, the impacts were felt far and wide.

So is public cloud still a viable option? Of course it is. Amazon Web Services published their post-mortem, and the issue was human error and fully correctable. Despite this, there is no disputing the reliability of S3, which is on par with or even better than most enterprise storage options. It’s just that AWS is so large and so pervasive that it makes the news when it fails, and when that failure happens, we see multiple systems that all rely on S3 failing at once.

After I heard about it, my first thought was that this is a perfect example of why it is so important for customers to think through all aspects of their cloud strategy. While there are immense benefits to having public cloud as part of your strategy, there are also challenges that need to be identified, planned for, and managed.

Today, we’re talking about Amazon S3. But that is just one of the multiple clouds organizations are working with today. In fact, according to IDC, 84% of the leading cloud adopters expect to use multiple clouds from multiple cloud providers.[1] This is why I’ve been talking a lot about how customers need a strategy for a multicloud world. Working with multiple clouds helps customers take advantage of the unique capabilities of each cloud, making it possible to accelerate your business, enable digitization, and improve developer productivity. An additional benefit is that when you leverage multiple cloud providers, you build some diversity into your infrastructure, which means fewer correlated failures.
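
To put rough numbers on the point about correlated failures, here is a minimal Python sketch. It assumes that each provider is available 99.9% of the time (a hypothetical figure, not a published SLA) and that outages at different providers are fully independent, which real deployments only approximate. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Probability that at least one copy is reachable.

    Assumes outages are statistically independent across providers.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


def expected_downtime_hours(availability):
    """Expected hours per year during which the service is unreachable."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = 0.999  # hypothetical per-provider availability
dual = combined_availability([single, single])

print(f"one provider:  {expected_downtime_hours(single):.2f} hours/year")
print(f"two providers: {expected_downtime_hours(dual):.4f} hours/year")
```

The gain shrinks quickly if the providers share a dependency such as DNS or a common software bug, because the independence assumption no longer holds.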

The benefits of a multicloud strategy are clear and despite this week’s incident, public cloud provides important benefits and needs to be part of the mix. But this shines a light on the importance of balancing agility and performance with cost and risk both on and off premise.  In other words, what levels of risk and cost are acceptable for your desired agility and performance when it comes to cloud services?  All cloud services come with some degree of risk, even Amazon S3, and we accept it because the alternative is often impractical or not financially viable.  As Lydia Leong, cloud analyst at Gartner, put it, “Only the most paranoid, and very large companies, distribute their files across not just AWS but also Microsoft and Google, and replicate them geographically across regions  —  but that’s very, very expensive.” (Source: USA Today)

So, what to do? Whether or not customers could’ve limited the impact of the human error at the root of this particular incident, that shouldn’t stop you from planning for the situations you can influence and even manage, both on and off premise. Let me give you some examples:

  • Use application management and orchestration to help you rapidly redeploy applications when your code or cloud platform is encountering issues
  • Use a multicloud security approach to help address the exponential expansion of the attack surface and minimize the impact of incidents like the DDoS attack against Dyn DNS last October
  • Use both infrastructure and application analytics to help you with telemetry, segmentation, and insights to proactively protect applications and the customer experience
  • Use virtualization technologies in networking for the cloud to reduce the cost of accessing public cloud resources and increase your agility with rapid deployment of new services, without costly upgrades
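
As one small illustration of planning for a provider outage, the Python sketch below reads an object from a list of storage backends in order and falls back to the next one when a backend fails. It is a sketch under stated assumptions, not a description of any Cisco or AWS product: `primary_store` and `secondary_store` are hypothetical stand-ins for real storage clients, and the data is assumed to have been replicated to the second backend already.

```python
from typing import Callable, Sequence


def fetch_with_failover(key: str, backends: Sequence[Callable[[str], bytes]]) -> bytes:
    """Return the object from the first backend that answers."""
    errors = []
    for backend in backends:
        try:
            return backend(key)
        except Exception as exc:  # record the failure and try the next backend
            errors.append(exc)
    raise RuntimeError(f"all {len(backends)} backends failed for {key!r}: {errors}")


def primary_store(key: str) -> bytes:
    # Hypothetical primary object store, simulating a regional outage.
    raise ConnectionError("primary object store unavailable")


def secondary_store(key: str) -> bytes:
    # Hypothetical replica in another region or at another provider.
    return b"replica of " + key.encode()


print(fetch_with_failover("reports/q1.csv", [primary_store, secondary_store]))
```

The hard part in practice is keeping the replica current and deciding which data justifies the extra cost, which is the trade-off between risk and cost discussed above.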

At Cisco, we think the right answer is to have this level of cloud intelligence across your entire multicloud environment (both on AND off premise) to strike the right balance for your organization between control and innovation.

So, tell us: how does the Amazon S3 outage affect the way you think about your own cloud strategy?

[1] IDC InfoBrief, sponsored by Cisco, Cloud Going Mainstream. All Are Trying, Some Are Benefiting; Few Are Maximizing Value. September 2016.



8 Comments

  1. Excellent post. I used to be checking constantly this blog and I am impressed! Extremely useful information specifically the remaining phase :) I maintain such info a lot. I used to be seeking this certain information for a very lengthy time. Thank you and best of luck. maglie calcio

  2. Highlights the need for a redundant multi-cloud or hybrid private-public architecture with burst failover in case of an AWS-like outage. Many employed poor discipline in counting on the 99% reliability at US-EAST-1. For most, it makes poor economic sense to double infrastructure spend to address a 1%. For those for whom it does, they should have alternate architectures. This 4-hour outage should be a wake-up call, as it means Amazon has 83 more outage hours to go before they get to 1%.

  3. Hi Kip, Having worked on hybrid cloud @ Cisco, I certainly appreciate the value of such a model. That said, the issue on S3 doesn't necessarily serve as justification for hybrid cloud, nor do many public cloud outages. If anything, it bolsters the argument for fault tolerance across not just zones but also regions, which was available but only for a subset of apps. It may also further advocate the need for additional safeguards against commands with far-reaching scope! regards, -Hicham.

  4. Mr. Compton, How does one balance use of proprietary public cloud application usage against the multiple cloud strategy. Do you have any recommended methodology for mitigating such risk?

  5. Would it be wise and or costly to have both cloud and on premise infrastructure? This way mitigating risk if either one goes down.

      In a multicloud environment that is one way to mitigate risk.

    My customer reported that their Meraki Dashboard was unavailable during the Amazon S3 outage, is there any correlation?

      Hi Jeffrey. Please reach out to your Meraki contact directly to discuss your experience in more detail.
