If you were paying attention to the Intertubes or Twitterverse today, you probably heard about an issue at one of the well-known Cloud Computing providers. Needless to say, fingers were being pointed left and right, and all the “experts” came out to explain their 20/20 hindsight into causes (still unknown) and avoidance.
I purposefully avoided commenting on these events because sometimes, in life, systems go down. If you’ve been in the technology industry long enough, and actually worked in support or operations, you know that even the best designs can have issues. And I’m not ashamed to say that I’ve been the cause of some (temporary) issues with large customer systems. When it happens, it’s not a good day for anyone involved -- the operators, their customers, the fat-finger typer or wrong-cable puller, etc.
What struck me throughout the day was how many people were labeling this #FAIL. This is the Internet’s new meme any time something goes slightly differently than planned.
- Food is slightly overcooked -- #FAIL.
- Plane flight took off a few minutes late -- #FAIL.
- Software upgrade took three clicks instead of two -- #FAIL.
This got me thinking about the best advice I’ve ever gotten from a mentor, early in my career. We were talking about the characteristics of the company’s management team and the mentor told me, “The thing that they all share is that they were HUGE failures at one point in their career. And that experience is what’s made them better and the leaders they are today.” Yes, that’s right, back in the day people regarded failure as part of the experiences of life: an activity that, if dealt with properly, could create a learning experience and an opportunity to get better.
But for many people today, it was all about #FAIL and the need to create new fears for people considering (or already deployed on) Cloud Computing. It’s unfortunate, because I’d be willing to bet that anyone affected by today’s issues will have learned an incredible lesson and their systems will be better going forward. In fact, one of the greatest characteristics of Cloud Computing (public or private) is the ability to Fail Fast.
When I talk to customers about Cloud Computing, one of the first things we typically talk about is the reality that they will probably (eventually) deliver IT services for their business from a variety of Cloud sources -- Private, Public, Commodity, Community. What we try to help them understand is that the principles they consider for their internal systems -- Availability, Security, Mobility -- also need to be considered for external systems. There aren’t any shortcuts, but there are some ways to apply best practices from across the community (for public and private usage) and to actually get better at failing faster.
The headlines for the next couple of days may look back at today’s events and try to highlight the #FAIL, but strong leaders and smart engineers will find ways to turn those failures into learning for the next steps in the Journey.
P.S. -- Kudos to the other Cloud Computing providers in the industry for taking the high road and not slinging mud at this difficult situation. That showed class. I’m glad to see there was a brotherhood of people who recognize that operating the world’s largest systems is not easy.
Tags: Cloud Computing