Next week at the O’Reilly Open Source Conference (OSCON) in Austin, Texas, I’ll be teaching systems administrators and developers how to throw away their servers each time they want to make a simple change. Sounds ridiculous? That’s what I thought at first too. But ask any experienced sys-admin how many times a “simple, one line change” has taken down a production service, causing loss of service to customers. You might be surprised.
Let’s take a step back and talk about how traditional computer systems infrastructure management looks. You need to stand up a new service, so you procure a server, wait for it to be delivered, set it up with a fresh OS and your configuration, install it in a rack in a data-center, and most likely manage it remotely over SSH. Want to upgrade the software or make a change? SSH in and make the change. Or maybe you’ve figured out that this is not scalable, and are using some sort of configuration management tool like Puppet, Chef, or Ansible. But what happens when a deployment goes south? How easy is it for you to roll back? Why did it go bad in the first place – is your test environment low fidelity compared to your production environment? Did you even test the change before “shipping” it?
Virtual machines came along, and so did “the cloud.” They saved us enormous amounts of time and money by reducing the process of provisioning a new server down to the click of a button and a few minutes’ time. But wait – why are we even clicking a button? These are virtual servers after all – just more lines of code. We had grown so used to treating our servers like pets, constantly giving them care and attention and making sure they stayed up and running as long as possible – because no one wants to be the one to drive down to the data-center on Saturday night to fix the server that has a faulty power supply. Somehow we did the same thing with our virtual machines – we were doing it all wrong!
Enter disposable infrastructure. Here’s the idea: you build a virtual server with your latest and greatest production code, and put it out there in the wild. Want to make one of those “simple, one line changes”? Build an entirely new one. Test it thoroughly. Slowly put it into rotation, like dipping its feet in the water, so that you can always pull it right back out if something isn’t quite right with it. All of this should be defined in code, a practice some call “infrastructure as code.” Never touch an instance that is already in production; just dispose of it and replace it with a new one. Some call this “immutable infrastructure.” Write well-defined and thorough tests, all automated through your build system, to reduce the human element as much as possible.
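To make the “slowly put it into rotation” step a little more concrete, here is a minimal sketch in Python. It is not how any particular tool works: the names `Image`, `RolloutResult`, `rollout`, and the `is_healthy` callback are all made up for illustration, and the traffic percentages are arbitrary. The point is only the shape of the loop: the new image gets a bit more traffic at each step, and the first failed check sends everything back to the old image.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass(frozen=True)  # frozen: an image is never changed after it is built
class Image:
    version: str


@dataclass
class RolloutResult:
    serving: Image                 # the image left in production at the end
    rolled_back: bool              # True if the new image was pulled back out
    history: List[int] = field(default_factory=list)  # % of traffic on the new image over time


def rollout(
    old: Image,
    new: Image,
    is_healthy: Callable[[Image, int], bool],  # hypothetical check, e.g. automated tests
    steps: Sequence[int] = (10, 50, 100),      # assumed traffic percentages
) -> RolloutResult:
    """Shift traffic to `new` step by step; revert to `old` on the first failed check."""
    history: List[int] = []
    for percent in steps:
        history.append(percent)    # the new image now receives `percent` of traffic
        if not is_healthy(new, percent):
            history.append(0)      # pull the new image out of rotation and dispose of it
            return RolloutResult(serving=old, rolled_back=True, history=history)
    return RolloutResult(serving=new, rolled_back=False, history=history)


if __name__ == "__main__":
    # Pretend the new image starts failing once it sees half of the traffic.
    result = rollout(Image("v1"), Image("v2"), lambda image, percent: percent < 50)
    print(result.serving.version, result.rolled_back, result.history)
```

Notice that nothing in the loop ever modifies an image. The only choices are to give the new one more traffic or to throw it away, which is the whole idea behind treating servers as disposable.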
In the course at OSCON, we’ll dive into this idea in a more practical sense, by getting our hands dirty with some of the most popular open source tools for implementing this model: Vagrant, Puppet, Packer, and Terraform. We’ll use the tools against AWS, where many attendees already have infrastructure hosted. But most importantly, we’ll all become much more confident about pushing changes to production, which allows us to become more innovative, move faster, and provide much more value to our customers.