Thoughts and Observations: Software Defined Storage
Last week I had the rare pleasure of being able to attend a storage conference (rare in the sense that I am usually one of the speakers rather than one of the attendees). It was SNIA’s Storage Developer Conference, and like most events it had both things that were interesting and worthwhile and things that left something to be desired.
The lasting impression that I walked away with, however, went beyond any one particular conversation, presentation, or technology. Indeed, the thoughts that have been rattling around in my brain for the past week made me realize that, if we (those of us in the industry) aren’t careful, the future looks extremely convoluted and confusing. At worst, we may actually wind up mismatching solutions to problems and taking giant steps backwards, locking ourselves into a perpetual game of ‘catch-up’ as we struggle to accomplish what we can already do today using traditional storage methodology and equipment.
The Polysemy of Storage
“Polysemic” means, essentially, “having multiple meanings,” and the storage industry is rife with polysemic terms.
As brilliant as some of the sessions were, or as disappointing as some of them turned out to be, one major perspective permeated all of them: there was only one definition of storage, the speaker’s. That is, if you happened to come from a different area of the storage industry, you didn’t just not count; you didn’t exist in their world.
Sure, we all have strong biases in whatever technology we choose to work within:
- Storage manufacturers, whether they be in SSDs, Flash, or “Spinning Rust,” (as traditional hard drives are now pejoratively called), all see themselves as working in “storage”
- Manufacturers of storage networks (including Cisco) tend to take a somewhat more holistic view, treating the traffic that is passed from host to target as “storage,” which makes the network a critical part of the meaning
- Developers, like those at the conference, tend to work more on the calls to and from disks, and see the usage of that data as “storage,” because it’s where the data is often turned into useful information
- Creators of file systems and access protocols, like the presenters who focus on Hadoop, Ceph, SMB3 or NFS4 variants (yes, I’m deliberately lumping these together to prove a point) define storage as a means of access and mobility
If you are not in the storage industry, you may look at all of these perspectives (and these are just four; there are many more terms and definitions in use) and see that all of them can be considered part of “storage,” a term that necessarily becomes more nebulous and poorly constrained the more elements are added to the mix.
Inside the storage industry, however, it was amazing just how isolated each of these groups is from the others. What I mean is that many very intelligent people appear to be willfully ignorant of the nature of the storage ecosystem as a whole. Even in the SMB track, which I thoroughly enjoyed, there was a nonchalance about the effect of running increased SCSI traffic over the network to enable improved clustering capabilities in Hyper-V.
Okay, if that sounds like Greek to you, here’s the bottom line: the developers who are creating these technologies seemed unconcerned about the ripple-effect consequences of throwing more data onto a network. In other words, they assumed that a reliable, stable network with enough capacity would always just “be there.”
What troubled me most was that when asked directly about networking concerns, the response was that “it didn’t matter, we’re using RoCE (RDMA over Converged Ethernet).” In other words, “hey, once it leaves my server, it’s not my problem.”
I had a very hard time getting my head around this. It seems intuitive (to me, at least) that if you’re going to throw exponentially increasing amounts of data onto a network, there may be, uh, consequences. As I’ve noted before, whenever you use a protocol that depends on a lossless network, like RoCE, you’re going to have to architect the network for that type of solution.
To these developers, however, it seemed as if they cared only about what happened inside the server; once a packet left the server it was no longer their problem, and it didn’t become their problem again until it hit the end storage target.
This is all well and good, until you start getting into the question of Software-Defined Storage.
Software-Defined, or Software-Implemented Storage?
Alan Yoder of Huawei made an excellent observation during a conversation with Lazarus Vekiarides. In a casual discussion regarding SDS, Alan pointed out that many of the characteristics on most SDS wishlists appear to describe an implementation of storage rather than a definition of storage. I have to confess, this distinction really threw me for a loop precisely because it underscored the very problem with semantics I was having.
[Update: September 28, 2013: Laz wrote a blog about his thoughts on this as well.]
Why? Because he’s absolutely right. There is a huge difference between creating a system which merely connects the dots and implements a storage system, and one that defines the storage paradigm.
Management programs that implement storage and ease the stress of provisioning already exist. UCS Director (née Cloupia), for instance, does this. Software like this lets you identify a server and storage instance and specify how much storage you need, how many CPUs to use, and even how long you want the instance to exist. I think (this is outside my wheelhouse, so I admit I’m not as confident in this claim) that competitors have similar types of software solutions as well.
None of these solutions define the storage network, however. To my knowledge (and if I am wrong I will happily correct my assumptions), none of them dynamically observes the bandwidth of a network and reallocates resources based upon usage, for example by altering the minimum ETS (Enhanced Transmission Selection, which is what is used to guarantee bandwidth to traffic groups in an Ethernet network) allocations.
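To make that missing capability a little more concrete, here is a minimal Python sketch of the arithmetic such software might run. ETS assigns each traffic class a minimum bandwidth percentage, and the percentages sum to 100. The sketch assumes a hypothetical policy of keeping a fixed floor per class and splitting whatever is left in proportion to observed demand. The function name `rebalance_ets` and the traffic-class names are illustrative only; the code just computes numbers and does not configure any switch.

```python
from typing import Dict


def rebalance_ets(demand_gbps: Dict[str, float], floor_pct: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: return integer ETS minimum-bandwidth percentages summing to 100.

    Each traffic class first receives its floor; the remaining percentage is
    split in proportion to observed demand, using the largest-remainder method
    so the integer shares still add up to exactly 100.
    """
    if not demand_gbps or set(demand_gbps) != set(floor_pct):
        raise ValueError("demand and floor must cover the same, non-empty set of classes")
    reserved = sum(floor_pct.values())
    if reserved > 100:
        raise ValueError("floors exceed 100%")
    spare = 100 - reserved
    total_demand = sum(demand_gbps.values())
    classes = sorted(demand_gbps)
    if total_demand == 0:
        # No observed traffic: split the spare percentage evenly.
        raw = {c: spare / len(classes) for c in classes}
    else:
        raw = {c: spare * demand_gbps[c] / total_demand for c in classes}
    shares = {c: floor_pct[c] + int(raw[c]) for c in classes}
    leftover = 100 - sum(shares.values())
    # Hand the remaining points to the classes with the largest fractional parts.
    by_remainder = sorted(classes, key=lambda c: raw[c] - int(raw[c]), reverse=True)
    for c in by_remainder[:leftover]:
        shares[c] += 1
    return shares


# Example: FCoE traffic dominates the observed demand, so it gets most of the spare bandwidth.
print(rebalance_ets({"fcoe": 6.0, "iscsi": 3.0, "nfs": 1.0}, {"fcoe": 20, "iscsi": 10, "nfs": 10}))
# {'fcoe': 56, 'iscsi': 28, 'nfs': 16}
```

A real system would presumably run something like this periodically, on smoothed measurements, rather than reacting to every sample.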
Why is this important? Because without these types of capabilities you are confining your software to a very, very limited subset of compatible storage protocols and hardware. You’re relegated to the types of storage solutions that rely on the lowest common denominator of compatibility. In fact, you can go too far: all the high performance of SSDs won’t save you from the layers upon layers of virtualization necessary to make it all work. For many storage applications this means we’ve taken a huge step backwards.
Think of it like this: the word “football” has many meanings, depending upon whom you ask. For one person, an implementation of football consists of choosing one of many plays from a playbook.
However, if you ask someone else what they consider to be the best plays for football, their idea may describe plays for an entirely different game that happens to share the name.
You can create a software program to create and implement plays all day long, but it does not define the game of football. To extend the (imperfect) analogy, if Football were Storage, we still have no means of defining which type of Storage we want to play, and thus no clue as to the best way to implement it!
There are a lot of opinions and projects in the industry with the name “Software-Defined Storage” attached to them. We know, for instance, that EMC has its ViPR program (which always makes me think of the TSA, sadly), and VMware has dipped its toe in with the inappropriately named VSAN (which is more of a virtual DAS system, but that’s a topic for another day). The OpenStack project is working on adding support for block storage protocols (e.g., Fibre Channel) to Cinder, its block storage service, and of course SNIA’s Storage Developer Conference had numerous vendors who claimed to have solutions that fit into this “software-defined storage” category as well (e.g., Cloudbyte, Cleversafe, Jeda Networks).
In fact, it seems like nearly everyone has a “software-defined storage” offering that “prevents vendor lock-in,” as DataCore puts it. [September 26, 2013 Update: In the four days since I first posted this blog, vendors EMC, Nexenta, Gridstore, Red Hat, HP, VMware, and Sanbolic, as well as analysts Howard Marks and 451 Research, have all been running a full-court press on the subject. I wouldn’t hold my breath that they’re all referring to the same thing, though. Someone with more time than I have should probably create some sort of compatibility matrix to figure out just how much of the Venn diagram is covered.]
It seems to me, though, that each of these solutions fails to accomplish what it claims to do: define the storage solution holistically. In other words, they either take one of the small subsets of storage discussed above and embellish it, or attempt to build a better mousetrap for storage access and mobility. Or, to continue the football analogy, many solutions choose a game and then try to figure out how to implement the plays.
None of this should be considered a bad thing, mind you! I’m a big proponent of trying to make storage solutions more coherent and approachable.
I suppose I get concerned, however, when I see solutions that appear to root us more firmly in one particular paradigm and lock us out of the flexibility of performance and breadth of storage options that exist today. Isn’t that, after all, what “software-defined” is supposed to liberate us from?
This is the part where I make the standard disclaimer – these are my thoughts and mine alone, so you should judge them as my mere observations rather than any ‘official’ claim or direction from Cisco or my colleagues. Your mileage may vary, all rights reserved, no animals were harmed in the making of this blog. Don’t read anything strategic into this – these are just my own feverish musings.
In my warped little mind, I see a “Software-Defined Storage” environment as one that is longitudinal and holistic. That is to say, the software dynamically adapts to storage characteristics over time, from the server through the network to the storage targets. This would mean defining each OSI layer so that it can handle optimized storage solutions according to application requirements.
Here’s a pie-in-the-sky example: data centers have multiple applications running on a variety of storage protocols. Some of these applications require server-to-server communication (such as the SCSI over SMB I described above), some require dedicated bandwidth capacity for low-latency, low-oversubscription connectivity (such as database applications), and some require only lower-priority, best-effort availability (such as NFS mounts).
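One way to picture the requirements side of this example is as a small declarative table that software could reason about. The Python sketch below is purely illustrative: the `StorageTrafficPolicy` type, its fields, the `validate` function, and every number in it are hypothetical, chosen to mirror the three kinds of workload just described rather than taken from any product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageTrafficPolicy:
    """Hypothetical per-workload requirements an SDS controller might track."""
    name: str
    lossless: bool               # needs a no-drop traffic class on the Ethernet fabric
    min_bandwidth_pct: int       # ETS minimum guarantee; 0 means best effort
    max_oversubscription: float  # acceptable oversubscription ratio (N means N:1)


def validate(policies: List[StorageTrafficPolicy]) -> List[str]:
    """Return a list of problems; an empty list means the set is internally consistent."""
    problems = []
    total = sum(p.min_bandwidth_pct for p in policies)
    if total > 100:
        problems.append(f"minimum bandwidth guarantees total {total}%, above 100%")
    for p in policies:
        if not 0 <= p.min_bandwidth_pct <= 100:
            problems.append(f"{p.name}: bandwidth guarantee out of range")
        if p.max_oversubscription < 1.0:
            problems.append(f"{p.name}: oversubscription ratio below 1:1")
    return problems


# Illustrative values only, one entry per workload type in the example.
POLICIES = [
    StorageTrafficPolicy("smb-server-to-server", lossless=True, min_bandwidth_pct=30, max_oversubscription=2.0),
    StorageTrafficPolicy("database", lossless=False, min_bandwidth_pct=40, max_oversubscription=1.5),
    StorageTrafficPolicy("nfs-mounts", lossless=False, min_bandwidth_pct=0, max_oversubscription=8.0),
]

print(validate(POLICIES))  # []
```

Expressing the requirements as data, rather than baking them into one protocol choice, is what would let software revisit them over time.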
From a systems and networking perspective, we’re talking about the entire OSI stack, from the physical layer all the way up to the application layer. Each of these layers has its own best practices and configuration requirements. A software-defined storage solution should be able to access each of these layers and modify the system as a whole to get the best possible performance, rather than shoe-horning all applications into, say, SCSI over SMB because it’s “easiest at layer 4” and most applicable for Hyper-V clustering. (That’s just an example.)
Software-Defined Storage implies a Data Center that can change the definition of storage relationships over time, not just tweak attributes
Imagine, though, if you could initiate an application in such a way that the software monitors the data over time and recognizes capacity and bandwidth hotspots, evaluating latency, oversubscription ratios, ETS QoS, and even protocol/layer inefficiencies (like running iSCSI over a lossless network when standard TCP/IP over a lossy network would actually be more efficient). Take the advances in Flash, SSDs, and server-side caching, as well as the virtualized storage solutions that are already well proven, and add longitudinal metrics and continuous re-evaluation.
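The “monitor over time and recognize hotspots” step can be sketched with a simple moving average, so that a single spike is not mistaken for a sustained problem. This is a minimal Python sketch under assumed inputs: per-link utilization samples expressed as fractions of capacity, plus a hypothetical `window` and `threshold`. The function name `find_hotspots` and the link names are illustrative; a real system would also weigh latency, oversubscription, and ETS settings.

```python
from collections import deque
from typing import Dict, List


def find_hotspots(utilization: Dict[str, List[float]], window: int = 3,
                  threshold: float = 0.8) -> Dict[str, int]:
    """Map each hot link to the sample index where its moving average first exceeded threshold."""
    if window < 1:
        raise ValueError("window must be at least 1")
    hot: Dict[str, int] = {}
    for link, series in utilization.items():
        recent: deque = deque(maxlen=window)  # keeps only the last `window` samples
        for i, value in enumerate(series):
            recent.append(value)
            # Judge a link only once a full window of samples is available.
            if len(recent) == window and sum(recent) / window > threshold:
                hot[link] = i
                break
    return hot


samples = {
    "leaf1-spine1": [0.5, 0.9, 0.95, 0.9, 0.4],  # sustained load
    "leaf2-spine1": [0.9, 0.2, 0.95, 0.3, 0.9],  # isolated spikes
}
print(find_hotspots(samples))  # {'leaf1-spine1': 3}
```

The second link never averages above the threshold over three samples, so its spikes are not flagged; that smoothing is the point of looking at the data longitudinally.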
Need more capacity for your mission-critical applications? A finely tuned SDS would dynamically allocate additional guaranteed bandwidth at Layer 2. Network topologies creating unacceptable iSCSI performance? Change from lossy to lossless, or vice versa. Have a choice between various storage protocols (e.g., iSCSI, FCoE, object)? Have the system do an impact analysis and choose the best definition of that storage solution.
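At its simplest, such an impact analysis might filter candidate storage configurations against an application’s requirements and pick the best survivor. The Python sketch below assumes the latency and throughput estimates already exist (for example, from longitudinal monitoring of the network). The `Candidate` type, the `choose` function, the numbers, and the selection rule (lowest latency, ties broken by higher throughput) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical way to deliver storage to an application."""
    protocol: str               # e.g. "iSCSI", "FCoE", "object"
    lossless: bool              # whether it needs a no-drop Ethernet class
    est_latency_ms: float       # predicted latency (assumed input from monitoring)
    est_throughput_gbps: float  # predicted throughput (assumed input from monitoring)


def choose(candidates: List[Candidate], max_latency_ms: float,
           min_throughput_gbps: float, lossless_available: bool) -> Optional[Candidate]:
    """Return the feasible candidate with the lowest predicted latency, or None."""
    feasible = [
        c for c in candidates
        if c.est_latency_ms <= max_latency_ms
        and c.est_throughput_gbps >= min_throughput_gbps
        and (lossless_available or not c.lossless)
    ]
    if not feasible:
        return None
    # Prefer lowest latency; break ties with higher throughput.
    return min(feasible, key=lambda c: (c.est_latency_ms, -c.est_throughput_gbps))


options = [
    Candidate("FCoE", lossless=True, est_latency_ms=0.5, est_throughput_gbps=8.0),
    Candidate("iSCSI", lossless=False, est_latency_ms=0.9, est_throughput_gbps=9.0),
    Candidate("object", lossless=False, est_latency_ms=5.0, est_throughput_gbps=10.0),
]
print(choose(options, 1.0, 5.0, lossless_available=True).protocol)   # FCoE
print(choose(options, 1.0, 5.0, lossless_available=False).protocol)  # iSCSI
```

The `lossless_available` flag is where the network enters the decision: if the fabric cannot offer a no-drop class, the otherwise-best option drops out.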
(Ethernet, of course, provides a great basis for this. It’s one of the few media that allow for such diverse modification and alteration for so many applications. There really is no other medium that can allow you to move from lossy to lossless and vice-versa. Just an observation, of course.)
Of course, if this scares you, perhaps “software defined storage” may not be the right way to go. 🙂 There is absolutely nothing wrong with sticking with tried-and-true.
I admit that there is more “science fiction” than “data center fact” in what I’ve described. That would be a fair criticism, I believe. However, I think it’s equally fair to observe that any proponent of “software-defined storage” who glosses over the entire storage ecosystem, or relies on the silo they work within as the sole source of SDS breakthroughs, is limited in scope. Either that, or my imagination is far too unlimited. 🙂
As I said, I have no special knowledge of or grand design for Software-Defined anything, and I’m still in a “sponge-mode” learning zone, so it would be foolhardy to read too deeply into my musings. I do think, however, that this subject deserves open and clear discussion so as to avoid the accusation that “Emperor SDS has no clothes.” I also think it’s the only way to avoid the “buzzword bingo” problem when it comes to storage.
Personally, I think it would be a shame if it wound up that way. The storage industry as a whole, as an adaptive ecosystem, has a real opportunity to do some interesting things and create a true paradigm shift in the way it all works. Storage administrators and consumers are not the most adventurous bunch, of course, but that doesn’t mean that we can afford to think of reliable access to data the same way over the next ten or twenty years.
Ultimately, I think the answer lies in thinking bigger and more expansively, with the true big picture in mind. We miss a unique and potentially revolutionary opportunity if we merely attempt to create a Franken-storage system out of patchwork software implementation silos that treat the rest of the ecosystem as “all things created equal.”
Nevertheless, I believe the topic deserves better consideration than it’s been given. I welcome and encourage all thoughts, ruminations, pontifications, elaborations, meditations, and reflections. In other words, tell me what you think. 🙂