In my Internet of Things keynote at LinuxCon 2014 in Chicago last week, I touched upon a new trend: the rise of a new kind of utility or service model, the so-called IoT specific service provider model, or IoT SP for short.
I had a recent conversation with a team of physicists at the Large Hadron Collider at CERN. I told them they would be surprised to hear what computer scientists talk about these days: Data Gravity. Programmers are notorious for overloading common words, adding connotations galore, and messing with meanings entrenched in our natural language.
We all laughed and then the conversation grew deeper:
- Big data is very difficult to move around: it takes energy, time and bandwidth, and is hence expensive. And it is growing exponentially at the outer edge, with tens of billions of devices producing it at an ever faster rate, from an ever increasing set of places on our planet and beyond.
- As a consequence of the laws of physics, we know we have an impedance mismatch between the core and the edge. I call this the Moore-Nielsen paradigm (described in my talk as well): data accumulates at the edges faster than the network can push it into the core.
- Therefore big data accumulated at the edge will attract applications (little data or procedural code), so apps will move to data, not the other way around, behaving as if data has “gravity”.
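The argument in the bullets above can be made concrete with a rough back-of-the-envelope sketch. The function names, the data and application sizes, and the link speed below are illustrative assumptions, not figures from the talk. The only point is that when the data set is far larger than the application, shipping the application is far cheaper.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time to push size_bytes over a link, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


def cheaper_to_move(data_bytes: float, app_bytes: float,
                    link_bits_per_second: float) -> str:
    """Return 'app' if shipping the application is faster than shipping the data."""
    data_time = transfer_seconds(data_bytes, link_bits_per_second)
    app_time = transfer_seconds(app_bytes, link_bits_per_second)
    return "app" if app_time < data_time else "data"


# Hypothetical numbers: 10 TB of sensor data sitting at the edge,
# a 50 MB analytics application, and a 100 Mbit/s uplink to the core.
EDGE_DATA_BYTES = 10e12
APP_BYTES = 50e6
UPLINK_BPS = 100e6

if __name__ == "__main__":
    print("cheaper to move:", cheaper_to_move(EDGE_DATA_BYTES, APP_BYTES, UPLINK_BPS))
    days = transfer_seconds(EDGE_DATA_BYTES, UPLINK_BPS) / 86400
    print(f"moving the data would take about {days:.1f} days")
```

With these assumed numbers the application moves in seconds while the data would take days, which is the sense in which the data appears to have gravity.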
Therefore, the notion of a very large centralized cloud that would control the massive rise of data spewing from tens of billions of connected devices runs against both the laws of physics and Open Source, not to mention the thirst for freedom (no vendor lock-in) and privacy (no data lock-in). The paradigm has shifted and we have entered the 3rd big wave (after the decentralization from mainframe to client-server, which in turn centralized to cloud): the move to a highly decentralized compute model, where the intelligence is shifting to the edge, as apps come to the data, at much larger scale, machine to machine, with little or no human interface or intervention.
The age-old dilemma pops up again: do we go vertical (domain specific) or horizontal (application development or management platform)? The answer has to be based on necessity, not fashion. We have to do this well; hence vertical domain knowledge is overriding. With the declining cost of computing, we finally have the technology to move to a much more scalable and empowering model: the new opportunity in our industry, the mega trend.
Very reminiscent of the early 90’s and the beginning of the ISP era, isn’t it? This time it is much more vertical, with deep domain knowledge: connected energy, connected manufacturing, connected cities, connected cars, connected home, safety and security. These innovation hubs all share something in common: an Open and Interconnected model, made easy by dramatically lower compute cost and the ubiquity of open source, to overcome all barriers to adoption, including the previously weak security or privacy models predicated on a central core. We can divide and conquer, dealing with data in motion differently than we deal with data at rest.
The so-called “wheel of computer science” has completed one revolution, just as its socio-economic observation predicted: the next generation has arrived, ready to help evolve or replace its aging predecessor. Which one, or which vertical, will it be first…?
Tags: Big Data, big data analytics, CERN, cloud, Data Gravity, Fog computing, gravity, IoT, IoTSP, ISP, keynote, LHC, Linux, LinuxCon, M2M, Moore’s law, Nielsen's Law, open source, SP
I am delighted to announce a new Open Source cybergrant awarded to the Caltech team developing the ANSE project at the Large Hadron Collider. The project team, led by Caltech Professor Harvey Newman, will further develop the world’s fastest data forwarding network with Open Daylight. The LHC experiment is a collaboration of the world’s top universities and research institutions. The network is designed and developed by the California Institute of Technology High Energy Physics department in partnership with CERN and the scientists in search of the Higgs boson, adding new dimensions to the meaning of “big data analytics”. This is the same project team that set most, if not all, world records in data forwarding speeds over the last decade, and it is quickly approaching the remarkable 1 Tbps milestone.
Unique in its nature and remarkable in its discovery, the LHC experiment and its search for the elusive particle, the very thing that imparts mass to observable matter, is not only stretching the bleeding edge of physics, but also leads to the observation that data behaves as if it has gravity too. With the exponential rise in data (2 billion billion bytes per day and growing!), services and applications are drawn to “it”. Moving data around is neither cheap nor trivial. Though advances in network bandwidth are in fact observed to be exponential (Nielsen’s Law), advances in compute are even faster (Moore’s Law), and those in storage faster still. Thus, the impedance mismatch between them forces us to feel and deal with the rising force of data gravity, a natural consequence of the laws of physics. Since not all data can be moved to the applications, moved to the core, or captured in the cloud, the applications will be drawn to it: a great opportunity for Fog computing, the natural evolution from cloud and into the Internet of Things.
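To get a feel for how such a mismatch compounds, here is a minimal sketch. The growth rates are assumptions based on commonly quoted figures rather than numbers from this post: bandwidth growing about 50% per year for Nielsen’s Law, and compute doubling every 18 months for Moore’s Law. The result is sensitive to those choices, and storage growth is left out.

```python
# Assumed growth rates (commonly quoted figures, not taken from this post):
#   Nielsen's Law: high-end user bandwidth grows about 50% per year.
#   Moore's Law:   compute roughly doubles every 18 months.
BANDWIDTH_GROWTH_PER_YEAR = 1.5
COMPUTE_DOUBLING_MONTHS = 18.0


def compute_growth(years: float) -> float:
    """Growth factor of compute after `years` under the assumed doubling period."""
    return 2.0 ** (years * 12.0 / COMPUTE_DOUBLING_MONTHS)


def bandwidth_growth(years: float) -> float:
    """Growth factor of bandwidth after `years` under the assumed yearly rate."""
    return BANDWIDTH_GROWTH_PER_YEAR ** years


def mismatch(years: float) -> float:
    """How much further compute has grown than bandwidth after `years`."""
    return compute_growth(years) / bandwidth_growth(years)


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(f"{years:>2} years: compute x{compute_growth(years):,.0f}, "
              f"bandwidth x{bandwidth_growth(years):,.0f}, "
              f"gap x{mismatch(years):.2f}")
```

Under these assumptions the yearly gap is small, but because both curves are exponential it never closes and keeps widening, so whatever produces or processes data locally pulls steadily ahead of the network that would have to carry it.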
Congratulations to the Caltech physicists, mathematicians and computer scientists working on this exciting project. We look forward to learning from them and to their remarkable contribution flowing into Open Source, made possible by this cybergrant, so that everyone can benefit from it, not just the elusive search for gravity and dark matter. After all, there was a method to the madness of picking such elements for Open Daylight as Hydrogen and Helium. I wonder what comes next…
Tags: ANSE, California Institute of Technology, Caltech, CERN, cloud, Data Gravity, Fog computing, Hadron, Hadron Collider, Helium, Higgs boson, Hydrogen, Internet of Things (IoT), IoT, LHC, Open Daylight, open source, opendaylight, physics
A consequence of the Moore-Nielsen prediction is the phenomenon known as Data Gravity: big data is hard to move around, and it is much easier for the smaller applications to come to it. Consider this: it took mankind over 2000 years to produce 2 Exabytes (2×10^18 bytes) of data until 2012; now we produce this much in a day! The rate will go up from here. With data production far exceeding the capacity of the Network, particularly at the Edge, there is only one way to cope, which I call the three mega trends in networking and (big) data in Cloud computing scaled to IoT, or as some say, Fog computing:
- Dramatic growth in the applications specialized and optimized for analytics at the Edge: Big Data is hard to move around (data gravity) and we cannot move data fast enough to the analytics, therefore we need to move the analytics to the data. This will cause dramatic growth in applications specialized and optimized for analytics at the edge. Yes, our devices have gotten smarter, yes, P2P traffic has become the largest portion of Internet traffic, and yes, M2M has arrived as the Internet of Things. There is no way to make progress but by making the devices smarter, safer and, of course, better connected.
- Dramatic growth in the computational complexity to ETL (extract-transform-load) essential data from the Edge to be data-warehoused at the Core: Currently most open standards and open source efforts are buying us some time to squeeze as much information in as little time as possible via limited connection paths to billions of devices, and soon enough we will realize there is a much more pragmatic approach to all of this. A jet engine produces more than 20 Terabytes of data per hour of flight. Imagine what computational complexity we already have that boils that down to routing and maintenance decisions in such complex machines. Imagine the consequences of ignoring such capability, which can already be made available at rather trivial costs.
- The drive to instrument the data to be “open” rather than “closed”, with all the information we create, and all of its associated ownership and security concerns addressed: Open Data challenges have already surfaced, and there comes a time when we begin to realize that an Open Data interface and guarantees about its availability and privacy need to be made and enforced. This is what drives the essential tie today between Public, Private and Hybrid cloud adoption (nearly one third each), and with the ever-growing amount of data at the Edge, the issue of who “owns” it and how access to it is “controlled” becomes ever more relevant and important. At the end of the day, the producer/owner of the data must be in charge of its destiny, not some gatekeeper or web farm. This should not be any different than the very same rules that govern open source or open standards.
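The second trend above, reducing data at the Edge before it is loaded into the Core, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Summary` type, the `edge_etl` function and the window size are hypothetical names and choices, and real engine telemetry pipelines are far more involved. The only figure taken from the text is the 20 Terabytes per flight hour.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence

# From the text: a jet engine produces more than 20 terabytes per flight hour.
# Taking 1 TB = 1e12 bytes, that is a raw rate of roughly 5.6 GB per second.
RAW_BYTES_PER_SECOND = 20e12 / 3600


@dataclass
class Summary:
    """What the edge would forward to the core instead of the raw readings."""
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_window(readings: Sequence[float]) -> Summary:
    """Transform one window of raw readings into a compact summary."""
    if not readings:
        raise ValueError("cannot summarize an empty window")
    return Summary(len(readings), min(readings), max(readings), mean(readings))


def edge_etl(readings: Sequence[float], window: int) -> List[Summary]:
    """Extract fixed-size windows, transform each, and return what gets loaded."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [summarize_window(readings[i:i + window])
            for i in range(0, len(readings), window)]


if __name__ == "__main__":
    # Hypothetical sensor trace: 10,000 readings cycling through 0..99.
    trace = [float(i % 100) for i in range(10_000)]
    summaries = edge_etl(trace, window=1_000)
    print(f"raw rate from the text: {RAW_BYTES_PER_SECOND / 1e9:.1f} GB/s")
    print(f"{len(trace)} readings reduced to {len(summaries)} summaries")
```

The design choice here is deliberately crude: every window is collapsed to four numbers regardless of content. A real system would decide at the edge which raw segments are worth keeping, which is exactly where the computational complexity mentioned above comes in.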
Last week I addressed these topics at the IEEE Cloud event at Boston University with wonderful colleagues from BU, Cambridge, Carnegie Mellon, MIT, Stanford and other researchers, plus of course, industry colleagues and all the popular, commercial web farms today. I was pleasantly surprised to see not just that the first two are top-of-mind already, but that the third one has emerged and is actually recognized. We have just started to sense the importance of this third wave, with huge implications in Cloud compute. My thanks to Azer Bestavros and Orran Krieger (Boston University), Mahadev Satyanarayanan (Carnegie Mellon University) and Michael Stonebraker (MIT) for the outstanding drive and leadership in addressing these challenges. I found Project Olive intriguing. We are happy to co-sponsor the BU Public Cloud Project, and most importantly, as we just wrapped up EclipseCon 2014 this week, very happy to see we are already walking the talk with Project Krikkit in Eclipse M2M. I made a personal prediction last week: just as most Cloud turned out to be Open Source, IoT software will all be Open Source. Eventually. The hard part is the Data, or should I say, Data Gravity…
Tags: Big Data, core, Data Gravity, Eclipse, edge, Enescu, ETL, Fog computing, IEEE, internet of things, IoT, krikkit, M2M, Moore, Nielsen, Open data, open source, virtualization