In 1984, John Gage of Sun Microsystems coined the phrase “the network is the computer” as computing functions started to become increasingly distributed across the network. Today, boundaries that once separated individual computers have disappeared and application processing is enabled—and managed—by the network. We are now at the forefront of a new market transition, as eloquently explained by Rick van der Lans in his paper, “The Network Is the Database.”
The network is indeed becoming the database. Big Data and the related approach to database management are moving away from a centralized data warehouse model and literally starting to flow across the network. We are virtualizing data management by leaving data in the network, instead of copying it into a data center. Data stays in motion wherever and whenever it’s needed across the network, instead of being at rest.
What does this mean for business value? A distributed—and virtualized—data management approach solves the three major issues of Big Data: volume, variety, and velocity.
In this week’s episode of Engineers Unplugged, Floris Grandvarlet (Cisco) and Richard Pilling (Intel) take on Big Data across the proverbial pond, at Cisco Live Milan. Where are we now, and how are we going to approach the ever-increasing amount of data (an ocean of it) to fish for information? This is a great overview of the challenges and the evolution of approaches.
Let’s watch and see what they propose to address the challenges:
It’s our very first seahorse--outsmarted once more.
**The next Engineers Unplugged shoot is at EMC World, Las Vegas, May 2014! Contact me now to become internet famous.**
To cross a busy intersection safely, it’s best to have all of your senses alert. That way, if you don’t happen to see that oncoming truck ignoring the “Walk” sign, you will probably still hear it. In the case of a heavy cement mixer, you may even feel the low rumble of its powerful engine first.
In the Internet of Everything (IoE), a similar principle applies. We call it “sensor fusion,” and it involves combining two or more sensors — often of different types — to monitor a specific environment and offer actionable insights more intelligently. These could be cameras and Wi-Fi tags or weight-sensing shelves and ultrasonic imaging, to name just two combinations. Moreover, the combined sensor data can itself be fused with other information streams — for example, those relating to weather, operations, news, or social media.
The result? Highly informed, real-time decision making and richer customer experiences.
Until recently, sensor fusion has been mostly exploited in specialized devices such as robots, but it is now driving a revolution in enterprise systems. This will bring new life to entire industries and completely transform stores, manufacturing floors, and transportation corridors. By greatly improving the accuracy of their measurements, organizations will be able to offer rich new experiences and gain substantial competitive advantage.
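To make the accuracy claim concrete, here is a minimal sketch of one common fusion technique, inverse-variance weighting. This is an illustration under stated assumptions, not the method of any particular product: the sensor names and numbers are hypothetical, and the approach assumes the sensors measure the same quantity with independent, unbiased noise of known variance.

```python
def fuse(estimates):
    """Fuse independent estimates of the same quantity.

    estimates: list of (value, variance) pairs, one per sensor.
    Each sensor is weighted by 1/variance, so noisier sensors count less.
    Returns the fused value and the variance of the fused value.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical readings: number of shoppers in an aisle, as (estimate, variance).
camera = (12.0, 4.0)   # assumed: camera-based count, noisier
wifi_tags = (10.0, 1.0)  # assumed: Wi-Fi tag count, more precise

value, variance = fuse([camera, wifi_tags])
print(f"fused estimate: {value:.1f}, variance: {variance:.2f}")
```

Under these assumptions the fused variance is always lower than that of the best single sensor, which is the sense in which combining sensors improves measurement accuracy.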
What more can an already innovative company like Cisco do to innovate? What do we need to do differently to influence or shape the next breakthrough that will fundamentally change our industry and Cisco? As we embark on a journey to transform Cisco into a #1 IT solution provider, we know we must innovate more and faster – and spot the next industry-shaping change before it catches our industry off-guard.
We believe one of the key strategies for reinventing innovation at Cisco is to embrace openness. Open innovation is a concept developed and evangelized by leading organizational experts, including Dr. Henry Chesbrough, the Executive Director of the Program in Open Innovation at UC Berkeley. It focuses on how organizations can and should use external ideas as well as internal ideas – and internal and external paths to market¹. Open innovation enables us to stay abreast of and shape the next big change that is going to impact Cisco and our industry.
A consequence of the Moore Nielsen prediction is the phenomenon known as Data Gravity: big data is hard to move around, so it is much easier for the smaller applications to come to it. Consider this: up to 2012, it took mankind over 2000 years to produce 2 Exabytes (2×10¹⁸ bytes) of data; now we produce this much in a day! The rate will only go up from here. With data production far exceeding the capacity of the Network, particularly at the Edge, there is only one way to cope, which I describe as the three mega trends in networking and (big) data in Cloud computing scaled to IoT, or as some say, Fog computing:
Dramatic growth in the applications specialized and optimized for analytics at the Edge: Big Data is hard to move around (data gravity), and we cannot move data fast enough to the analytics, so we need to move the analytics to the data. This will cause dramatic growth in applications specialized and optimized for analytics at the Edge. Yes, our devices have gotten smarter; yes, P2P traffic has become the largest portion of Internet traffic; and yes, M2M has arrived as the Internet of Things. There is no way to make progress other than making the devices smarter, safer and, of course, better connected.
Dramatic growth in the computational complexity to ETL (extract-transform-load) essential data from the Edge to be data-warehoused at the Core: Currently, most open standards and open source efforts are buying us some time to squeeze as much information in as little time as possible via limited connection paths to billions of devices, and soon enough we will realize there is a much more pragmatic approach to all of this. A jet engine produces more than 20 Terabytes of data per hour of flight. Imagine the computational complexity we already have that boils that down to routing and maintenance decisions in such complex machines. Imagine the consequences of ignoring such capability, which can already be made available at rather trivial cost.
The drive to instrument the data to be “open” rather than “closed”, with all the information we create, and all of its associated ownership and security concerns addressed: Open Data challenges have already surfaced, and there comes a time when we realize that an Open Data interface, and guarantees about its availability and privacy, need to be made and enforced. This is what drives the essential tie today between Public, Private and Hybrid cloud adoption (nearly one third each). With the ever-growing amount of data at the Edge, the issue of who “owns” it and how access to it is “controlled” becomes ever more relevant and important. At the end of the day, the producer/owner of the data must be in charge of its destiny, not some gatekeeper or web farm. This should not be any different than the very same rules that govern open source or open standards.
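The first two trends come down to simple arithmetic, which a rough sketch can show. The 20 Terabytes per flight hour figure is the one quoted above; the 10 Mbit/s uplink, the function names, and the choice of summary statistics are my own assumptions for illustration, not a description of any real engine telemetry system.

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Time to push num_bytes over a link, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


def summarize(readings):
    """Edge-side 'transform' step: reduce raw readings to a few statistics."""
    count = len(readings)
    return {
        "count": count,
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / count,
    }


RAW_BYTES = 20 * 10**12   # 20 Terabytes, one flight hour (figure from the text)
UPLINK_BPS = 10 * 10**6   # hypothetical 10 Mbit/s uplink at the Edge

raw_hours = transfer_seconds(RAW_BYTES, UPLINK_BPS) / 3600
print(f"shipping one hour of raw data would take about {raw_hours:.0f} hours")

# Hypothetical temperature samples, reduced to a summary before leaving the device.
summary = summarize([612.0, 615.5, 611.2, 640.8])
print(summary)
```

With these assumed numbers, one hour of raw data would take thousands of hours to ship, while the summary is a handful of values. That gap is why the analytics has to move to the data, and why the extract and transform steps end up running at the Edge.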
Last week I addressed these topics at the IEEE Cloud event at Boston University with wonderful colleagues from BU, Cambridge, Carnegie Mellon, MIT, Stanford and other researchers, plus of course, industry colleagues and all the popular, commercial web farms today. I was pleasantly surprised to see not just that the first two are top-of-mind already, but that the third one has emerged and is actually recognized. We have just started to sense the importance of this third wave, with huge implications in Cloud compute. My thanks to Azer Bestavros and Orran Krieger (Boston University), Mahadev Satyanarayanan (Carnegie Mellon University) and Michael Stonebraker (MIT) for the outstanding drive and leadership in addressing these challenges. I found Project Olive intriguing. We are happy to co-sponsor the BU Public Cloud Project, and most importantly, as we just wrapped up EclipseCon 2014 this week, very happy to see we are already walking the talk with Project Krikkit in Eclipse M2M. I made a personal prediction last week: just as most Cloud turned out to be Open Source, IoT software will all be Open Source. Eventually. The hard part is the Data, or should I say, Data Gravity…