In my Internet of Things keynote at LinuxCon 2014 in Chicago last week, I touched upon a new trend: the rise of a new kind of utility or service model, the so-called IoT specific service provider model, or IoT SP for short.
I had a recent conversation with a team of physicists at the Large Hadron Collider at CERN. I told them they would be surprised to hear the new computer scientist’s talk these days, about Data Gravity. Programmers are notorious for overloading common words, adding connotations galore, messing with meanings entrenched in our natural language.
We all laughed and then the conversation grew deeper:
- Big data is very difficult to move around, it takes energy and time and bandwidth hence expensive. And it is growing exponentially larger at the outer edge, with tens of billions of devices producing it at an ever faster rate, from an ever increasing set of places on our planet and beyond.
- As a consequence of the laws of physics, we know we have an impedance mismatch between the core and the edge, I coined this as the Moore-Nielsen paradigm (described in my talk as well): data gets accumulated at the edges faster than the network can push into the core.
- Therefore big data accumulated at the edge will attract applications (little data or procedural code), so apps will move to data, not the other way around, behaving as if data has “gravity”
Therefore, the notion of a very large centralized cloud that would control the massive rise of data spewing from tens of billions of connected devices is pitched both against the laws of physics and Open Source not to mention the thirst for freedom (no vendor lock-in) and privacy (no data lock-in). The paradigm shifted, we entered the 3rd big wave (after the mainframe decentralization to client-server, which in turn centralized to cloud): the move to a highly decentralized compute model, where the intelligence is shifting to the edge, as apps come to the data, at much larger scale, machine to machine, with little or no human interface or intervention.
The age-old dilemma, do we go vertical (domain specific) or horizontal (application development or management platform) pops up again. The answer has to be based on necessity not fashion, we have to do this well; hence vertical domain knowledge is overriding. With the declining cost of computing, we finally have the technology to move to a much more scalable and empowering model, the new opportunity in our industry, the mega trend.
Very reminiscent of the early 90’s and the beginning of the ISPs era, isn’t it? This time much more vertical with deep domain knowledge: connected energy, connected manufacturing, connected cities, connected cars, connected home, safety and security. These innovation hubs all share something in common: an Open and Interconnected model, made easy by the dramatically lower compute cost and ubiquity in open source, to overcome all barriers of adoption, including the previously weak security or privacy models predicated on a central core. We can divide and conquer, deal with data in motion, differently than we deal with data at rest.
The so-called “wheel of computer science” has completed one revolution, just as its socio-economic observation predicted, the next generation has arrived, ready to help evolve or replace its aging predecessor. Which one, or which vertical will it be first…?
Tags: Big Data, big data analytics, CERN, cloud, Data Gravity, Fog computing, gravity, IoT, IoTSP, ISP, keynote, LHC, Linux, LinuxCon, M2M, Moore’s law, Nielsen's Law, open source, SP
The video4linux subsystem of the kernel which deals with video capture, video output and hardware video codecs has a very large API with many ioctls, settings, options and capabilities. And most hardware will only use a fraction of that. This makes it hard to test whether your driver implements everything it should and it makes it hard to test if your application supports all hardware variants.
Providing tools that allow you gain confidence about the quality of the code you are writing, whether it is a driver or an application, would be very helpful indeed. As co-maintainer of the subsystem and as part of my job trying to convince the industry to switch to the V4L2 API instead of (Oh no! Not again!) rolling your own API I thought this was a worthy cause to spend time on.
I started writing a utility called
v4l2-compliance to test drivers over 6 years ago, but for a long time it only tested a fraction of the V4L2 API. The test coverage slowly increased over the years but it wasn’t until February this year that it became a really powerful tool when support for testing video streaming was added. Today it has test coverage of around 90% of the API and new V4L2 drivers must pass the
v4l2-compliance tests before they are allowed in the kernel.
One important missing piece in the compliance utility is testing for the various cropping, composing and scaling combinations. The main reason being that it wasn’t always clear in the API what the interaction should be between the various actions. E.g. changing a crop rectangle might require a change to the compose rectangle as well. So should that be allowed or should an error be returned instead? (Answer: yes, that’s allowed). I hope to add support for testing this some time this year.
It would be nice if this could be easily tested with an application and a driver that supports all the various combinations. But no such driver exists, and that brings me to the second part of this post: how do you test an application against the bewildering array of hardware? All too often application developers only test their application against the hardware they own, and so it is likely it will fail miserably when using it with hardware that implements a different subset of the V4L2 API.
The answer to this question is that a virtual V4L2 driver is needed that implements as much of the V4L2 API as is possible and that can be configured in various ways to accurately model real hardware. Today there is a virtual video driver in the kernel called
vivi, but unfortunately that driver doesn’t act at all as real hardware does. And it only supports simple video capture which is just a small subset of the whole API.
In order to resolve this situation I wrote a new driver called vivid: Virtual Test Driver. This driver covers most of the V4L2 API and is ideal for testing your application. Writing this driver was very useful since it forced me to think about some of the dark and dusty corners of the V4L2 API, and some of those corners needed a big broom to clean up. I found a variety of bugs in the V4L2 core and the API documentation just because this driver exercised parts of the API that are rarely if ever used.
I also realized that a driver like this is ideal to emulate hardware that is not yet available and can be used to prototype your upcoming product in the absence of the actual hardware. It’s a logical consequence of the requirement that in order for the virtual video driver to be really useful it has to accurately model hardware.
It also had an immediate beneficial effect on the two ‘golden reference’ utilities that control V4L2 drivers: the command line
v4l2-ctl utility and the GUI equivalent
qv4l2. After all, in order to test whether the
vivid driver works you need applications to test the driver. As a result both utilities improved as more features were added to the driver, which all needed to be tested by those applications. So the driver has already fulfilled its promised to help test and improve applications.
All utilities mentioned in this article are part of the v4l-utils git repository.
If you would like to know more about V4L2 driver and application testing, then attend my presentation on this topic during the upcoming LinuxCon North America in Chicago!
Tags: Linux Kernel, LinuxCon North America 2014, Testing, video4linux
I am delighted to announce a new Open Source cybergrant awarded to the Caltech team developing the ANSE project at the Large Hadron Collider. The project team lead by Caltech Professor Harvey Newman will be further developing the world’s fastest data forwarding network with Open Daylight. The LHC experiment is a collaboration of world’s top Universities and research institutions, the network is designed and developed by the California Institute of Technology High Energy Physics department in partnership with CERN and the scientists in search of the Higgs boson, adding new dimensions to the meaning of “big data analytics”, the same project team that basically set most if not all world records in data forwarding speeds over the last decade, and quickly approaching the remarkable 1 Tbps milestone.
Unique in its nature and remarkable in its discovery, the LHC experiment and its search for the elusive particle, the very thing that imparts mass to observable matter, is not only stretching the bleeding edge of physics, but makes the observation that data behaves as if it has gravity too. With the exponential rise in data (2 billion billion bytes per day and growing!), services and applications are drawn to “it”. Moving data around is neither cheap nor trivial. Though advances in network bandwidth are in fact observed to be exponential (Nielsen’s Law), advances in compute are even faster (Moore’s Law), and storage even more. Thus, the impedance mismatch between them, forces us to feel and deal with the rising force of data gravity, a natural consequence of the laws of physics. Since not all data can be moved to the applications nor moved to core nor captured in the cloud, the applications will be drawn to it, a great opportunity for Fog computing, the natural evolution from cloud and into the Internet of Things.
Congratulations to the Caltech physicists, mathematicians and computer scientists working on this exciting project. We look forward to learning from them and their remarkable contribution flowing in Open Source made possible with this cybergrant so that everyone can benefit from it, not just the elusive search for gravity and dark matter. After all, there was a method to the madness of picking such elements for Open Daylight as Hydrogen and Helium. I wander what comes next…
Tags: ANSE, California Institute of Technology, Caltech, CERN, cloud, Data Gravity, Fog computing, Hadron, Hadron Collider, Helium, Higgs boson, Hydrogen, Internet of Things (IoT), IoT, LHC, Open Daylight, open source, opendaylight, physics
Our Cisco colleague Anthony Grieco wrote a quick blog post over on the Cisco Security blog announcing that Cisco is a proud supporter and founder of the Linux Foundation initiative announced on April 24th.
We are pleased to help form a critical mass of governance, funding, and focus that will support the output of open source communities like OpenSSL. By working together as an industry, we can expect greater security, stability, and robustness for components that are critical to the Internet.
Check out the blog article here for further information: http://blogs.cisco.com/security/cisco-linux-foundation-and-openssl/
The Open MPI project released version v1.8 last week. This is a major release that heralds the beginning of a new production-ready series, full MPI-3.0 support, and a new OpenSHMEM implementation.
Open MPI is developed in a tick-tock fashion:
- Odd-numbered series are focused on feature development and expansion
- Even-numbered series are focused on stability and production usage
The even-numbered v1.8.x series therefore represents a new production-ready series that effectively deprecates the prior production-ready series (v1.6.x).
Read More »
Tags: HPC, mpi, Open MPI, OpenSHMEM