Analytics at the Edge: Where the Network Becomes the Database
In 1984, John Gage of Sun Microsystems coined the phrase “the network is the computer” as computing functions started to become increasingly distributed across the network. Today, boundaries that once separated individual computers have disappeared and application processing is enabled—and managed—by the network. We are now at the forefront of a new market transition, as eloquently explained by Rick van der Lans in his paper, “The Network Is the Database.”
The network is indeed becoming the database. Big Data and the approach to managing it are moving away from a centralized data warehouse model, and data is starting to flow across the network. We are virtualizing data management by leaving data in the network instead of copying it into a data center. Data stays in motion, available wherever and whenever it’s needed across the network, instead of sitting at rest.
What does this mean for business value? A distributed—and virtualized—data management approach solves the three major issues of Big Data: volume, variety, and velocity.
Dealing with the “3 Vs” of Big Data
The volume of data being created today is staggering. In fact, more data is now being created every 10 minutes than was generated throughout all of human history up to 2008. We transport 94 percent of the data collected to data warehouses, but we actually use only about 4 percent of it. For example, we may sense, collect, and store the temperature at a particular city location every minute, when all we actually need to do is collect the data when the temperature changes.
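The temperature example can be sketched as a simple "report on change" filter running next to the sensor. This is a minimal illustration, not a description of any particular product: the function name `report_on_change`, the 0.5-degree threshold, and the sample readings are all assumptions made for the sketch.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A reading is (minute, temperature in degrees Celsius); the layout is hypothetical.
Reading = Tuple[int, float]


def report_on_change(readings: Iterable[Reading],
                     threshold: float = 0.5) -> Iterator[Reading]:
    """Yield a reading only when it differs from the last forwarded
    value by at least `threshold` degrees (assumed value)."""
    last_reported: Optional[float] = None
    for minute, temp in readings:
        if last_reported is None or abs(temp - last_reported) >= threshold:
            last_reported = temp
            yield minute, temp


# One sample per minute, as in the example above (made-up values).
samples = [(0, 21.0), (1, 21.0), (2, 21.1), (3, 21.6), (4, 21.6), (5, 20.9)]
forwarded = list(report_on_change(samples))
print(forwarded)  # [(0, 21.0), (3, 21.6), (5, 20.9)]
```

Here six collected samples become three forwarded ones. The threshold is the design choice that matters: a larger value sends less data but hides smaller changes.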
The increasing variety of data also complicates the issue. For example, social-media, sensor, weather, and location data are now all being added to traditional GIS data in the Smart Cities analytics space. With all the different formats, it is difficult to analyze data in its raw form—and cumbersome to compare and overlay entire data sets when what we need is just a single element within them. In a centralized model, the more complex the task, the more time and effort will be required to find an insight that is actionable and of immediate use.
Compounding the challenge is the velocity at which this data must be transported and manipulated to generate an actionable insight in real time. For example, when sensors at a city intersection detect a sudden change in weather, it’s useless to send that information to a data warehouse for analysis. By analyzing the data locally in real time, we can predict an increase in traffic, adjust traffic lights appropriately, and even notify drivers and traffic management personnel. This is something a traditional approach based on statistical data could never do.
Bringing analytics capabilities to the edge of the network helps us deal with the “3 Vs.” When data is collected and manipulated at the edge of the network, in a “fog” computing delivery model, three things happen: the volume of data flowing over the network is dramatically reduced; data can be extracted, overlaid, compared, and contrasted close to the source; and relevant insights can be produced in real time to enable informed actions.
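One way to picture the volume reduction is an edge node that condenses a window of raw readings into a single summary record before anything crosses the network. The sketch below is only an illustration under assumed names and data: `summarize_window` and the vehicle counts are hypothetical, and a real fog deployment would involve far more than this.

```python
from statistics import mean
from typing import Dict, List


def summarize_window(values: List[float]) -> Dict[str, float]:
    """Condense one window of raw readings into a single summary record.

    The edge node would forward only this record, not the raw values.
    """
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical vehicle counts sampled at an intersection during one window.
raw_counts = [12, 15, 11, 14, 40, 42, 13, 12]
summary = summarize_window(raw_counts)
print(summary)
```

Eight raw values become one record, while the `max` field still preserves the spike that a traffic system would care about.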
Creating Insights and Value at the Edge
Whether at an intersection of busy streets in a European city, a retail store in the United States, or a retail bank branch in Japan, managing data in motion over a distributed network architecture is much more efficient than a centralized “command and control” model. Distributed data management minimizes the flow of irrelevant or unusable data, and enables insights when and where they are needed most. It is no wonder that Cisco Consulting Services’ latest research shows that Big Data and analytics drive 38 percent, or $7.3 trillion, of the $19 trillion in value that will be at stake in the Internet of Everything economy between now and 2022. Data virtualization at the edge of the network enables smarter analytics that unleash a big part of that value.