What the Supermarket Taught Me About Big Data
Author’s Note: I have no kids. I have friends with kids, who used to be in diapers. The kids were in diapers, not the friends. I’ve changed a few in my day, but not nearly as many as my friends have. And yes, this has some relevance to this story…
At every trade show or conference there’s someone talking about Big Data. They talk about algorithms, CPUs, memory, software stacks, cabling, racks, ROI, TCO, nodes, names, federation, centralization and organization until you get “the pitch.” I’m not really interested in the pitch for why someone’s product is better than someone else’s. I’m more interested in “What is the problem you’re trying to solve?” To me, this gets to the root of Big Data: the consolidation of diverse data sources, with a multitude of data types, in which you’re trying to find relationships and patterns. Phew. Got it?
Me neither, but I like to think in examples, and this is where it dawned on me: in the grocery store.
I was walking through the local grocery store, a place that stocks a huge number of items. In fact, in 2010, the average grocery store carried over 38,000 different items. In some areas of the store, the placement of these items is logical. Dairy is on the back wall so that milk makes a very short trip from a refrigerated truck to a refrigerated room, which often opens directly into the milk cases. Restocking seems to be a matter of lifting the 1-gallon jugs off a pallet or shelf and putting them into the case, without ever wheeling them out into the store. Other areas, not so much… I’ll get to one of them in a bit.
The store also carries seemingly unrelated types of items. Sure, it has food and other edibles, but also flowers, hardware, cooking pans and, yes, diapers. I get it, it’s a one-stop shopping location. The problem the supermarket faces isn’t “How do I store all these items on the shelves?” but “How can I arrange the items so that buying item #1 puts the customer in the right mindset to buy item #2?”
So I continued my tour of the store and came down an aisle that had both diversity and quantity, but I didn’t readily see the pattern. The items on each side of the aisle were related to each other, but I was having trouble seeing the relationship between the two sides. On the right side were flowers, balloons and greeting cards. I’ve never seen it fully stocked, since the only time I visit is at 5:45pm on February 14th, on my way home, when I stand there staring at a 99% empty shelf with the 20 other doofuses who waited until the last minute, praying that “it’s the thought that counts” will save me from my procrastination. But I digress. All these items were pretty easily related.
The other side had baby food, diapers, baby wipes and other infant-related items whose purpose I couldn’t begin to guess (see Author’s Note above).
So here we are: on one side of the aisle we have baby products, and on the other we have cards, balloons and flowers. What is the relationship between the items on the left and the items on the right? Seems like a problem for Big Data. It’s one thing for a human to look at these and come up with a relationship, but it’s another to feed them into an automated system and have it come out with one.
So how does this fit into the definition?
- Consolidation: One stop shopping
- Diverse Data Sources: Different types of items
- Multitude of data types: 38,000 items…
- Relationships and Patterns: What do the items on the left and right side of the aisle have in common?
So what is the relationship between diapers, baby food and flowers/cards? While this is solvable with machines and algorithms, what makes it interesting isn’t the petabytes of data or the racks of servers, but uncovering the relationship hidden in all that data.
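As a rough illustration of what “uncovering the relationship” might look like to a machine, one common approach is market basket analysis: count how often two items land in the same shopping basket and compare that with what chance alone would predict, a ratio known as lift. The sketch below is a minimal Python version under stated assumptions: the baskets are hypothetical toy data invented for this example (not real store receipts), and `pair_lift` is a hypothetical helper name. A real store would run something like this over millions of receipts with far more scalable tooling.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only (not real store data).
baskets = [
    {"diapers", "baby wipes", "flowers"},
    {"diapers", "flowers", "greeting card"},
    {"milk", "bread"},
    {"diapers", "baby food"},
    {"milk", "flowers"},
    {"bread", "baby wipes", "diapers", "flowers"},
]


def pair_lift(baskets):
    """Return {(a, b): lift} for every pair of items bought together at least once.

    lift(a, b) = P(a and b) / (P(a) * P(b)). A value above 1 means the pair
    appears together more often than it would if purchases were independent.
    Pairs are keyed in alphabetical order, e.g. ("diapers", "flowers").
    """
    n = len(baskets)
    # How many baskets contain each item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each (alphabetically ordered) pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


if __name__ == "__main__":
    lifts = pair_lift(baskets)
    # Show the five most strongly associated pairs.
    for (a, b), lift in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(f"{a} + {b}: lift {lift:.2f}")
```

A lift above 1 only says that two items show up together more often than you’d expect by chance. It flags a relationship worth a closer look; working out why the relationship exists is still the interesting part.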