Forecast. Predict. Tell me the Future. So I can plan how to invest today. So I can be prepared for the worst.

Thus flows the investment thesis in Analytics today. The craze of pure Business Intelligence that ran from say 1990 through 2005, was replaced by the need to become Predictive. This wave is now a good decade old, though this is a slower moving wave – significantly so because it is harder to understand than traditional Business Intelligence.

While companies (and individuals) have been predicting for a long time (in many ways we cannot operate without some measure of assumption about the future), statistical and machine learning based prediction offer the seemingly promising proposition of predicting based on past behavior and pattern in a way that overcomes human limitations.

What are some of these human limitations?

  1. Limits in our ability for consuming raw information
  2. We have fantastic pattern detection capability, yet we cannot detect patterns in unfamiliar data
  3. Lack of objectivity while assessing data, especially if certain results are sought apriori
  4. A tendency to read too much into thread-bare evidence
  5. Only looking at successes, ignoring failures as “exceptions”
  6. One could add many more….

But do predictive models help us overcome all or many of these limitations and bias?

Bias Cartoon

Developing strong, credible and reliable models are just as hard as making good predictions using pure human judgment. This is so because models rely on:

  1. Sampling Bias – not covering all conditions
  2. Major Trend Bias – relegating corner cases to an after-thought
  3. Selection bias of data – such as looking at success only
  4. Measurement bias of data – which may be beyond the control of the human modeler
  5. Expectation bias – once again, the human element can affect the results of the model
  6. Myriad assumptions built into the Model – lax assumptions can kill
  7. Data Drift – the tendency for the operating environment of the model to gently change

As may be becoming evident, while statistical and machine learning models allow us to look at a wider range of data and situations than what is possible just with direct human observation, these models do not rid us of the many bias that bedevil most models.

Does that mean building these models is pointless? As George Box, one of the finest statisticians of the 20th century said – “All models are wrong, but some are useful” – the ability of models to precisely replicate the underlying process it is trying to model is probably next to zero, but well-designed models can provide you with useful clues about how the process is operating now and will likely operate into the future, if conditions remain largely similar.

Dilbert statistics

So, to really reach the plateau of productivity, let us treat the models as an extension of human judgment, a useful tool that lets us review processes that were beyond the realm of human ability to study and observe directly – but use its results with as much circumspection and care that we apply to our own predictions of the directly observed world.


Sri Srikanth

Advanced Data Analytics & Strategy, Senior Data Scientist, Cisco Digial