It is always interesting how commonly understood terms turn into nuanced expressions in the hands of specialists, whether they be lawyers, accountants, doctors or, indeed, data analysts and statisticians. One such pair of terms is likelihood and probability: they may seem to mean much the same thing in common parlance, but they serve distinctly separate purposes in the world of data analytics.
Probability is simply the chance of an outcome occurring, given a fixed set of parameters.
Given a person’s
- Credit History
- Criminal Record
- Length of Employment and more
Q. What is the Probability of Default on a Loan?
A. Probably some number between 0 and 1!
This is classical probability, and it matches how the word is understood in common language as well.
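To make the direction of reasoning concrete, here is a minimal sketch in Python. It assumes the parameters are already known: the logistic form, the feature names (`credit_score` as a stand-in for credit history, `criminal_record` as 0 or 1, `years_employed`) and the weight values are all hypothetical, chosen for illustration rather than fitted to any real data. With those parameters held fixed, the function simply returns a probability of default between 0 and 1.

```python
import math

# Hypothetical, already-known model parameters (illustrative values, not fitted to real data).
WEIGHTS = {"credit_score": -0.01, "criminal_record": 1.2, "years_employed": -0.15}
INTERCEPT = 4.0


def probability_of_default(applicant: dict) -> float:
    """Return P(default | applicant's features, fixed parameters) using a logistic function."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


applicant = {"credit_score": 700, "criminal_record": 0, "years_employed": 5}
p = probability_of_default(applicant)
print(f"P(default) = {p:.3f}")
```

The parameters go in and a probability comes out; nothing here asks whether the weights themselves are any good.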
Likelihood, however, flips the equation: it takes a set of outcomes observed across several cases as given, and is used to estimate the values of the parameters of the equation/formula/model that best explain those outcomes.
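In symbols, one common way to write this (assuming the observations are independent) uses θ for the model's parameters, y_1, …, y_n for the observed outcomes and x_1, …, x_n for each case's variables. For the loan example, y_i = 1 means a default and p_i(θ) = P(y_i = 1 | x_i, θ) is the model's probability of default for case i:

$$
\mathcal{L}(\theta \mid y_1, \dots, y_n) = P(y_1, \dots, y_n \mid x_1, \dots, x_n, \theta) = \prod_{i=1}^{n} p_i(\theta)^{y_i} \bigl(1 - p_i(\theta)\bigr)^{1 - y_i}
$$

$$
\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta \mid y_1, \dots, y_n)
$$

Read with θ fixed and the outcomes varying, the same expression is a probability; read with the outcomes fixed and θ varying, it is a likelihood, and the maximizing value is the maximum likelihood estimate.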
In other words, the analyst has in hand a set of observations (say, default or no default on each loan) and the values of the variables for each case (criminal history, credit history, salary, etc.), and now tries to develop an equation that combines these variables in specific ways so as to predict the observations correctly. The likelihood function scores how well each candidate set of parameter values for that equation explains the observations, and the best-scoring values define the model. Likelihood is thus of intense interest to data analysts and modelers as they seek to derive a workable model from the observations available to them. And once a model is available, one can both better control current operations and forecast future outcomes.
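A minimal sketch of that workflow in plain Python, assuming a logistic model with a single variable (years of employment) and a small, made-up set of loan outcomes; the data, learning rate and step count are all hypothetical. `log_likelihood` scores a candidate pair of parameters against the fixed observations, and `fit_by_gradient_ascent` nudges the parameters uphill until that score stops improving, which approximates the maximum likelihood estimates. In practice one would more likely reach for a statistics library than hand-rolled gradient ascent; the loop is spelled out here only to show what is being maximized.

```python
import math

# Toy, hypothetical observations: (years_employed, defaulted), where defaulted is 1 or 0.
OBSERVATIONS = [
    (0.5, 1), (1.0, 1), (2.0, 0), (2.0, 1), (3.0, 0),
    (4.0, 1), (5.0, 0), (7.0, 0), (8.0, 0), (10.0, 0),
]


def sigmoid(z: float) -> float:
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0: float, b1: float, data) -> float:
    """log L(b0, b1 | data): sum over observations of log P(y | x, b0, b1)."""
    total = 0.0
    for x, y in data:
        p = sigmoid(b0 + b1 * x)  # model's probability of default for this case
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_by_gradient_ascent(data, lr: float = 0.01, steps: int = 20_000):
    """Climb the log-likelihood surface to approximate the maximum likelihood estimates."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            residual = y - sigmoid(b0 + b1 * x)  # derivative of log L w.r.t. the linear score
            g0 += residual
            g1 += residual * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


b0, b1 = fit_by_gradient_ascent(OBSERVATIONS)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"log-likelihood at estimate = {log_likelihood(b0, b1, OBSERVATIONS):.3f}")
print(f"log-likelihood at (0, 0)   = {log_likelihood(0.0, 0.0, OBSERVATIONS):.3f}")
```

Once fitted, `sigmoid(b0 + b1 * x)` gives the model's probability of default for a new applicant with `x` years of employment: the same probability question as before, now answered with estimated rather than assumed parameters.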
Traditional digital analytics has spent much of the past 15 years in the world of probabilities: first measuring, then reporting on them (conversion rates, bounce rates, etc.), which gave site owners a good sense of the revenue or leads a given amount of traffic would generate. How were these probability models developed? They were derived directly from empirical evidence, on the assumption that past performance is indeed predictive of future performance. Some nuanced use of machine learning techniques appeared in multivariate testing (which draws on notions of clustering and segmentation), but otherwise the field has largely relied on assuming a direct relationship between user actions and the elements shown on the digital property when assessing probabilities. This approach has served the field reasonably well thus far.
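The empirical approach described above can be sketched in a few lines of Python. The visit and conversion counts are hypothetical, and the forecast simply multiplies expected traffic by the historical rate, which holds only under the assumption that past performance is predictive of future performance.

```python
# Hypothetical historical figures for one site (illustrative only).
past_visits = 40_000
past_conversions = 1_000


def conversion_rate(conversions: int, visits: int) -> float:
    """Empirical probability that a visit converts: conversions / visits."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return conversions / visits


def forecast_leads(expected_visits: int, rate: float) -> float:
    """Project leads by assuming the historical rate carries forward unchanged."""
    return expected_visits * rate


rate = conversion_rate(past_conversions, past_visits)  # 1,000 / 40,000 = 0.025
print(f"conversion rate = {rate:.1%}")
print(f"forecast leads for 60,000 visits = {forecast_leads(60_000, rate):.0f}")
```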
But now, as noted in the previous blog post, marketing analytics is pushing deep into the digital medium, bringing with it more complex sets of inputs and observations that call for likelihood functions and modeling techniques. The influence of these techniques on traditional digital analytics is as yet unknown, and it will be interesting to observe (and drive!) in the coming years.