This article was originally posted on OpenDataScience.com.
The data science valley of despair is real. Time after time, leaders who are well versed in case studies and industry research extolling the returns of data-driven insights set out to innovate their business, only to land in a hole of frustration and write-offs.
It may be more accurate to call it a crater of despair given that Gartner predicts 85% of data science projects will fail (2018).
What do the 15% of successful data science projects have in common? A lot, including an informed decision about whether their data scientists were hired or trained. This decision can make the difference between having unsuccessful and successful data scientists at your company.
On the surface, the question may seem inane. Now that you can hire a candidate with a bachelor’s or master’s degree in data science, why would you train one instead? Assuming you had the time and ability to replicate a world-class data science program, wouldn’t it be inefficient at best and ineffective at worst?
It depends on your domain—or more specifically, your data’s complexity and lineage.
Formal data science education delivered by universities, MOOCs, and other means can only cover 2 of the 3 interdisciplinary skills required to be successful in the role: statistics and computer science. The 3rd interdisciplinary skill, domain knowledge, cannot be taught en masse because it isn’t consistent across industries—or even companies. No institution can teach the intricacies of your data. There will be a knowledge gap. The question is, how wide? Crater? Valley? Or navigable pass?
Data is a language: every company, if not every business unit, speaks its own dialect. As with the spoken word, these differences came about organically, and they vary or evolve based on the group’s needs. Remember life before “bling”? The same is true of “channel partner.” These dialects become especially confusing for general terms that don’t conform to a common taxonomic definition. For example, IT’s “customer” is likely an employee, whereas Sales’ “customer” is typically an individual with purchasing power, who may be different from the “end user” whom your company’s external contact center calls the “customer.”
Restated, domain knowledge is the learned skill of communicating fluently in a group’s data dialect. Its component parts are general business acumen + vertical knowledge + data lineage understanding. For example, a data scientist in people analytics requires a foundational knowledge of the business + human resources + the inner workings of the company’s HR tools and processes that create the data they work with. Those processes and other inputs to the dataset are crucial. A data scientist can’t create meaningful insights before they understand what the data is saying today. Is it telling a story? Is it, or are subsets of it, too polluted to use today? Are some data points proxies for or inputs to others? The more complex your business processes and associated data lineage, the longer your data dialect will take to learn.
For digital-native companies whose data collection is automated with intuitive dialects (e.g., a “click” is a “click”), domain knowledge can be developed much more quickly than for large, longstanding companies that have undergone transformations, acquisitions, or divestitures.
If you hire a data scientist, how long will it take them to learn your data dialect? And can you provide air cover for them to do so before applying pressure to produce “insights”? Would it be faster or more effective to upskill someone (e.g., a business analyst or developer) in the areas of statistics and computer science in which they aren’t already well versed?
The real question is: what makes the most sense for your project(s)? Hiring data scientists? Developing successful data scientists? Or would a team composed of both types help you avoid the data science crater of despair?