Part 1: Understanding the Needs of the Data Science Team
According to a recent Harvard Business Review article titled “Are You Asking Too Much of Your Chief Data Officer?”, Chief Data Officers have a growing number of roles and responsibilities, yet they are struggling to achieve everything being asked of them. IT teams can and should help the Chief Data Officer. Doing so helps justify the IT budget and accelerates an enterprise’s deployment of artificial intelligence and machine learning (AI/ML).
“In studying CDOs [Chief Data Officers] and their roles over the past decade, we’ve identified seven key types of CDO jobs, each distinct enough that it would be difficult or impossible for one person to perform all of them well.”
– Thomas H. Davenport and Randy Bean, Are You Asking Too Much of Your Chief Data Officer?
Here are some of the roles in which IT teams can help:
Chief Data and Analytics Officer
The Chief Data and Analytics Officer has the difficult job of wrangling the data and applying the latest machine learning algorithms, often in a Jupyter notebook running TensorFlow on a cluster of GPU servers. To do this, data science teams have to work directly with the infrastructure, all the way down to the BIOS version, CUDA drivers, Kubernetes, storage, and countless other details. In reality, data science teams are already overwhelmed by data science problems and want to keep their focus on data science, not infrastructure.
Data science teams work hard to monetize the data for whatever product or service they support. IT needs to be fully aware of the value of that data so that it, too, can justify the infrastructure and staff devoted to helping the data science team. It is therefore important for IT to champion extracting value from the data, both to bring thought leadership to the enterprise and as part of the broader digital transformation that so many enterprises are undertaking.
Data science teams need to clean up the data and reduce duplication and fragmentation. In other words, they need to break down data silos, but getting there requires the proper compute, network, and storage infrastructure, along with Apache Spark, Kafka, good old Python elbow grease, and many other tools. This is clearly a task that IT can, and must, help with to enable a successful deployment.
Data science teams also need the right security mechanisms to ensure that data is used in a controlled fashion, so that regulatory and compliance requirements are met. Using its enforcement mechanisms, IT can make sure the right policies are set and enforced.
The cloud can also be an important part of the infrastructure. However, because it is so easy to use, teams can readily create multiple copies of subsets of the data, multiplying the data governance problem. Cisco believes that IT must be intimately involved in data science projects to ensure that data governance, whether on-premises or in the cloud, is properly administered and managed.
By keeping these needs in mind, your IT team will better understand the kind of assistance data science teams really need. Stop by next week, when we will discuss bridging the gaps that enterprises often face in operationalizing machine learning.