Machine Learning is NOT Rocket Science (Part 2)
In Part 1 of this blog, I pointed out that using machine learning algorithms is much easier today thanks to packages such as scikit-learn, TensorFlow, PyTorch, and others. In fact, using machine learning has largely been reduced to a data management and software development problem rather than the mythical complexity of rocket science.
Yet it has never been more important to take advantage of machine learning. According to the McKinsey report, being one of the first to adopt artificial intelligence has huge implications for future cash flow.
If machine learning indeed boils down to data management and using machine learning packages, what challenges are enterprises facing today in making use of that data? For many Cisco customers, we find that:
- Data scientists are tasked with mining value out of the data. As they explore the value of data sources A and B, there may be petabytes of additional data, which represents a huge change in infrastructure requirements. While data scientists can work with a small data set using a curated version of machine learning software on their laptops, scaling to petabytes clearly requires working closely with IT teams.
- There are numerous machine learning software stacks. Not only are there many options, but some, like TensorFlow, even have nightly builds with new capabilities. Hence, the machine learning software ecosystem is relatively immature compared to, say, relational databases.
- IT teams are trying to help the data scientists. Yet constantly changing data sources lead to drastic changes in infrastructure requirements. With an immature software ecosystem, it is very difficult for IT to create a stable environment with the infrastructure needed to scale.
At Cisco, we understand these challenges. Oftentimes, we find that the IT team and the data scientists are at odds with one another. To help our customers, Cisco has developed Cisco Validated Designs (CVDs), in partnership with the machine learning software ecosystem, to create complete solutions based on a unified architecture that can scale quickly, enabling IT teams to better support the data scientists.
Let’s highlight some examples of Cisco Validated Designs supporting machine learning. One of the prerequisites of machine learning is the data itself. Many Cisco big data customers already have a data lake in Hadoop that requires further analysis to extract more value from the data. Hence, Cisco has partnered with Cloudera to create a CVD using Cloudera Data Science Workbench, enabling customers to tap into the Hadoop data lake and use the latest machine learning frameworks such as TensorFlow and PyTorch. In a similar way, Hortonworks 3.0 has the latest YARN scheduler, which can schedule workloads on CPUs and GPUs to support workloads like Apache Spark and TensorFlow as Docker containers [1]. This solution enables IT teams to scale the CPU, GPU, and storage.
Cisco’s proven approach simplifies deployments, accelerates time to insight, and enables data scientists to curate their own machine learning software stack. For data scientists who want to run machine learning experiments in the cloud, Cisco is actively contributing code to the Kubeflow open source project to ensure that there are consistent tools for machine learning both on-premises and in the cloud, enabling a hybrid cloud architecture for AI and ML. In fact, Gartner points out that 57% of machine learning models are developed using resources on-prem.
By expanding our UCS portfolio with the new C480 ML, we continue to diversify our offerings to cover any workload. All UCS servers are based on a unified architecture and can be managed by Cisco Intersight, making it simple to integrate the C480 ML into existing UCS environments.
In short, Cisco has expanded the UCS portfolio to include a system that is purpose-built for deep learning. We are working with the machine learning software ecosystem to demystify AI/ML with proven, full-stack solutions developed with industry leaders. Our goal is to help IT better support AI projects and their data scientists. May your machine learning journey be fast and smooth.
Are you attending GTC in Munich, Israel, or China? Stop by the Cisco booth to find out more. Keep the conversation going. Feel free to reach out to me at @hanyang1234 to discuss AI/ML on UCS.
[1] On October 3, 2018, Cloudera and Hortonworks announced their intention to merge. For now, Cisco will continue with separate CVDs for Cloudera and Hortonworks to ensure smooth customer deployment.