Self-service Data Preparation Made Better with Data Virtualization
Self-service data preparation was a hot topic at the Strata + Hadoop Conference in New York City last month. In the video I recorded there, I discussed Cisco’s new offering in this marketplace, Cisco Data Preparation.
And in his recent blog post, Unleash Your Business Analysts Cisco Data Preparation, Kevin Ott did a great job laying out the business and IT case for Data Preparation. Given the big data and analytics opportunity every enterprise faces in our increasingly digitized business environment, Data Preparation has gone from a nice-to-have to a must-have.
But Kevin also wisely suggested a bit of caution, especially in his statement “Independent, ungoverned data prep efforts can lead to duplication of effort, inconsistently transformed data sets of unclear origin, resulting in inaccurate analysis and potentially bad business results.”
How Cisco Data Preparation Delivers on Governance Promise
Cisco Data Preparation attacks the data governance problem in two fundamental ways. The first is through the myriad governance capabilities inherent in the offering itself. These include a complete lineage of all data preparation activities performed by the analyst, so there is never a question of where the data came from, how it was prepared, or who did the work. This transparency gives business analysts the freedom to explore, transform and enrich data based on their skills and business acumen, without too many extra constraints.
Two-way integration with Cisco Data Virtualization is the second way Cisco Data Preparation enables data governance. Let’s explore how it works.
Data Virtualization as a Source to Data Preparation
In the past ten years, data virtualization has advanced from a data integration project tool to an enabler of enterprise-scale data services environments where users go to find and take advantage of shared, curated data sets that IT provides.
This well-governed “gold” data is gold for any business analyst who needs to prepare data in support of a new analytic effort. I recently blogged about Cisco Data Virtualization’s business-friendly Business Directory, where business analysts can use a search-based approach to quickly identify the data they need and automatically import it into Cisco Data Preparation’s analytic sandbox.
Once an analyst has prepared the data using the transparent governance within Cisco Data Preparation, the analyst may want to share that “answer set” with other analysts, IT data engineers, data scientists and others so they can take advantage of it.
But how can the enterprise continue to govern this data outside the data preparation environment? Cisco Data Virtualization is the answer. Adding a Cisco Data Preparation answer set as an object in Cisco Data Virtualization takes five minutes or less. With its built-in access controls, lineage and where-used on all sources, logging and more, Cisco Data Virtualization ensures the data governance required.
Beyond data governance, the built-in integration allows multiple deployment options depending on the degree of sharing required. Check out the new white paper by Rick van der Lans, data virtualization’s leading independent analyst, Strengthening Self-Service Analytics with Data Preparation and Data Virtualization, which includes everything you need to know about these options and how they work.
Experience Data Preparation and Data Virtualization in Chicago
There you can attend the breakout session, “Data Preparation for Self-Service Analytics” and stop by the Solution Showcase for an integrated demo.
I’ll be there as well. I would enjoy the discussion.