The Technology of Data
My fellow Chief Data Officers will probably be up in arms when they read the title of this article. That’s because, for years, Chief Data Officers have been fighting the notion that data is a technology problem. And I stand here as a CDO to tell you that I agree with them: “Data is not a technology problem.” However, I would take it further and say: “Data is not a technology problem; rather, technology provides important capabilities to the CDO to address the data problem.” This may be predictable, given that I am writing for a CIO publication.
Clearly, data management has many different dimensions, but I fear that CDOs’ desire to keep people from concentrating exclusively on technology sometimes leads them to de-emphasize the important role technology plays. I am a bit of a dreamer (believe me, it is part of what keeps CDOs sane), and I dream of a day when obtaining quality data is not a major event and does not require significant effort. We are not there yet, but I believe we’re on our way. Don’t get me wrong: data continues to grow at unprecedented rates, in both volume and variety, so CDOs do not need to worry about being made redundant just yet.
The advent of continually smarter tools is making the job a little easier. Some people have heard me say that I want data to be like attaching a USB device to your PC. In “the old days,” it took a lot of work to add a peripheral to your PC. You had to find the right driver, configure it, make sure there were no conflicts with other drivers, reboot your PC and then hope it worked. Today, you just plug the device into the USB port while the PC is running, and the PC automatically recognizes the device, finds the driver, configures it and makes the device immediately available to you without a reboot.
Data should act similarly. When you acquire new data, the software should recognize the data, tag it to the business definitions, assess its quality, add it to the catalogue (along with information about the data and where it came from, aka lineage) and make it immediately available for use. In short, machine learning software would drive data discovery. The good news is that this dream is not that far away. Newer machine learning software is beginning to be able to perform data discovery (i.e., identify what the data is), though there is still room for improvement. Industry standards would help here. In the absence of standards, an industry knowledge base for the software to work from would add even more intelligence, allowing the software to recognize different variations of objects in a particular industry sector (what an equity trade looks like, etc.).
Shipping machine learning software without a knowledge base is like having an infant child. It needs to be taught from the beginning, and then it will learn over time. But shipping machine learning software with an industry-specific knowledge base is like having a young adult. It has the basic knowledge, and it only needs to be taught the nuances of the environment. Industry standards or industry knowledge would accelerate the usefulness of machine learning “data discovery” software.
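To make the plug-in idea concrete, here is a minimal sketch in Python of the flow described above: recognize a column, tag it to a business term, assess its quality, and add it to a catalogue with its lineage. Everything in it is an assumption made for illustration: the `KNOWLEDGE_BASE` entries, the `CatalogueEntry` fields, the 0.5 threshold, the file name and the sample values are all hypothetical. Simple pattern matching stands in for a trained model, so this shows what a seeded industry knowledge base buys you, not how any real discovery product works.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical industry knowledge base: business term -> pattern its values follow.
# A real product would ship something far richer; this is only a stand-in.
KNOWLEDGE_BASE = {
    "isin": re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]"),
    "currency_code": re.compile(r"[A-Z]{3}"),
    "trade_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Assumed cut-off: a column is tagged only if at least half its values match.
MIN_MATCH_RATE = 0.5


@dataclass
class CatalogueEntry:
    column: str
    business_term: str         # tag from the knowledge base, or "unknown"
    quality: Optional[float]   # share of values matching the tagged pattern
    lineage: str               # where the data came from
    registered_at: str         # UTC timestamp of registration


def discover(values: List[str]) -> Tuple[str, Optional[float]]:
    """Return the best-matching business term and its match rate."""
    if not values:
        return "unknown", None
    best_term, best_rate = "unknown", 0.0
    for term, pattern in KNOWLEDGE_BASE.items():
        rate = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
        if rate > best_rate:
            best_term, best_rate = term, rate
    if best_rate < MIN_MATCH_RATE:
        return "unknown", None
    return best_term, best_rate


def register(dataset: Dict[str, List[str]], source: str,
             catalogue: List[CatalogueEntry]) -> List[CatalogueEntry]:
    """Recognize, tag, assess and catalogue every column of a new dataset."""
    for column, values in dataset.items():
        term, quality = discover(values)
        stamp = datetime.now(timezone.utc).isoformat()
        catalogue.append(CatalogueEntry(column, term, quality, source, stamp))
    return catalogue


# Illustrative sample values for a newly acquired feed.
new_feed = {
    "sec_id": ["US0378331005", "GB0002634946", "US5949181045"],
    "ccy": ["USD", "GBP", "usd"],
    "notes": ["call desk", "", "late fill"],
}
catalogue = register(new_feed, source="vendor_feed.csv", catalogue=[])
for entry in catalogue:
    print(entry.column, entry.business_term, entry.quality, entry.lineage)
```

In this sketch the `sec_id` column is tagged as an ISIN, the `ccy` column is tagged as a currency code with a lower quality score because of the lowercase value, and the `notes` column stays `unknown` for a person to review. That last case is the "infant" situation: without an entry in the knowledge base, the software has to be taught.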
Once the data is recognized and tagged, the machine learning software would also know the characteristics of acceptable values for each attribute, based on industry standards and the patterns it sees in the data. We can then use machine learning, analytics-based data quality assessments instead of the traditional rule-based, deterministic assessments. This has several advantages: (1) the assessment would be more accurate, as it would be based on real-world statistics; (2) it would become more accurate over time; (3) it would be more responsive to changes in the data; and (4) it would not depend on hand-written rules, since we will never be able to write enough of them to assess data quality effectively. The evolution of software from deterministic, rules-based data quality tools to analytical tools is just beginning, and this is another area that needs to continue to improve.
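The difference between a fixed rule and a statistics-based check can be shown in a few lines of Python. This is a sketch under stated assumptions, not a description of any particular tool: the price history is made up, the function names are hypothetical, and a robust z-score (median and median absolute deviation, with the conventional 0.6745 scaling and 3.5 cut-off) stands in for whatever model a real product would learn.

```python
import statistics
from typing import Dict, List


def passes_fixed_rule(price: float) -> bool:
    """Traditional deterministic check: a hand-written range rule."""
    return 0 < price < 1000


def learn_profile(history: List[float]) -> Dict[str, float]:
    """Summarize accepted historical values with a robust centre and spread."""
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    return {"median": median, "mad": mad}


def is_suspect(value: float, profile: Dict[str, float],
               threshold: float = 3.5) -> bool:
    """Flag a value whose robust z-score exceeds the threshold."""
    if profile["mad"] == 0:
        # No spread in the history: anything different is unusual.
        return value != profile["median"]
    score = 0.6745 * abs(value - profile["median"]) / profile["mad"]
    return score > threshold


# Made-up history of accepted prices for one instrument.
history = [101.2, 99.8, 100.5, 100.1, 99.5, 100.9, 100.3]
profile = learn_profile(history)

for price in (100.7, 10.07):  # the second value has a shifted decimal point
    print(price, "rule ok:", passes_fixed_rule(price),
          "suspect:", is_suspect(price, profile))

# The profile is re-learned as new accepted values arrive, so the check
# follows changes in the data without anyone rewriting a rule.
history.append(100.7)
profile = learn_profile(history)
```

The shifted-decimal price of 10.07 passes the range rule but is flagged by the learned profile, because the profile knows what this attribute usually looks like. Re-learning the profile as accepted values accumulate is what lets the assessment improve over time and respond to changes in the data.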
I see a bright future for the Chief Data Officer as “the age of the data machine” becomes a reality. With the help of technology and the focus of the CDO, data professionals will be elevated above the manually intensive data-management processes, and they will have more time to focus on data as a strategic asset. So, my ask of all technology people is to partner with your CDO and together implement a data-aware ecosystem.