The Seven Dimensions of Data Governance
Gartner predicts that through 2022, only 20% of organizations investing in information governance will succeed in scaling governance for digital business.
That’s a lot of failure.
Data governance is a challenging discipline, for reasons that are fairly well known. Attend any data governance conference and you will hear a lot of talk about insufficient funding and lack of executive support, the absence of a single data governance tool or technology in the market that satisfies all (or most) business requirements, and the difficulty of finding skilled resources. More fundamentally, I believe that there is often a lack of understanding and agreement on what exactly constitutes data governance, which makes it difficult to present a business case and execute data governance initiatives successfully. There is no dearth of definitions of data governance, but they are often abstract and do not provide guidance on how to go about it, raising the question: how do we govern data governance?
In practical terms, I like to think of data governance as seven dimensions or competencies. Using these competencies, we can apply portfolio management techniques to data governance, which creates clarity for all stakeholders, enables us to use well-known practices, and improves the ability to deliver data governance initiatives successfully. This approach also allows us to clearly define the risks and rewards of the initiatives, and create a business case.
The seven dimensions of data governance are as follows.
Master data: The goal of mastering data is to bring together fragmented data in one place for all the important “nouns” of the business: employees, customers, products, suppliers, patients, providers and so on. This data fragmentation may be in terms of completeness (customer name is in one database, date of birth in another), quality (multiple addresses may be spread across multiple databases, all at variable levels of cleanliness) and quantity (some customers are in one database, some in another). Master data management (MDM) improves data quality and consistency, enables business stewardship of data, and reduces IT implementation costs. It is perhaps the second most widely accepted and mature data governance dimension in existence today.
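To make the fragmentation concrete, here is a minimal sketch of consolidating customer records into a single "golden" record. Everything in it is an assumption for illustration: the `crm` and `billing` sources, the field names, and the survivorship rule (the first non-empty value wins, in source priority order). Real MDM tools apply far richer matching and survivorship logic.

```python
# Hypothetical source systems; identifiers and fields are illustrative only.
crm = {
    "C001": {"name": "Ana Diaz", "address": "12 main st."},
}
billing = {
    "C001": {"date_of_birth": "1980-04-02", "address": "12 Main Street, Apt 4"},
    "C002": {"name": "Li Wei", "address": "9 Elm Rd"},
}


def merge_records(*sources):
    """Build one record per customer.

    Sources are listed in priority order: for each field, the first
    non-empty value encountered is kept (a simple survivorship rule).
    """
    golden = {}
    for source in sources:
        for customer_id, fields in source.items():
            record = golden.setdefault(customer_id, {})
            for field, value in fields.items():
                if value and field not in record:
                    record[field] = value
    return golden


# Billing is assumed to hold the cleaner address, so it is listed first.
golden = merge_records(billing, crm)
```

Here `golden["C001"]` combines the name from one system with the date of birth and address from the other (completeness and quality), and `C002` is picked up even though it exists in only one system (quantity).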
Reference data: Reference data is that which further clarifies important data entities (akin to adjectives and adverbs); for example, customer state or zip code, supplier reliability rating and customer order payment type. Industry-specific terminologies, for example ICD codes in healthcare, are also examples of reference data. Reference data is the glue that holds together systems and provides consistency to reporting and analytics, and yet it is arguably the most undervalued of the data governance dimensions.
Metadata: Metadata, famously the “data about data,” describes your data. For example, the metadata about customer last name could include the list of application systems where that data field is held and how it is stored in each system (20 characters wide in one system, 15 characters wide in another system). Metadata can be strung together to build data lineage, which helps us understand detailed, end-to-end data provenance, and figure out (for example) why the last few characters of an individual’s last name were truncated in a downstream report. Collecting and managing metadata can be laborious and its value can seem questionable, which is why many organizations hesitate to invest substantially in this form of data governance. However, a clear vision and proper stakeholder participation can yield substantial ROI in terms of improved quality of software development, improved data quality and efficient compliance reporting.
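The truncated last name example can be sketched in a few lines. This is only an illustration under stated assumptions: the `FieldMetadata` structure, the system names, and the three-hop lineage path are all hypothetical, and the field widths simply reuse the 20 and 15 characters mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Metadata about one field as it is stored in one system."""
    system: str
    field: str
    max_length: int


# Hypothetical lineage for customer last name, ordered from source to report.
lineage = [
    FieldMetadata("registration", "last_name", 20),
    FieldMetadata("warehouse", "last_nm", 15),
    FieldMetadata("report", "LastName", 20),
]


def find_truncation_points(path):
    """Return (upstream, downstream) pairs where the downstream field is narrower."""
    return [
        (up, down)
        for up, down in zip(path, path[1:])
        if down.max_length < up.max_length
    ]


for up, down in find_truncation_points(lineage):
    print(
        f"{up.system}.{up.field} ({up.max_length}) -> "
        f"{down.system}.{down.field} ({down.max_length})"
    )
```

With the metadata strung together in order, the hop from the 20-character field to the 15-character field is the only one flagged, which is where the trailing characters would be lost before the data ever reaches the report.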
Information Catalogs: In the context of data governance, information catalogs typically refer to either data or reports catalogs. The terms are self-descriptive. A data catalog lists information about different data sets available across the enterprise (or some part of the enterprise), while a reports catalog lists the different reports (including dashboards etc.) across a set of application or reporting systems. The information held in the catalogs is in fact the metadata described above: the information catalog holds metadata about data sets and reports. The primary benefit of information catalogs is to enable faster search of information assets, improving productivity. A good reports catalog helps users find existing reports and prevents duplicate report requests, which can yield a very high return on investment.
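As a rough sketch of the search benefit, a reports catalog can be thought of as a list of metadata entries that users query before requesting a new report. The entries, tags, and the `search_catalog` helper below are hypothetical; commercial catalog tools offer much more sophisticated search.

```python
# Hypothetical reports catalog: each entry is metadata about one report.
catalog = [
    {"name": "Monthly Customer Churn", "system": "bi_portal",
     "tags": {"customer", "churn"}},
    {"name": "Supplier Reliability Scorecard", "system": "procurement",
     "tags": {"supplier", "rating"}},
    {"name": "Customer Payment Types", "system": "bi_portal",
     "tags": {"customer", "payment"}},
]


def search_catalog(entries, keyword):
    """Return names of reports whose name or tags match the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        entry["name"]
        for entry in entries
        if keyword in entry["name"].lower() or keyword in entry["tags"]
    ]


existing = search_catalog(catalog, "churn")
```

If `existing` is not empty, the user can be pointed to a report that already exists instead of filing a duplicate request.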
Business Vocabularies: As business has become complex, so has the business vocabulary. How can a company report on customer churn rate without first precisely defining the terms customer and churn rate? How do you define an attending physician for a patient who was seen by several physicians and specialists during a hospital stay? For an organization to speak the same language, it must first agree on a common vocabulary. (Taxonomies and ontologies fall in this dimension as well.) Defining and implementing processes for business vocabulary governance yields rich dividends in business efficiency and analytic accuracy.
Data Quality: Data quality is perhaps the oldest data governance competency, and hence often considered synonymous with it. Data quality problems can be extremely wide-ranging, from unclean addresses (leading to returned mail costing hundreds of thousands of dollars) to complex, systemic problems related to other dimensions of data governance (for example, fragmented master data or inconsistent reference data) that can be difficult to detect and even more difficult and expensive to correct. This makes data quality governance a potentially nebulous activity, difficult to scope unless there is a clear vision and well-defined business drivers.
Big Data/Analytics: These are heady times for big data and artificial intelligence/machine learning, not unlike the California gold rush of the mid-nineteenth century. There is gold in the mountains of data, and everyone is off to mine it, loaded with all kinds of tools and improvisations. Governance is not uppermost in their minds, but there is a growing awareness of the need to integrate data lakes and big data analytics into enterprise data governance. New-generation data catalog tools have taken the lead in this regard, but much more needs to happen.
To fully define a data governance problem or goal and deliver an initiative, we also need to apply a data governance framework to each of the above, which is a topic for a different day.
What is your experience with data governance? I look forward to hearing about it.
Rajan Chandras is director information management at NYU Langone Health, a premier academic medical center headquartered in New York City. His responsibilities include data architecture and strategy, master data management, data governance, and big data. The views expressed here are his own and not necessarily those of his employer.