Taylor Culver
Feb 2023
This article is intended as a "how-to" guide for data leaders to manage the progress of their data strategy through simple yet powerful data management documentation. Questions follow the sections to facilitate your thought process on where you can tweak or pivot your data strategy to empower your data communities to drive tangible and measurable results with data.
In today's world, data governance, alongside data quality, is one of the most commonly cited reasons data initiatives don't reach their intended outcomes. People quickly blame data governance without understanding the problem's root cause.
Although data governance has always been a discipline, it has gotten a tremendous push since 2008 through the financial sector as a reaction to the lack of transparency into syndicated assets and derivatives that ultimately led to the Great Recession. As a result, most, if not all, financial firms have implemented solutions and policies to meet regulatory reporting requirements. This crisis was followed by the advent of GDPR and CCPA on the heels of data breaches at some well-known tech and data companies. As a result of these two developments, data governance "best practices" have become a form of risk mitigation and regulatory compliance that organizations must perform.
What's the problem, then?
Compliance costs money. Financial institutions spend hundreds of millions of dollars complying with regulations that few other companies need to mirror or, frankly, can afford. So when data governance "best practices" are applied to, say, a retailer or a consumer packaged goods (CPG) company, they mitigate a risk that primarily exists in consumer data agencies and financial sector regulatory reporting. The cost of implementing data governance therefore exceeds the value. As a result, most other industries practice data management informally or passively.
So what can less regulated industries do?
When your colleagues say they "need data governance," ask them what they mean. What they'll get to, eventually, is that the data available needs to be fit for purpose. When data is not fit for purpose, it is for one of two reasons: 1) the data does not exist, or 2) the data that does exist is unsuitable for their analytical needs.
The problem here is not data governance; the problem is that the data is unmanaged to suit the end user's needs. How on earth can you govern something that is unmanaged?
Data leaders must expand their reach beyond an innovative multiyear plan for driving change with data; they also need a day-to-day plan for managing results. As discussed in the last article, this will take a programmatic approach to a portfolio of projects that solve for use cases. Remember, what isn't measured isn't managed. So the common miss is when data leaders jump from data strategy to data governance without having data management processes that are universally understood and accepted by their colleagues.
Because data is still a somewhat nascent function, your stakeholders often need clarification along the way. Most people, when confused, tend to disengage. So activities, roles, and responsibilities need to be crystal clear.
That starts with setting the record straight. Data management is the planning and execution of improving the value of data at an organization, and governance is the oversight and enforcement of data management.
Most organizations need to start thinking more about managing data and less about how to govern it. This is not a chicken or egg problem. You literally cannot govern an unmanaged process.
Ask Yourself…
Note: Before jumping from data strategy to governance, organizations must consider how they will manage the data strategy. Ask yourself, what is the management framework for your data strategy, and how will it deliver its intended outcomes?
Well…we've started it already:
It's not intuitive, but the most valuable data in any company doesn't come from a database. It comes from people's brains. It's the knowledge to run business processes and operational systems to deliver customer value. Getting this knowledge out of people's brains is time-consuming, and when you get more than one person sharing their knowledge, it likely becomes a conversation where there isn't a single answer. It’s an exercise that almost guarantees confusion.
Getting people to align on a common problem is paramount. If you can't define what is needed to solve a problem, how can you ever build a solution to solve that problem?
Two types of artifacts can facilitate the data management process well: the business glossary and the data dictionary. A business glossary is a collection of business terms and their respective definitions independent of a solution, and a data dictionary is a catalog of the data held in a database. Without context or ongoing stewardship, these artifacts are useless and tend to collect dust at most companies.
A business glossary is a collection of business terms relating to a use case. The challenge in building one is defining all the business terms needed to solve that specific use case. Where most organizations get this wrong is that they try to build a companywide glossary of metrics independent of any use case. This approach makes building a glossary nearly impossible because the same business term can mean different things in different use cases. Think about how many ways your organization refers to revenue. Once you have a clear-cut use case, the next problem is getting people to agree on definitions.
After the glossary is complete, you can build out a data dictionary. A data dictionary lists all the relevant data elements for the business terms within the glossary and is a relatively easy document to produce. It becomes unnecessarily complicated when the business must define all the data in a database. Most organizations rush into complex dictionaries with metadata technologies and ultimately lose business engagement in this process because they ask the business to do too much work that is 1) boring, 2) purposeless, and 3) overwhelming.
Additionally, most organizations try to establish data ownership at the domain level or data level rather than the business level. This approach doesn't work because the business drives requirements, not the data team. Plus, business needs compete for the same data within a domain. With someone working with each business steward, aligning them to a standard definition of data stewardship is possible.
Once you have a data dictionary and a glossary, it's time to map the two together. Mapping is critical because it shows where gaps exist between business requirements and available data. It is powerful because it shows the business, term by term, where they don't have the data they need, and shows the technology team where the data model will need to change to support the use case under consideration.
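To make the point about use-case scoping concrete, here is a minimal sketch of a glossary keyed by use case. The use cases, terms, and definitions are hypothetical examples invented for illustration, not taken from any real organization, and the structure is only one of many reasonable ways to record a glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """One business term, defined within the context of a single use case."""
    use_case: str
    term: str
    definition: str


# Hypothetical entries: the same term carries a different meaning per use case.
glossary = [
    GlossaryTerm(
        "quarterly financial reporting",
        "revenue",
        "Recognized revenue net of returns and discounts",
    ),
    GlossaryTerm(
        "sales pipeline forecasting",
        "revenue",
        "Total contract value of closed-won deals",
    ),
]


def definitions_for(term, entries):
    """Return {use_case: definition} for every use case that defines the term."""
    return {e.use_case: e.definition for e in entries if e.term == term}


def lookup(use_case, term, entries):
    """Return the single entry for a term within one use case, or None."""
    for e in entries:
        if e.use_case == use_case and e.term == term:
            return e
    return None


if __name__ == "__main__":
    for use_case, definition in definitions_for("revenue", glossary).items():
        print(f"{use_case}: {definition}")
```

Scoping every entry to a use case means a lookup always returns one definition, whereas a companywide lookup of "revenue" would return two competing ones.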
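The mapping exercise can be sketched in a few lines of code. The example below is illustrative only: the business terms, the data element names (such as `sales.order_total`), and the `find_gaps` helper are all hypothetical, and in practice this mapping often lives in a spreadsheet rather than a script.

```python
# Business terms agreed for one use case (hypothetical).
glossary_terms = ["net revenue", "active customer", "return rate"]

# Data elements that actually exist today, with descriptions (hypothetical).
data_dictionary = {
    "sales.order_total": "Order amount before returns",
    "sales.refund_amount": "Amount refunded per order",
    "crm.customer_id": "Unique customer identifier",
}

# Data elements the business says each term requires (hypothetical).
term_to_elements = {
    "net revenue": ["sales.order_total", "sales.refund_amount"],
    "active customer": ["crm.customer_id", "crm.last_purchase_date"],
    "return rate": [],
}

NO_ELEMENT = "<no data element identified>"


def find_gaps(terms, dictionary, mapping):
    """Return {term: missing elements} for terms the available data cannot support."""
    gaps = {}
    for term in terms:
        required = mapping.get(term, [])
        if not required:
            # Nobody has identified any data for this term yet.
            gaps[term] = [NO_ELEMENT]
            continue
        missing = [element for element in required if element not in dictionary]
        if missing:
            gaps[term] = missing
    return gaps


if __name__ == "__main__":
    for term, missing in find_gaps(glossary_terms, data_dictionary, term_to_elements).items():
        print(f"{term}: missing {', '.join(missing)}")
```

The output is a term-by-term list of gaps: the business reads it as "what we cannot answer yet," and the technology team reads it as "what the data model needs to add."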
In most cases, data models or schemas become complex because they try to serve all communities with varying business terms across a single data model. Building a single data model that serves many use cases is undoubtedly possible. It is made easier with proper data management, but you need more than that to avoid ending up with a hot mess.
All this documentation is a lot of work. Who is responsible for it?
Ask Yourself…
Note: Every organization probably has a data dictionary. A fraction of those have business glossaries, and an even smaller fraction actively use the same version. How are your artifacts enabling your process?
This work is very intellectual, tedious, and frustrating, especially for those with other responsibilities. Data leaders must be patient with the process and remember internal resources cost money, and their time is valuable. Spending unnecessary cycles on the business glossary dilutes the overall value of your data strategy, so balance documentation needs with organizational priorities.
It's worth noting that most companies don't need to hire a full-time data steward. Data stewardship is a shared responsibility across the business and is a temporary role depending on business needs. If the context for the glossary and dictionary resides in your existing talent's heads, you're going to need to get that information out somehow. Hiring a data steward is not a quick fix. The data steward will need to learn the data, which will require them to work with technical and business functions. If the technical and business functions were too busy to participate the first time around, this is likely not a priority for them, so their engagement will be poor.
Any organization investing in "data governance" should consider the costs of managing data and the time commitment required of its internal resources. It's OK if your data hurts productivity, limits growth, and increases risk, as long as the cost of fixing it would exceed the benefit. Fixing data can become a utopian goal that most organizations lack the economics to pursue. Organizations that aren't mandated to perform this activity should consider stopping it. Organizations can save money by eliminating "data governance" or data management activities.
If you've made it this far, you're in a peer group of a select few. Next comes the easy part: building data products. Now that you know the problem that needs to be solved, have executive sponsorship, and have the budget and the requirements to solve it, it's time to implement the required data product.
In my next article, I will talk about the types of data products and how to drive the successful launch of highly adopted data products that yield tangible and measurable organizational transformation.