Data Catalog Use Cases
In the past few years, data catalogs have become popular because organizations now have to manage and access increasingly large amounts of data. Cloud, big data analytics, AI, and machine learning are changing the way we need to see, manage, and leverage our data. The goal is not just to manage it, but to be able to fully access and use it.
Using a data catalog the right way means better data usage, which contributes to:
- Cost savings
- Operational efficiency
- Competitive advantages
- Better customer experience
- Fraud and risk advantage
- And so much more
Here are just a few of the use cases for a data catalog. But really, a data catalog can be used in so many ways because fundamentally, it’s about having wider visibility and deeper access to your data.
Self-service analytics. Many data users have trouble finding the right data. And not just finding it, but understanding whether it’s useful. You might need a file about customers and discover one called customer_info.csv. But that doesn’t mean it’s the right one, because it may be one of 50 similar files. The file may have many fields, and you may not understand what all of those data elements are. You’ll want an easier way to see the business context around it, such as whether it’s a managed resource, whether it comes from the right data store, or how it relates to other data artifacts.
Discovery could also entail understanding the shape and characteristics of data, from something as simple as value distributions and statistical information to something as important and complex as identifying Personally Identifiable Information (PII) or Personal Health Information (PHI).
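To make the idea of data shape concrete, here is a minimal sketch in Python of the kind of column profile a catalog might compute. The function name profile_column, the email pattern, and the 80% threshold are all assumptions made for illustration, not features of any particular catalog product, and real PII or PHI detection needs far more than a single regular expression.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical pattern for illustration only; not a complete email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_column(values):
    """Summarize one column: counts, value distribution, basic stats, PII hint."""
    non_null = [v for v in values if v not in (None, "")]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Value distribution: the three most frequent values with their counts.
        "top_values": Counter(non_null).most_common(3),
    }

    # Add numeric statistics only when every non-null value is a number.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if non_null and len(numeric) == len(non_null):
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))

    # Flag the column when at least 80% of string values look like emails
    # (the threshold is an assumed value, chosen only for this sketch).
    strings = [v for v in non_null if isinstance(v, str)]
    profile["possible_pii"] = bool(strings) and (
        sum(1 for s in strings if EMAIL_RE.match(s)) / len(strings) >= 0.8
    )
    return profile


emails = profile_column(["a@example.com", "b@example.com", None, "c@example.com"])
ages = profile_column([34, 41, 34, 29])
```

In this sketch the emails column would be flagged as possible PII, while the ages column gets min, max, and mean values instead. A catalog would typically store such profiles next to the asset so users can judge a dataset before opening it.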
Audit, compliance, and change management. With ever-increasing government regulations around data, you often need to demonstrate the provenance of data: which source certain data artifacts come from, and how they are transformed before reaching their final target. When looking at a table, report, or file, your data users often want to understand where the data is coming from and how it’s moving through the organization. From a change management perspective, it’s important to see how changes in one part of a data pipeline affect other parts of the system. This is why customers seek detailed data lineage.
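Lineage is easiest to picture as a directed graph in which each edge says "this asset feeds that one." The following Python sketch shows how provenance and change-impact questions can both be answered by walking such a graph. The LineageGraph class and every asset name in it are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data assets; an edge means 'source feeds target'."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Breadth-first traversal that collects every reachable asset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # only relevant if the graph contains a cycle
        return seen

    def impact(self, asset):
        """Change management: everything affected if this asset changes."""
        return self._walk(asset, self._downstream)

    def provenance(self, asset):
        """Audit: every upstream asset this one is derived from."""
        return self._walk(asset, self._upstream)


# Hypothetical pipeline, for illustration only.
lineage = LineageGraph()
lineage.add_edge("crm.customers", "staging.customers_clean")
lineage.add_edge("staging.customers_clean", "warehouse.dim_customer")
lineage.add_edge("warehouse.dim_customer", "reports.churn_dashboard")
lineage.add_edge("billing.invoices", "reports.churn_dashboard")

affected = lineage.impact("staging.customers_clean")
sources = lineage.provenance("reports.churn_dashboard")
```

Here affected contains the warehouse table and the dashboard that depend on the staging table, and sources contains all four upstream assets behind the dashboard. Keeping both edge directions in memory is a design choice for this sketch; it makes audit and impact queries equally cheap.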
Supporting data governance with business glossaries. Most organizations have a vocabulary that everyone agrees on and a consistent understanding of their business concepts. But often, it’s recorded in Excel sheets lying around somewhere, and that’s if the organization is lucky. A data catalog is a much better place to store and manage this vital business information.
A data catalog also allows you to link business terms to one another to build a taxonomy. Beyond that, it can record relationships between terms and physical assets such as tables and columns, so users can understand which business concepts are relevant to which technical artifacts. This can be used to classify data assets along business concept lines, and then to search and discover data using business concepts instead of technical names. This increases user trust in what they’re looking at, because they can see everything that’s related to their data. It’s often a good starting point for data governance.
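As a rough illustration of these ideas, the Python sketch below models a tiny glossary in which terms can have a broader parent term (the taxonomy) and links to physical assets. The Term and Glossary classes and all term and asset names are hypothetical, and the sketch assumes the parent links never form a cycle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    name: str
    definition: str
    parent: Optional[str] = None                # broader term, if any
    assets: set = field(default_factory=set)    # linked tables or columns


class Glossary:
    def __init__(self):
        self.terms = {}

    def add_term(self, name, definition, parent=None):
        self.terms[name] = Term(name, definition, parent)

    def link_asset(self, term_name, asset):
        self.terms[term_name].assets.add(asset)

    def find_assets(self, term_name):
        """Assets linked to the term or to any narrower term beneath it."""
        found = set(self.terms[term_name].assets)
        for term in self.terms.values():
            if term.parent == term_name:
                found |= self.find_assets(term.name)
        return found


# Hypothetical terms and assets, for illustration only.
glossary = Glossary()
glossary.add_term("Customer", "A party that has purchased a product.")
glossary.add_term(
    "Active Customer",
    "A customer with a purchase in the last 12 months.",
    parent="Customer",
)
glossary.link_asset("Customer", "warehouse.dim_customer")
glossary.link_asset("Active Customer", "warehouse.dim_customer.is_active")

customer_assets = glossary.find_assets("Customer")
active_assets = glossary.find_assets("Active Customer")
```

Searching on the business term "Customer" returns both the table and the column linked to the narrower term, without the user needing to know either technical name.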