OCI Data Catalog manages 8 terabytes of metadata with Oracle Autonomous Database
Oracle Cloud Infrastructure Data Catalog uses Autonomous Database to manage metadata, enabling customers to discover, use, and govern data assets.
“When we evaluated data stores, it was very important to simplify management and administration. We wanted to spend our time developing the service, not managing databases. Autonomous Database meets our technical needs, makes the service more resilient, and saves us an estimated two full-time headcount.”
Business challenges
Today organizations need to work with data in many different data stores, in the cloud and on-premises. Data professionals need help to discover that data, put it to productive use, and provide governance.
Oracle Cloud Infrastructure Data Catalog addressed the development team’s need to discover data by building an inventory of data assets with technical and business context, a business concept glossary (so that users have a common understanding of what that data describes), and a common metadata store for a data lakehouse. OCI Data Catalog supports object storage and different databases, both in the cloud and on-premises. Integration between OCI Data Catalog and Oracle Autonomous Database automatically creates external tables, so tools or individuals using the database can transparently query data in the lakehouse.
To support its mission, OCI Data Catalog needs to harvest metadata from all supported sources. This metadata is managed in a central repository where users can collect technical and business metadata, track provenance, and create glossaries and tags to make search and discovery more productive.
Why Oracle chose Autonomous Database
While building a new service, the software development team wanted to spend its time building new functionality rather than administering databases, so the group evaluated managed database services.
After looking at other options, Oracle’s software development group selected Autonomous Database for transaction processing and mixed workloads. Other managed options are available, but they all require some measure of manual administration. The team wanted to avoid the associated costs, potential for error, and security risks of manual administration. With fully autonomous operations, developers could focus on building their service while the database handled tuning, patching, security, and more.
By relying on the database to manage itself, staff resources could be redirected to develop the service itself, shortening the time to delivery.
Results
OCI Data Catalog is currently deployed in all Oracle Cloud regions across commercial and government realms. Each region has one or more database instances, starting at 15 OCPUs (30 vCPUs) and supporting many customers. OCI Data Catalog manages more than 8 TB of metadata, and this is growing exponentially.
New database instances are easily provisioned using Terraform scripts when required, while autoscaling handles temporary peak usage. In the event of unexpected load, the database temporarily adds compute resources, up to 3x what was originally provisioned. This helps to ensure consistent performance for customers without the need to over-provision.
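To make the autoscaling ceiling concrete, the arithmetic can be sketched in a few lines of Python. This is an illustrative calculation only, not part of the service: the names `ComputeRange` and `autoscale_range` are hypothetical, the 2:1 vCPU-to-OCPU ratio is taken from the "15 OCPUs (30 vCPUs)" figure above, and the 3x factor is the stated upper bound rather than a guaranteed allocation.

```python
from dataclasses import dataclass

# Assumption: ratio implied by the article's "15 OCPUs (30 vCPUs)".
VCPUS_PER_OCPU = 2
# Assumption: the article's "up to 3X" is treated as a hard upper bound.
MAX_AUTOSCALE_FACTOR = 3


@dataclass(frozen=True)
class ComputeRange:
    """Provisioned baseline and autoscaling ceiling, in OCPUs (hypothetical helper)."""
    base_ocpus: int
    max_ocpus: int

    @property
    def base_vcpus(self) -> int:
        return self.base_ocpus * VCPUS_PER_OCPU

    @property
    def max_vcpus(self) -> int:
        return self.max_ocpus * VCPUS_PER_OCPU


def autoscale_range(base_ocpus: int) -> ComputeRange:
    """Return the baseline and the largest size autoscaling could reach."""
    if base_ocpus <= 0:
        raise ValueError("base_ocpus must be a positive integer")
    return ComputeRange(base_ocpus, base_ocpus * MAX_AUTOSCALE_FACTOR)


if __name__ == "__main__":
    r = autoscale_range(15)
    print(f"baseline: {r.base_ocpus} OCPUs ({r.base_vcpus} vCPUs)")
    print(f"ceiling:  {r.max_ocpus} OCPUs ({r.max_vcpus} vCPUs)")
```

Under these assumptions, an instance that starts at 15 OCPUs could temporarily grow to at most 45 OCPUs (90 vCPUs) before returning to its baseline.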
In the near future, the development team plans to add data lineage capabilities to OCI Data Catalog using graph analytics. Because graph analytics are built into Autonomous Database, the software development team does not have to manage the complexity and security risk of exporting data to a separate graph engine.
Autonomous operations are the primary driver for the use of Autonomous Database. By relying on the database to manage, scale, and patch itself, with no downtime or service disruptions, developers saved an estimated two full-time headcount. Those resources were applied to developing the service itself, shortening time to delivery.