Data management is the practice of collecting, keeping, and using data securely, efficiently, and cost-effectively. The goal of data management is to help people, organizations, and connected things optimize the use of data within the bounds of policy and regulation so that they can make decisions and take actions that maximize the benefit to the organization. A robust data management strategy is becoming more important than ever as organizations increasingly rely on intangible assets to create value.
Managing digital data in an organization involves a broad range of tasks, policies, procedures, and practices. The work of data management has a wide scope, covering factors such as how to:
Create, access, and update data across a diverse data tier
Store data across multiple clouds and on premises
Provide high availability and disaster recovery
Use data in a growing variety of apps, analytics, and algorithms
Ensure data privacy and security
Archive and destroy data in accordance with retention schedules and compliance requirements
A formal data management strategy addresses the activity of users and administrators, the capabilities of data management technologies, the demands of regulatory requirements, and the needs of the organization to obtain value from its data.
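To make the last task in the list above concrete, a retention schedule can be written down as data and applied mechanically. The following Python sketch is only an illustration: the record categories, the retention periods, and the function name retention_action are assumptions invented for this example and are not taken from any regulation or product.

```python
from datetime import date, timedelta

# Hypothetical schedule: category -> (days until archiving, days until destruction).
RETENTION_SCHEDULE = {
    "invoice": (365, 365 * 7),
    "web_log": (30, 90),
}


def retention_action(category: str, created: date, today: date) -> str:
    """Return 'keep', 'archive', or 'destroy' based on the age of a record."""
    archive_after, destroy_after = RETENTION_SCHEDULE[category]
    age_in_days = (today - created).days
    if age_in_days >= destroy_after:
        return "destroy"
    if age_in_days >= archive_after:
        return "archive"
    return "keep"


# Usage: classify three web log records of different ages.
today = date(2024, 1, 1)
for age in (10, 45, 120):
    created = today - timedelta(days=age)
    print(age, retention_action("web_log", created, today))
```

Keeping the schedule in a table rather than in code paths means a change in policy only requires changing the data, which is one plausible way to keep up with changing compliance requirements.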
Data Capital Is Business Capital
In today’s digital economy, data is a kind of capital, an economic factor of production in digital goods and services. Just as an automaker can’t manufacture a new model if it lacks the necessary financial capital, it can’t make its cars autonomous if it lacks the data to feed the onboard algorithms. This new role for data has implications for competitive strategy as well as for the future of computing.
Given this central and mission-critical role of data, strong management practices and a robust management system are essential for every organization, regardless of size or type.
Today’s organizations need a data management solution that provides an efficient way to manage data across a diverse but unified data tier. Data management systems are built on data management platforms and can include databases, data lakes and data warehouses, big data management systems, data analytics, and more.
All these components work together as a “data utility” to deliver the data management capabilities an organization needs for its apps, and the analytics and algorithms that use the data originated by those apps. Although current tools help database administrators (DBAs) automate many of the traditional management tasks, manual intervention is still often required because of the size and complexity of most database deployments. Whenever manual intervention is required, the chance for errors increases. Reducing the need for manual data management is a key objective of a new data management technology, the autonomous database.
Data Management Platforms
The most critical step for continuous delivery of software is continuous integration (CI). CI is a development practice in which developers commit their code changes (usually small and incremental) to a centralized source repository, which kicks off a set of automated builds and tests. This practice allows developers to catch bugs early and automatically, before they reach production. A continuous integration pipeline usually involves a series of steps: starting from a code commit, it runs basic automated linting and static analysis, captures dependencies, builds the software, and runs some basic unit tests before creating a build artifact. Source code management systems such as GitHub and GitLab offer webhook integrations to which CI tools such as Jenkins can subscribe, so that automated builds and tests start after each code check-in.
A data management platform is the foundational system for collecting and analyzing large volumes of data across an organization. Commercial data platforms typically include software tools for management, developed by the database vendor or by third-party vendors. These data management solutions help IT teams and DBAs perform typical tasks such as:
Identifying, alerting, diagnosing, and resolving faults in the database system or underlying infrastructure
Allocating database memory and storage resources
Making changes in the database design
Optimizing responses to database queries for faster application performance
The increasingly popular cloud database platforms allow businesses to scale up or down quickly and cost-effectively. Some are available as a service, allowing organizations to save even more.
Based in the cloud, an autonomous database uses artificial intelligence (AI) and machine learning to automate many data management tasks performed by DBAs, including managing database backups, security, and performance tuning.
Also called a self-driving database, an autonomous database offers significant benefits for data management, including:
Decreased potential for human error
Higher database reliability and security
Improved operational efficiency
In some ways, big data is just what it sounds like: lots and lots of data. But big data also comes in a wider variety of forms than traditional data, and it is collected at high speed. Think of all the data that comes in every day, or every minute, from a social media source such as Facebook. The amount, variety, and speed of that data are what make it so valuable to businesses, but they also make it very complex to manage.
As more and more data is collected from sources as disparate as video cameras, social media, audio recordings, and Internet of Things (IoT) devices, big data management systems have emerged. These systems specialize in general areas that include the following:
Big data integration brings in different types of data—from batch to streaming—and transforms it so that it can be consumed.
Big data management stores and processes data in a data lake or data warehouse efficiently, securely, and reliably, often by using object storage.
Companies are using big data to improve and accelerate product development, predictive maintenance, the customer experience, security, operational efficiency, and much more. As big data gets bigger, so will the opportunities.
Most of the challenges in data management today stem from the faster pace of business and the increasing proliferation of data. The ever-expanding variety, velocity, and volume of data available to organizations are pushing them to seek more effective management tools to keep up. Some of the top challenges organizations face include the following:
Lack of data insight
Data from an increasing number and variety of sources such as sensors, smart devices, social media, and video cameras is being collected and stored. But none of that data is useful if the organization doesn’t know what data it has, where it is, and how to use it. Data management solutions need scale and performance to deliver meaningful insights in a timely manner.
Difficulty maintaining data management performance levels
Organizations are capturing, storing, and using more data all the time. To maintain peak response times across this expanding tier, organizations need to continuously monitor the types of questions the database is answering and change the indexes as the queries change, without affecting performance.
Challenges complying with changing data requirements
Compliance regulations are complex and multijurisdictional, and they change constantly. Organizations need to be able to easily review their data and identify anything that falls under new or modified requirements. In particular, personally identifiable information (PII) must be detected, tracked, and monitored for compliance with increasingly strict global privacy regulations.
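As a rough illustration of what detecting PII can involve, the Python sketch below scans record fields against a small set of patterns. It is a minimal sketch under stated assumptions: the two regular expressions, the PII_PATTERNS table, and the find_pii function are hypothetical names chosen for this example, and real detection tools cover far more identifier types and use more than pattern matching.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(records):
    """Yield (record_index, field_name, pii_type) for each string field that matches."""
    for index, record in enumerate(records):
        for field_name, value in record.items():
            if not isinstance(value, str):
                continue  # this sketch only inspects text fields
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    yield index, field_name, pii_type


# Usage: report which fields would need to be tracked for compliance.
sample = [
    {"note": "contact jane@example.com", "amount": 10},
    {"memo": "SSN 123-45-6789"},
    {"memo": "nothing sensitive here"},
]
for hit in find_pii(sample):
    print(hit)
```

Recording the field name along with the match, rather than only a yes or no answer, is what lets the flagged data be tracked and monitored afterward.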
Need to easily process and convert data
Collecting and identifying the data does not by itself provide any value; the organization needs to process it. If converting the data into the form needed for analysis takes a lot of time and effort, that analysis won’t happen. As a result, the potential value of that data is lost.
Constant need to store data effectively
In the new world of data management, organizations store data in multiple systems, including data warehouses and unstructured data lakes that store any data in any format in a single repository. An organization’s data scientists need a way to quickly and easily transform data from its original format into the shape, format, or model they need it to be in for a wide array of analyses.
Demand to continually optimize IT agility and costs
With the availability of cloud data management systems, organizations can now choose whether to keep and analyze data in on-premises environments, in the cloud, or in a hybrid mixture of the two. IT organizations need to evaluate how closely their on-premises and cloud environments match in order to maintain maximum IT agility and lower costs.
Data Management Principles and Data Privacy
The General Data Protection Regulation (GDPR), enacted by the European Union and in effect since May 2018, includes seven key principles for the management and processing of personal data: lawfulness, fairness, and transparency; purpose limitation; data minimization; accuracy; storage limitation; integrity and confidentiality; and accountability.
The GDPR and other laws that follow in its footsteps, such as the California Consumer Privacy Act (CCPA), are changing the face of data management. These regulations provide standardized data protection rules that give individuals control over their personal data and how it is used. In effect, they turn consumers into data stakeholders with real legal recourse when organizations fail to obtain informed consent at data capture, exercise poor control over data use or locality, or fail to comply with data erasure or portability requirements.
Data Management Best Practices
Addressing data management challenges requires a comprehensive, well-thought-out set of best practices. Although specific best practices vary depending on the type of data involved and the industry, the following best practices address the major data management challenges organizations face today:
Create a discovery layer to identify your data
A discovery layer on top of your organization’s data tier allows analysts and data scientists to search and browse for datasets, which makes your data usable.
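One way to picture a discovery layer is as a searchable catalog of dataset metadata that sits apart from the data itself. The Python sketch below assumes a deliberately simple design: the DatasetEntry and Catalog classes, their fields, and the example locations are all hypothetical, and a production catalog would also track owners, schemas, lineage, and access rules.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    location: str          # where the data lives, e.g. a table name or object path
    description: str
    tags: set = field(default_factory=set)


class Catalog:
    """In-memory metadata catalog that supports keyword search."""

    def __init__(self):
        self._entries = []

    def register(self, entry: DatasetEntry) -> None:
        self._entries.append(entry)

    def search(self, term: str) -> list:
        """Return entries whose name, description, or tags contain the term."""
        term = term.lower()
        return [
            entry for entry in self._entries
            if term in entry.name.lower()
            or term in entry.description.lower()
            or any(term in tag.lower() for tag in entry.tags)
        ]


# Usage: register two datasets, then find everything related to sales.
catalog = Catalog()
catalog.register(DatasetEntry("orders_2023", "warehouse.orders", "Completed sales orders", {"sales"}))
catalog.register(DatasetEntry("sensor_feed", "lake/iot/sensors", "Raw device telemetry", {"iot"}))
print([entry.name for entry in catalog.search("sales")])
```

The catalog stores only descriptions and locations, so analysts can find a dataset without the catalog having to copy or move the underlying data.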
Develop a data science environment to efficiently repurpose your data
A data science environment automates as much of the data transformation work as possible, streamlining the creation and evaluation of data models. A set of tools that eliminates the need for the manual transformation of data can expedite the hypothesizing and testing of new models.
Use autonomous technology to maintain performance levels across your expanding data tier
Autonomous data capabilities use AI and machine learning to continuously monitor database queries and optimize indexes as the queries change. This allows the database to maintain rapid response times and frees DBAs and data scientists from time-consuming manual tasks.
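The idea of watching queries and adjusting indexes can be shown at toy scale. The sketch below uses Python's built-in sqlite3 module and a plain frequency count in place of machine learning, so it is a stand-in for the concept and not a description of how any autonomous database works. The IndexAdvisor class, its threshold parameter, and the orders table are assumptions made for the example.

```python
import sqlite3
from collections import Counter


class IndexAdvisor:
    """Counts which columns are filtered on and indexes a column once it is used often."""

    def __init__(self, conn, table, threshold=3):
        self.conn = conn
        self.table = table
        self.threshold = threshold
        self.filter_counts = Counter()
        self.indexed = set()

    def query_by(self, column, value):
        # Only accept column names that exist in the table schema.
        valid = {row[1] for row in self.conn.execute(f"PRAGMA table_info({self.table})")}
        if column not in valid:
            raise ValueError(f"unknown column: {column}")
        self.filter_counts[column] += 1
        if self.filter_counts[column] >= self.threshold and column not in self.indexed:
            self.conn.execute(
                f"CREATE INDEX IF NOT EXISTS idx_{self.table}_{column} "
                f"ON {self.table}({column})"
            )
            self.indexed.add(column)
        return self.conn.execute(
            f"SELECT * FROM {self.table} WHERE {column} = ?", (value,)
        ).fetchall()


# Usage: the second lookup by customer reaches the threshold and triggers index creation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("a", 1.0), ("b", 2.0), ("a", 3.0)],
)
advisor = IndexAdvisor(conn, "orders", threshold=2)
advisor.query_by("customer", "a")
advisor.query_by("customer", "b")
print(advisor.indexed)
```

A real system would also weigh the write cost of each index and drop indexes that are no longer used; this sketch only adds them.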
Use discovery to stay on top of compliance requirements
New tools use data discovery to review data and identify the chains of connection that need to be detected, tracked, and monitored for multijurisdictional compliance. As compliance demands increase globally, this capability is going to be increasingly important to risk and security officers.
Ensure you’re using a converged database
A converged database is a database that has native support for all modern data types and the latest development models built into one product. The best converged databases can run many kinds of workloads, including graph, IoT, blockchain, and machine learning.
Ensure your database platform has the performance, scale, and availability to support your business
The goal of bringing data together is to be able to analyze it to make better, more timely decisions. A scalable, high-performance database platform allows enterprises to rapidly analyze data from multiple sources using advanced analytics and machine learning so they can make better business decisions.
Use a common query layer to manage multiple and diverse forms of data storage
New technologies are enabling data management repositories to work together, making the differences between them disappear. A common query layer that spans the many kinds of data storage enables data scientists, analysts, and applications to access data without needing to know where it is stored and without needing to manually transform it into a usable format.
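A common query layer can be sketched as a thin interface that hides where each dataset is stored. In the Python example below, one dataset lives in a SQLite table and another is a list of dictionaries standing in for semi-structured files in a data lake. The SqlSource, RecordSource, and QueryLayer classes are hypothetical names for this illustration, and the filtering is done in Python for brevity, whereas a real query layer would push filters down to each store.

```python
import sqlite3


class SqlSource:
    """Exposes a relational table as a stream of dictionaries."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def rows(self):
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [description[0] for description in cursor.description]
        for values in cursor:
            yield dict(zip(columns, values))


class RecordSource:
    """Stands in for semi-structured data such as JSON documents in a data lake."""

    def __init__(self, records):
        self.records = records

    def rows(self):
        yield from self.records


class QueryLayer:
    """Single entry point: callers name a dataset and filters, not a storage system."""

    def __init__(self):
        self.sources = {}

    def register(self, name, source):
        self.sources[name] = source

    def query(self, name, **filters):
        return [
            row for row in self.sources[name].rows()
            if all(row.get(key) == value for key, value in filters.items())
        ]


# Usage: the same call shape works for both kinds of storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EU"), (2, "US")])

layer = QueryLayer()
layer.register("customers", SqlSource(conn, "customers"))
layer.register("clicks", RecordSource([{"page": "/home", "user": 1}, {"page": "/cart", "user": 2}]))
print(layer.query("customers", region="EU"))
print(layer.query("clicks", page="/cart"))
```

Because every source returns rows in the same shape, the caller does not need to know which storage system holds the data or convert it by hand.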
The Value of a Data Science Environment
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract value from data. Data scientists combine a range of skills—including statistics, computer science, and business knowledge—to analyze data collected from the web, smartphones, customers, sensors, and other sources.
With data’s new role as business capital, organizations are discovering what digital startups and disruptors already know: Data is a valuable asset for identifying trends, making decisions, and taking action before competitors. The new position of data in the value chain is leading organizations to actively seek better ways to derive value from this new capital.