Benefits of a Data Warehouse
Data warehouses offer the overarching and unique benefit of allowing organizations to analyze large amounts of variant data and extract significant value from it, as well as to keep a historical record.
Four unique characteristics (described by computer scientist William Inmon, who is considered the father of the data warehouse) allow data warehouses to deliver this overarching benefit. According to this definition, data warehouses are
- Subject-oriented. They can analyze data about a particular subject or functional area (such as sales).
- Integrated. Data warehouses create consistency among different data types from disparate sources.
- Nonvolatile. Once data is in a data warehouse, it’s stable and doesn’t change.
- Time-variant. Data warehouse analysis looks at change over time.
A well-designed data warehouse will perform queries very quickly, deliver high data throughput, and provide enough flexibility for end users to “slice and dice” or reduce the volume of data for closer examination to meet a variety of demands—whether at a high level or at a very fine, detailed level. The data warehouse serves as the functional foundation for middleware BI environments that provide end users with reports, dashboards, and other interfaces.
Data Warehouse Architecture
The architecture of a data warehouse is determined by the organization’s specific needs. Common architectures include
- Simple. All data warehouses share a basic design in which metadata, summary data, and raw data are stored within the central repository of the warehouse. The repository is fed by data sources on one end and accessed by end users for analysis, reporting, and mining on the other end.
- Simple with a staging area. Operational data must be cleaned and processed before being put in the warehouse. Although this can be done programmatically, many data warehouses add a staging area for data before it enters the warehouse, to simplify data preparation.
- Hub and spoke. Adding data marts between the central repository and end users allows an organization to customize its data warehouse to serve various lines of business. When the data is ready for use, it is moved to the appropriate data mart.
- Sandboxes. Sandboxes are private, secure, safe areas that allow companies to quickly and informally explore new datasets or ways of analyzing data without having to conform to or comply with the formal rules and protocol of the data warehouse.
The Evolution of Data Warehouses—From Data Analytics to AI and Machine Learning
When data warehouses first came onto the scene in the late 1980s, their purpose was to help data flow from operational systems into decision-support systems (DSSs). These early data warehouses required an enormous amount of redundancy. Most organizations had multiple DSS environments that served their various users. Although the DSS environments used much of the same data, the gathering, cleaning, and integration of the data was often replicated for each environment.
As data warehouses became more efficient, they evolved from information stores that supported traditional BI platforms into broad analytics infrastructures that support a wide variety of applications, such as operational analytics and performance management.
Data warehouse iterations have progressed over time to deliver incremental additional value to the enterprise.
|1||Transactional reporting||Provides relational information to create snapshots of business performance|
|2||Slice and dice, ad hoc query, BI tools||Expands capabilities for deeper insights and more robust analysis|
|3||Predicting future performance (data mining)||Develops visualizations and forward-looking business intelligence|
|4||Tactical analysis (spatial, statistics)||Offers “what-if” scenarios to inform practical decisions based on more comprehensive analysis|
|5||Stores many months or years of data||Stores data for only weeks or months|
Supporting each of these five steps has required an increasing variety of datasets. The last three steps in particular create the imperative for an even broader range of data and analytics capabilities.
Today, AI and machine learning are transforming almost every industry, service, and enterprise asset—and data warehouses are no exception. The expansion of big data and the application of new digital technologies are driving change in data warehouse requirements and capabilities.
The autonomous data warehouse is the latest step in this evolution, offering enterprises the ability to extract even greater value from their data while lowering costs and improving data warehouse reliability and performance.
Find out more about autonomous data warehouses and get started with your own autonomous data warehouse.
Data Warehouses, Data Marts, and Operation Data Stores
Though they perform similar roles, data warehouses are different from data marts and operation data stores (ODSs). A data mart performs the same functions as a data warehouse but within a much more limited scope—usually a single department or line of business. This makes data marts easier to establish than data warehouses. However, they tend to introduce inconsistency because it can be difficult to uniformly manage and control data across numerous data marts.
ODSs support only daily operations, so their view of historical data is very limited. Although they work very well as sources of current data and are often used as such by data warehouses, they do not support historically rich queries.
What is a Cloud Data Warehouse?
A cloud data warehouse uses the cloud to ingest and store data from disparate data sources.
The original data warehouses were built on on-premises servers. These on-premises data warehouses continue to have many advantages today. In some cases, they can offer improved governance, security, and speed. However, on-premises data warehouses are not as elastic and they require complex forecasting to determine how to scale the data warehouse for future needs. Managing these data warehouses can also be very complex.
On the other hand, some of the advantages of cloud data warehouses include:
- Elasticity, with separate compute and storage
- Scale-out abilities, to handle compute or storage requirements
- Ease of use
- Ease of management
- Cost savings
The best cloud data warehouses are fully managed and self-driving, ensuring that even beginners can create and use a data warehouse with only a few clicks. In addition, most cloud data warehouses follow a pay-as-you-go model, which brings added cost savings to customers.
The Cloud and the Data Warehouse
Data warehouses in the cloud offer the same characteristics and benefits of on-premises data warehouses but with the added benefits of cloud computing―such as flexibility, scalability, agility, security, and reduced costs. Cloud data warehouses allow enterprises to focus solely on extracting value from their data rather than having to build and manage the hardware and software infrastructure to support the data warehouse.
Do I Need a Data Lake?
Organizations use both data lakes and data warehouses for large volumes of data from various sources. The choice of when to use one or the other depends on what the organization intends to do with the data. The following describes how each is best used:
- Data lakes store an abundance of disparate, unfiltered data to be used later for a particular purpose. Data from line-of-business applications, mobile apps, social media, IoT devices, and more is captured as raw data in a data lake. The structure, integrity, selection, and format of the various datasets is derived at the time of analysis by the person doing the analysis. When organizations need low-cost storage for unformatted, unstructured data from multiple sources that they intend to use for some purpose in the future, a data lake might be the right choice.
- Data warehouses are specifically intended to analyze data. Analytical processing within a data warehouse is performed on data that has been readied for analysis—gathered, contextualized, and transformed—with the purpose of generating analysis-based insights. Data warehouses are also adept at handling large quantities of data from various sources. When organizations need advanced data analytics or analysis that draws on historical data from multiple sources across their enterprise, a data warehouse is likely the right choice.
Why Not Run Analytics Against Your OLTP Environment?
Data warehouses are relational environments that are used for data analysis, particularly of historical data. Organizations use data warehouses to discover patterns and relationships in their data that develop over time.
In contrast, transactional environments are used to process transactions on an ongoing basis and are commonly used for order entry and financial and retail transactions. They do not build on historical data; in fact, in OLTP environments, historical data is often archived or simply deleted to improve performance.
Data warehouses and OLTP systems differ significantly.
|Data Warehouse||OLTP System|
|Workload||Accommodates ad hoc queries and data analysis||Supports only predefined operations|
|Data modifications||Automatically updates on a regular basis||Updates by end users issuing individual statements|
|Schema design||Uses partially denormalized schemas to optimize performance||Uses fully normalized schemas to guarantee data consistency|
|Data scanning||Encompasses thousands to millions of rows||Accesses only a handful of records at a time|
|Historical data||Stores many months or years of data||Stores data for only weeks or months|
Zero-Complexity Deployment: The Autonomous Data Warehouse
The most recent iteration of the data warehouse is the autonomous data warehouse, which relies on AI and machine learning to eliminate manual tasks and simplify setup, deployment, and data management. An as-a-service autonomous data warehouse in the cloud requires no human-performed database administration, hardware configuration or management, or software installation.
Creating the data warehouse, backing up, patching and upgrading the database, and expanding or reducing the database are all performed automatically—with the same flexibility, scalability, agility, and reduced costs that cloud platforms offer. The autonomous data warehouse removes complexity, speeds deployment, and frees up resources so organizations can focus on activities that add value to the business.
Oracle Autonomous Data Warehouse
Oracle Autonomous Data Warehouse is an easy-to-use, fully autonomous data warehouse that scales elastically, delivers fast query performance, and requires no database administration. The setup for Oracle Autonomous Data Warehouse is very simple and fast.