| |
Implementing Relational Business Analysis with SQL
Module Objectives
Purpose
In this module, you will learn how to use Oracle9i
for relational business alanlysis.
Objectives
After completing this module, you should be able to:
 |
Explain Oracles powerful aggregation capabilities |
 |
Use new and improved functionality for business intelligence
uses |
 |
Use the new query subfactoring clause of Oracle9i |
Prerequisites
Before starting this lesson, you should have:
Reference Material
The following is a list of useful reference materials if you want additional
information about the topics in this module:
 |
Documentation: Data Warehousing Guide
|
Overview
The SQL language, while extremely capable in many areas, has provided
little support for business intelligence processing. Many aggregation
tasks and business intelligence calculations such as subtotals, moving
averages, rankings and lead/lag comparisons have been inefficient and
inconvenient. To handle these limitations, application developers and
DBAs have often needed to program outside of standard SQL or use SQL that
was slow and cumbersome. This situation has hurt developer productivity
and database performance.
The recent releases of Oracle have addressed this problem by significantly
enhancing SQL for business intelligence processing:
Oracle8i Release
1 added support for the CUBE and ROLLUP extensions to the SELECT statement’s
GROUP BY clause. These extensions enable more efficient and convenient
aggregations, a key part of data warehousing and business intelligence
processing.
Oracle8i Release 2 introduced a powerful new set of SQL analytic functions
to address essential business intelligence calculations. The analytic
functions provide enhanced performance and higher developer productivity
for many calculations. In addition, some of the functions have been incorporated
into the international ANSI SQL standard.
Oracle9i adds
several new families of analytic functions, plus further extensions to
the GROUP BY clause such as Concatenated Groupings and GROUPING SETS.
The new GROUP BY extensions add flexibility to aggregate processing and
are integrated with Materialized Views for enhanced performance. The query
subfactoring clause is another new feature enhancing query performance.
Now you will walk through this functionality in the following sections.
This module can only give an introduction into Oracle’s powerful and
rich functionality for relational business analysis. Please consult the
‘Oracle Data Warehousing Guide’ for any further details.
Generic Aggregation Capabilities Cube and Rollup
Aggregation is a fundamental part of data warehousing. To improve aggregation
performance in your warehouse,
Oracle provides the following extensions to the GROUP BY clause:
Loading data using external tables
To show you how external tables can be created and used, perform the
following steps:
 |
CUBE and ROLLUP Extensions to the GROUP BY Clause
|
 |
The Three GROUPING Functions
|
 |
GROUPING SETS Expression |
The CUBE, ROLLUP, and GROUPING SETS extensions to SQL make querying and
reporting easier and faster.ROLLUP calculates aggregations such as SUM,
COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the
most detailed up to a grand total. CUBE is an extension similar to ROLLUP,
enabling a single statement to calculate all possible combinations of
aggregations. CUBE can generate the information needed in cross-tabulation
reports with a single query.
CUBE, ROLLUP, and the GROUPING SETS extension let you specify exactly
the groupings of interest in the GROUP BY clause. This allows efficient
analysis across multiple dimensions without performing a CUBE operation.
Computing a full cube creates a heavy processing load, so replacing cubes
with grouping sets can significantly increase performance. CUBE, ROLLUP,
and grouping sets produce a single result set that is equivalent to a
UNION ALL of differently grouped rows.
To enhance performance, CUBE, ROLLUP, and GROUPING SETS can be parallelized:
multiple processes can simultaneously execute all of these statements.
These capabilities make aggregate calculations more efficient,thereby
enhancing database performance, and scalability.
The three GROUPING functions help you identify the group each row belongs
to and enable sorting subtotal rows and filtering results.
Steps
The following steps will be performed to show you Oracle’s aggregation
capabilities:
| 1. |
Get a basic understanding of aggregation CUBE and ROLLUP
|
| 2. |
Use the GROUPING SET expression from Oracle9i
|
BASIC UNDERSTANDING OF CUBE AND ROLLUP
Prior to the introduction of the aggregation functions CUBE and ROLLUP,
there was no possibility to express an aggregation over different levels
within one SQL statement without the usage of the set operation UNION
ALL. Every different aggregation level needed its own SQL aggregation
expression, operating on the exact same data set n times, once for each
of the n different aggregation levels.
With the introduction of CUBE and ROLLUP in Oracle8i, Oracle provided
a single SQL command for handling the aggregation over different levels
within one single SQL statement, not only improving the runtime of this
operation but also reducing the amount of internal operations necessary
and reducing the workload on the system.
ROLLUP enables a SELECT statement to calculate
multiple levels of subtotals across a specified group of dimensions. It
also calculates a grand total. ROLLUP is a simple extension to the GROUP
BY clause, so its syntax is extremely easy to use. The ROLLUP extension
is highly efficient, adding minimal overhead to a query.
CUBE takes a specified set of grouping
columns and creates subtotals for all of their possible combinations.
In terms of multidimensional analysis, CUBE generates all the subtotals
that could be calculated for a data cube with the specified dimensions.
We will demonstrate the advanced access path of those operations with
a ROLLUP example.
ROLLUP - THE ‘OLD’ WAY
| 1. |
From a SQL*Plus session logged on to the SH schema, run explain_old_rollup.sql,
or copy the following SQL statement into your SQL*Plus session:
@explain_old_rollup.sql
Rem old way ROLLUP with UNION ALL
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT t.calendar_month_desc month,
p.prod_category category,
SUM(s.amount_sold) total_sales
FROM sales s, products p, times t, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND t.calendar_month_desc in ('2000-01', '2000-02')
AND c.cust_state_province in ('CA', 'NV')
GROUP BY t.calendar_month_desc, p.prod_category
UNION ALL
SELECT t.calendar_month_desc month,
NULL category,
SUM(s.amount_sold) total_sales
FROM sales s, products p, times t, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND t.calendar_month_desc in ('2000-01', '2000-02')
AND c.cust_state_province in ('CA', 'NV')
GROUP BY t.calendar_month_desc
UNION ALL
SELECT NULL month,
NULL category,
SUM(s.amount_sold) total_sales
FROM sales s, products p, times t, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND t.calendar_month_desc in ('2000-01', '2000-02')
AND c.cust_state_province in ('CA', 'NV');
set linesize 132
set pagesize 999
select * from table(dbms_xplan.display);
The following plan shows, how Oracle executes this operation. The
most important fact is that Oracle has to access the large fact
table sales THREE times, once per UNION ALL branch, and that the
subsequent result sets finally merged.
Selecting additional aggregation levels would lead to additional
SQL statements in the UNION ALL operation, causing additional access
of the large fact table SALES.

To execute the query, you can run old_rollup.sql.

|
ROLLUP - THE NEW WAY
| 2. |
The same business query, expressed with Oracle’s ROLLUP functionality
leads to a different execution plan:
@explain_new_rollup.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT t.calendar_month_desc month,
p.prod_category category,
SUM(s.amount_sold) total_sales
FROM sales s, products p, times t, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND t.calendar_month_desc in ('2000-01', '2000-02')
AND c.cust_state_province in ('CA', 'NV')
GROUP BY ROLLUP(t.calendar_month_desc, p.prod_category);
set linesize 132
set pagesize 999
select * from table(dbms_xplan.display);

Unlike the plan shown before, the new ROLLUP functionality scans
the large fact table SALES only once and computes the different
aggregations as part of the SORT-GROUP BY. This leads to tremendous
performance improvements, especially when the data volume becomes
larger and the requested number of aggregation levels increases.
To execute the query with the new ROLLUP functionality, you can
run new_rollup.sql.
Even with our small test data volume, you will experience a significant
performance improvement.

|
ORACLE 9I’S NEW GROUPING SETS EXPRESSION
You can selectively specify the set of groups that you want to create
using a GROUPING SETS expression within a GROUP BY clause. This allows
precise specification across multiple dimensions without computing the
whole CUBE.
The following example precisely specifies 3 grouping sets. Without the
use of grouping sets, the query would need to be done as either:
- a UNION of 3 single group queries, or
- a group by query using CUBE, which would need
to be filtered to remove unwanted rows
Either of those approaches would run slower and use more resources.
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_gs1.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_gs1.sql
SELECT channel_desc, calendar_month_desc, country_id,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND sales.channel_id=channels.channel_id
AND channels.channel_desc IN ('Direct Sales','Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND country_id IN ('UK', 'US')
GROUP BY GROUPING SETS((channel_desc,calendar_month_desc,country_id),
(channel_desc, country_id),(calendar_month_desc, country_id))
;

Try to rewrite the above shown SQL statement as UNION ALL and
compare the execution plans. You can compare your result with run_old_gs1.sql.
The plans are shown with explain_gs1.sql, or explain_old_gs1.sql
respectively.
|
GROUPING SETS WITH COMPOSITE COLUMNS
Composite columns are columns grouped together with parentheses. In CUBE
and ROLLUP statements, composite columns are treated as a single unit.
This reduces the number of groups generated by the query, which improves
performance when we do not need all the groups that would otherwise be
generated. It also reduces the number of rows that need to be filtered
from the results.
In the example below, the composite column of (calendar_month_desc, country_id)
in ROLLUP prevents the query from returning the grouping: (channel_desc,
calendar_month_desc).
The only groupings returned are:
- (channel_desc, calendar_month_desc, country_id)
- channel_desc)
- ( )
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_gs2.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_gs2.sql
SELECT channel_desc, calendar_month_desc, country_id,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND sales.channel_id=channels.channel_id
AND channels.channel_desc IN ('Direct Sales','Internet')
AND times.calendar_month_desc IN ('2000-09','2000-10')
AND country_id IN ('UK', 'US')
GROUP BY ROLLUP(channel_desc, (calendar_month_desc, country_id));

Compare the output with the one from the preceding SQL statement,
and you’ll see the suppression of the grouping (channel_desc, calendar_month_desc).
|
CONCATENATED GROUPING SETS
Grouping sets are further enhanced with the concept of 'concatenated
grouping sets'. Concatenated grouping sets offer a concise way to generate
large and complex combinations of groupings. Groupings specified with
concatenated grouping sets yield the cross-product of groupings from each
grouping set. The cross-product operation enables even a small number
of concatenated groupings to generate a large number of final groups.
The syntax for concatenated grouping sets is very easy: just list the
grouping elements separated by commas.
The following example creates a cross product using the four columns
specified in the two GROUPING SETS. Therefore it creates the following
4 groups:
- (country_id, calendar_year)
- (country_id, calendar_quarter_desc)
- (cust_state_province, calendar_year)
- (cust_state_province, calendar_quarter_desc)
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_gs3.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_gs3.sql
SELECT country_id, cust_state_province,
calendar_year, calendar_quarter_desc,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND sales.channel_id= channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND country_id IN ('UK', 'US')
GROUP BY GROUPING SETS (country_id, cust_state_province),
GROUPING SETS (calendar_year, calendar_quarter_desc);

|
CONCATENATED GROUPING SETS - HIERARCHIC CUBE EXAMPLE
The following example illustrates a powerful technique for generating
all the groupings needed to create a hierarchical data cube, such as those
used for OLAP queries. The SQL generates the ROLLUP of aggregations for
each dimension (time, geography, product) of a fact table, and it creates
the cross product of groups by combining the ROLLUP groups. This techniques
enables a small, easy-to-read SQL statement to generate a very large number
of useful groups.
The query below generates the following number of ROLLUP groups:
- 4 for the time dimension
- 6 for the geography dimension
- 4 for the product dimension
When these groups are multiplied for the cross product, we see that the
query will create 4*6*4 = 96 groups. Oracle9i
has been optimized to process hierarchical cubes specified with concatenated
grouping sets efficiently.
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_gs4.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_gs4.sql
SELECT calendar_year, calendar_quarter_desc,
calendar_month_desc, country_region,
country_subregion, countries.country_id,
cust_state_province, cust_city,
prod_cat_desc, prod_subcat_desc, prod_name,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries, products
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND sales.channel_id= channels.channel_id
AND sales.prod_id=products.prod_id
AND customers.country_id=countries.country_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND prod_name IN ('Vintage Linen Shirt', 'Merrel Bangle Moc')
AND countries.country_id IN ('UK', 'US')
GROUP BY ROLLUP(calendar_year, calendar_quarter_desc,
calendar_month_desc), ROLLUP(country_region,
country_subregion, countries.country_id,
cust_state_province, cust_city),
ROLLUP(prod_cat_desc, prod_subcat_desc, prod_name);

Note that this will return a large number of rows. You can interrupt
the output with CTRL-C.
|
THE USAGE OF GROUPING_ID AND GROUP_ID
GROUPING_ID and GROUP_ID are 2 new functions in Oracle9i which help distinguish
among groups returned by a query. Since a single query may return a large
number of groups (consider the hierarchical cube query above) it is important
to be able to distinguish among the groups efficiently.
GROUPING_ID creates a bit vector value that distinguishes among the groups
specified in its arguments. If you specify 2 columns as arguments, GROUPING_ID
will return a value of 0,1,2,or 3, as shown in the table below.
|
COL1
|
COL2
|
GROUPING_ID (COL1,COL2)
|
|
Detail
|
Detail
|
0
|
|
Detail
|
Aggregated
|
1
|
|
Aggregated
|
Detail
|
2
|
|
Aggregated
|
Aggregated
|
3
|
GROUP_ID shows you if a group is returned multiple times by a query,
allowing you to filter out the extra occurrences. The query below returns
2 sets of the group (country_id). In typical situations, we would wish
to filter out the redundant set of the (country_id) group. This would
be easy with the GROUP_ID function, because it will give the second (country_id)
group a value of 1.
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_gs5.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_gs5.sql
SELECT country_id, cust_state_province, SUM(amount_sold),
GROUPING_ID(country_id, cust_state_province)
GROUPING_ID, GROUP_ID()
FROM sales, customers, times
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND times.time_id= '30-OCT-00'
AND country_id IN ('FR', 'ES')
GROUP BY GROUPING SETS(country_id,
ROLLUP(country_id, cust_state_province));

|
USE NEW AND IMPROVED FUNCTIONALITY FOR BUSINESS INTELLIGENCE QUERIES
Oracle's analytic functions allow the RDBMS to directly resolve many
OLAP queries closest to the data where the best performance can be achieved.
Without these functions, many important calculations would require complex
and inefficient SQL statements. The functions are particularly valuable
for avoiding inefficient selfjoins and views. Below we present the analytic
functions added in Oracle9i.
For details on the functions added in Oracle 8i release 2, please
see the ‘Oracle Data Warehousing Guide’.
INVERSE PERCENTILE
The PERCENTILE_DISC function finds the stored value which is closest
to the specified percentile. The PERCENTILE_CONT function uses linear
interpolation to calculate a value matching the specification. Inverse
percentile functions are very useful for finding cutoff points and target
levels. For instance, you can use them to answer questions such as "What
is the sales level achieved by my top 15% of sales people?"
| 1. |
This example finds the median value (50th percentile), both as
stored data and as calculated number. From a SQL*Plus session logged
on to the SH schema, run run_an1.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_an1.sql
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_disc,
PERCENTILE_CONT(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_cont
FROM customers
WHERE cust_city='Marshal';

|
WIDTH_BUCKET
Histograms are a powerful and popular analytic tool. To create histograms,
we must first categorize our values into numeric ranges called buckets.
Once the rows in our data set have been assigned to buckets, it is easy
to generate histograms by counting the number of rows for each bucket.
The WIDTH_BUCKET function performs the first step in this process. It
lets you specify "n" equi-width buckets to be used, and the top and bottom
of the numeric range. Since some numbers may be outside the specified
range, there is an underflow bucket numbered 0 and an overflow bucket
numbered n+1.
The example below shows customers divided into 4 buckets on the basis
of their credit limits. It also shows how the ordering of the buckets
can be determined by placing the low boundary first (WIDTH_BUCKET_UP)
or last (WIDTH_BUCKET_DOWN).
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_an4.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_an4.sql
SELECT cust_id, cust_credit_limit,
WIDTH_BUCKET(cust_credit_limit,0,20000,4)
AS WIDTH_BUCKET_UP,
WIDTH_BUCKET(cust_credit_limit,20000, 0, 4)
AS WIDTH_BUCKET_DOWN
FROM customers WHERE cust_city = 'Marshal';

|
HYPOTHETICAL RANK AND DISTRIBUTION FUNCTIONS
Many business planning tasks benefit from knowing how a proposed item
would rank compared to existing items. Oracle9i has added syntax for its
ranking and distribution functions that permits this kind of query.
This example shows the RANK, PERCENT_RANK, and CUME_DIST that a hypothetical
sweater with a list price of $50 would have compared to other sweaters
within each of the sweater subcategories.
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_an2.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_an2.sql
SELECT prod_subcategory,
RANK(50) WITHIN GROUP
(ORDER BY prod_list_price DESC) as HRANK,
TO_CHAR(PERCENT_RANK(50) WITHIN GROUP
(ORDER BY prod_list_price),'9.999') AS HPERC_RANK,
TO_CHAR(CUME_DIST (50) WITHIN GROUP
(ORDER BY prod_list_price),'9.999') AS HCUME_DIST
FROM products
WHERE prod_subcategory LIKE 'Sweater%'
GROUP BY prod_subcategory;

|
FIRST/LAST
The FIRST and LAST functions allow you to order groups by specified columns
and then return the first or last value of another column. If a tie value
is encountered, then a tiebreaker function will be used to resolve the
tie.
To achieve the same result without this function is much less efficient.
In the example below, the data is grouped by channel_desc and ordered
within each group by $amount_sold. The column titled QUANT_FOR_LEAST_$_SALES
returns the unit quantity sold for the row that had the least $ amount.
This is the "FIRST" row in the ordered group. If a tie value is found,
the column will show the MIN value.
The column titled QUANT_FOR_MOST_$_SALES returns the unit quantity sold
in the row that had the most $ amount. This is the "LAST" row in the ordered
group. If a tie value is found, the column will show the MAX value.
| 1. |
From a SQL*Plus session logged on to the SH schema, run run_an3.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_an3.sql
SELECT ch.channel_desc,
MIN((quantity_sold)) KEEP
(DENSE_RANK FIRST ORDER BY (s.amount_sold) )
as QUANT_FOR_LEAST_$_SALES,
MAX((quantity_sold)) KEEP
(DENSE_RANK LAST ORDER BY (s.amount_sold) )
as QUANT_FOR_MOST_$_SALES
FROM sales s , channels ch
WHERE s.time_id=to_DATE('10-OCT-2000')
AND s.channel_id=ch.channel_id
GROUP BY ch.channel_desc;

|
USE NEW QUERY SUBFACTORING CLAUSE IN ORACLE9i
The query subfactoring clause allows a query block that is used multiple
times in a query to be calculated just once and then reused. This is extremely
valuable for query blocks that require a large amount of processing.
The example below defines channel_summary at the start and then uses
it twice in the body of the query.
| 1. |
Prior to Oracle9i, you had to code the recurring query block for
every occurrence. There’s no optimization for the recurring query
block.
From a SQL*Plus session logged on to the SH schema, run run_qs_old.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_qs_old.sql
SELECT channels.channel_desc, SUM(amount_sold) channel_total
FROM sales, channels
WHERE sales.channel_id = channels.channel_id
GROUP BY channels.channel_desc
HAVING SUM(amount_sold) >
( SELECT SUM(amount_sold) * 1/3
FROM sales, channels
WHERE sales.channel_id = channels.channel_id);

This is an example output of the execution plan of this query (explain_qs_old.sql).
The large fact table sales is accessed twice as part of the query;
once for the inner SELECT in the HAVING clause, and a second time
for getting the SELECT part of the query.
|
| 2. |
From a SQL*Plus session logged on to the SH schema, run run_qs_new.sql,
or copy the following SQL statement into your SQL*Plus session:
@run_qs_new.sql
WITH channel_summary AS
( SELECT channels.channel_desc, SUM(amount_sold) AS channel_total
FROM sales, channels
WHERE sales.channel_id = channels.channel_id
GROUP BY channels.channel_desc
)
SELECT channel_desc, channel_total
FROM channel_summary
WHERE channel_total >
( SELECT SUM(channel_total) * 1/3
FROM channel_summary)
;

This is an example output of the execution plan of this query (explain_qs_new.sql).
The fact table sales is touched only once. Oracle first executes
an internal recursive part to build the result set for the query
block described as channel_summary in the named WITH part of the
query as temporary table for this specific query. You can see the
actual insert statement in the OTHER column of explain plan.
This temporary table, representing the query block which is used
multiple times, is then used in the statement, shown in the lower
part of the plan, ending with the FILTER operation.
|
Module Summary
In this module, you should have learned how to:
 |
Explain Oracles powerful aggregation capabilities |
 |
Use new and improved functionality for business intelligence
uses |
 |
Use the new query subfactoring clause of Oracle9i |
Copyright © 2002 Oracle Corporation. All Rights Reserved.
Close Window
|