| |
Analyzing Business Information
Module Objectives
Purpose
This module focuses on how to use the Oracle9i
analytic functions.
Objectives
After completing this module, you should be able to:
Prerequisites
Before starting this module, you should have completed the following:
Reference Material
The following is useful reference material if you want additional information
about the topics in this module:
 |
Documentation: Oracle9i
Data Warehousing Guide
|
Overview
Oracle9i adds
several new types of analytic functions. The analytic functions enhance
both database performance and developer productivity. They are valuable
for all types of processing, from interactive decision support to batch
report jobs. Corporate developers and independent software vendors alike
can take advantage of the features.
Here are the key benefits provided by the new functions:

Using Inverse Percentile Functions
Two new Oracle9i
functions, PERCENTILE_CONT
and PERCENTILE_DISC, compute
inverse percentiles. You can specify the percentile value you want, and
the functions return the row from your data set which is closest to that
percentile (PERCENTILE_DISC)
or the calculated value closest to that percentile (PERCENTILE_CONT).
These functions require a sort specification and a percentile parameter
value between 0 and 1.
The functions can be used either as aggregate functions or as reporting
aggregate functions. When used as aggregate functions, they return a single
value per ordered set. When used as reporting aggregate functions, they
return the data from each row. The functions use a new WITHIN
GROUP clause to specify the data ordering.
Sample Query Using the PERCENTILE_DISC
Function
In the Sales History (SH)
schema, the cost of goods is tracked through the channels of distribution
for catalog, direct sales, internet, partner, and telephone sales. To
find the discrete value closest to the 50th percentile of costs per channel
of distribution for the month of November, 1999, use the PERCENTILE_DISC
function. To find the median value of the cost data, specify that the
data should be sorted by cost, and then specify a percentile value of
0.5.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH
schema, enter @query_with_percentile_disc.sql to run
the following script:

|
| 2. |
The result from the query shows the discrete values closest to
the 50th percentile (that is, the median values) for costs per channel
of distribution in November, 1999. This information can be useful
for cost projections in future Novembers. The average is also shown
in this example, to highlight the difference between an average
and a median.

|

Using Hypothetical Rank and Distribution Functions
In certain kinds of analysis, such as financial planning, you may want
to know how a data value would rank if it were added to your data set.
For instance, you want to introduce a new piece of women's clothing and
you want to know what rank the planned price would have compared to your
other prices.
The hypothetical rank and distribution functions support this type of
what-if analysis: they return the rank or percentile value which a row
would be assigned if it were inserted into a set of other rows. There
are the following hypothetical functions: RANK,
DENSE_RANK, PERCENT_RANK,
and CUME_DIST. Like the
inverse percentile functions, the hypothetical rank and distributions
functions use a WITHIN GROUP
clause containing an ORDER BY
specification.
Sample Query Using the RANK
and DENSE_RANK Functions
In the Sales History (SH)
schema, you want to find the rank for hypothetical a piece of women's
clothing with a price of $50.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH
schema, enter @query_with_rank.sql to run the following
script:

|
| 2. |
The result from the query shows the rank of
a dress with a list price of $50 within each of the subcategories
of women's clothing. The large differences between the RANK
and DR columns is due
to the fact that many dresses have the same list price, therefore
women's clothing calculated with the DENSE_RANK
function has very few rank values compared to the regular RANK
function.

|
Sample Query Using the CUME_DIST
Function
In this example, you want to find out what percentile a $100 product
would have in each category.
To run this query, perform the following:
| 1. |
From a SQL*Plus session connected to the SH
schema, enter @query_with_cume_dist.sql to run the following
script:

|
| 2. |
The result from the query shows the fraction
of products in the subcategory that have higher list prices.
For instance, 24% of the products in the Shoes - Women subcategory
have a list price higher than $100.

|

Using FIRST and LAST
Aggregate Functions
The FIRST and LAST
aggregate functions allow you to specify an order within aggregated groups
and then return the first or last value in the group for a column other
than the sorted column. Although an equivalent query can be created using
joins or subqueries, the SQL syntax is cumbersome in that case and performance
can be inefficient. The FIRST
and LAST functions do the
same work with a simpler SQL syntax and better performance.
Sample Query Using the FIRST
and LAST Aggregate Functions
In the Sales History (SH)
schema, you want to run a query that determines, by product, the dollar
amount of the sale which sold the lowest number of items, and the dollar
amount of the sale which sold the highest number of items.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH
schema, enter @query_with_first_last.sql to run the following
script:

|
| 2. |
The result of the query shows, within each channel, the quantity
of items sold in the transactions that involved the highest and lowest
dollar amounts on October 10, 2000.

|
Using Grouping Sets
A grouping set is set of aggregate groups specified in a GROUP
BY clause. As powerful as the CUBE
and ROLLUP extensions are,
there are additional abilities that would enhance them significantly.
It would be very helpful to be able to specify all of the following in
a single GROUP BY clause:
 |
Grouping sets to perform multiple independent groupings
|
 |
Composite columns to skip some unneeded aggregation levels
|
 |
Concatenated groupings to concisely specify the groupings
of interest by generating combinations automatically |
Without the enhancements in Oracle9i,
you must use multiple queries and combine them with the UNION
ALL clause in order to achieve these tasks. Using multiple queries
is inefficient, however, because it requires multiple scans of the same
data. The extensions to the GROUP
BY clause in Oracle9i
allow the optimizer to choose better plans, thus enabling the SQL execution
engine to execute the query very efficiently. Users can analyze data in
one dimension without completely rolling it up, analyze across multiple
dimensions without computing the whole CUBE,
and specify multiple arbitrary groupings to meet any need. These features
are extremely useful in data warehousing, which frequently involves extensive
aggregate calculations.
Queries Using the GROUPING SETS
Function
In the example shown, the data is grouped into three grouping sets:
 |
channel_desc, calendar_month_desc,
country_id
|
 |
channel_desc, country_id
|
 |
calendar_month_desc,
country_id |
Note: For simplicity, only data for the first two days of December,
1999, and for the products 10, 20, and 45 are selected.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH schema, enter
query_with_grouping_sets.sql
to run the following script:

|
| 2. |
The GROUPING SET
statement shown previously identifies the exact sets wanted:

|
Queries Using Composite Columns
A composite column is a collection of columns that is treated as a unit
during the computation of groupings. You specify the columns in parenthesis.
Composite columns allow you to skip unwanted levels of rollup.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH schema, enter
query_with_composite_column.sql
to run the following script:

|
| 2. |
Because of the (channel_id,
prod_id) composite
column, you never see values
for the grouping of time_id
and channel_id.

|

Using the WIDTH_BUCKET
Function
The WIDTH_BUCKET function
divides a numeric range into a specified number of subranges called "buckets."
For any given expression, the WIDTH_BUCKET
function returns that expression's bucket number.
You can also generate equiwidth histograms with this function. Equiwidth
histograms divide data sets into buckets of equal size. You provide the
input expression, the minimum boundary value, the maximum boundary value,
and the number of buckets. However, if you specify n buckets, you
actually get n + 2 buckets: n regular ones and two artificial
ones to catch values outside the boundary range: bucket 0 holds the values
less than the minimum boundary value and bucket n + 1 holds the
values greater than the maximum boundary value.
Sample Query Using the WIDTH_BUCKET
Function
In this example, you want to analyze the product list prices between
$10 and $90, and determine where in that range the Navy
Blue Trousers and Jersey Trousers Kids are. To run such a query,
perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH
schema, enter @query_with_width_bucket.sql to run the
following script:

Products with a list price less than $10 will be placed in bucket
0; products that have a list price greater than $90 will be placed
in bucket 8. The other products will be placed in buckets 1 through
7, depending on their price.
|
| 2. |
The result of the query shows that Navy Blue
Trousers are in bucket 2, and Jersey Trousers Kids are in bucket 0
because their price is less than $10.

|
Queries Using Concatenated Groupings
Concatenated groupings offer a concise way to generate useful combinations
of groupings. Groupings specified with concatenated groupings yield the
cross-product of groupings from each grouping set. The cross-product operation
enables even a small number of concatenated groupings to generate a large
number of final groups. The concatenated groupings are specified simply
by listing multiple grouping sets, cubes, or rollups, and separating them
with commas.
To run this query, perform the following steps:
| 1. |
From a SQL*Plus session connected to the SH schema, enter
query_with_concatenated_groupings.sql
to run the following script:

|
| 2. |
The results are four
grouping generated as the cross product of the two grouping sets:
country_id, calendar_year
country_id, calendar_quarter_desc
cust_state_province,
calendar_year
cust_state_province,
calendar_quarter_year

|
For information on using concatenated groupings to generate hierarchical
cubes for online analytic processing, please see the Data Warehouse
Guide.
Module Summary
In this module, you should have learned how to:
Copyright © 2002 Oracle Corporation. All Rights Reserved.
Close Window
|