Using Partitioned Outer Join to Fill Gaps in Sparse Data
Using Partitioned Out Join to Fill Gaps in Sparse Data
Using Partitioned Outer Join to Fill Gaps in Sparse Data
In this tutorial you learn how to use the new SQL join syntax
in the Oracle Database 10g SQL to fill gaps in sparse data.
Approximately 30 minutes
This tutorial covers the following topics:
Place the cursor over this icon to load and view all the
screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously,
so response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
Oracle Database 10g
Partitioned Outer Join Clause Overview
Data is usually stored in sparse form. That is, if no value
exists for a given time, no row exists in the fact table. However, time series
calculations can be performed most easily when data is dense along the time
dimension. This is because dense data fills a consistent number of rows
for each period, which in turn makes it simple to use the analytic windowing
functions with physical offsets. Refer to Chapter 21: Data Warehousing Guide
for more information.
To overcome the problem of sparsity, you can use a partitioned
outer join to fill the gaps in a time series. Such a join extends the conventional
outer join syntax by applying the outer join to each logical partition defined
in a query. The Oracle database logically partitions the rows in your query
based on the expression you specify in the PARTITION
BY clause. The result of a partitioned outer join is a UNION
of the outer joins of each of the groups in the logically partitioned table
with the table on the other side of the join.
Note that you can use this type of join to fill the gaps in
any dimension, not just the time dimension. In this tutorial, you will focus
on the time dimension because it is the dimension most frequently used as a
basis for comparisons.
Back to Topic List
Before starting this tutorial, you should:
| 1. |
Perform the
Installing Oracle Database 10g on Windows tutorial.
|
| 2. |
Download and unzip outer_j.zip
into your working directory (i.e. c:\wkdir)
|
Back to Topic List
The syntax for partitioned outer join extends the
ANSI SQL JOIN clause with the phrase PARTITION
BY followed by an expression list. The expressions in the list specify
the group to which the outer join is applied. Following are the two forms of
syntax typically used for partitioned outer join:
SELECT select_expression
FROM table_reference
PARTITION BY (expr [, expr ]... )
RIGHT OUTER JOIN table_reference
|
SELECT select_expression
FROM table_reference
LEFT OUTER JOIN table_reference
PARTITION BY {expr [,expr ]...)
|
Note that FULL OUTER
JOIN is not supported with a partitioned outer join. Refer to the Oracle
Database 10g SQL Reference for further information regarding syntax and
restrictions.
Back to Topic List
A typical situation with a sparse dimension is shown in the
following example, which computes the weekly sales and year-to-date sales for
the product "Bounce" for weeks 2030 in 2000 and 2001:
SELECT SUBSTR(p.Prod_Name,1,15) Product_Name, t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales FROM Sales s, Times t, Products p WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND t.Calendar_Week_Number BETWEEN 20 AND 30 GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number;
|
PRODUCT_NAME YEAR WEEK SALES --------------- ---------- ---------- ---------- Bounce 2000 20 801 Bounce 2000 21 4062.24 Bounce 2000 22 2043.16 Bounce 2000 23 2731.14 Bounce 2000 24 4419.36 Bounce 2000 27 2297.29 Bounce 2000 28 1443.13 Bounce 2000 29 1927.38 Bounce 2000 30 1927.38 Bounce 2001 20 1483.3 Bounce 2001 21 4184.49 Bounce 2001 22 2609.19 Bounce 2001 23 1416.95 Bounce 2001 24 3149.62 Bounce 2001 25 2645.98 Bounce 2001 27 2125.12 Bounce 2001 29 2467.92 Bounce 2001 30 2620.17
18 rows selected.
|
In this example, you would expect 22 rows of data (11 weeks
each from 2 years) if the data were dense. However you see only 18 rows because
weeks 25 and 26 are missing in 2000, and weeks 26 and 28 are missing in 2001.
Back to Topic List
Gaps in time series make calculations, such as year-over-year
comparisons, difficult to compute. When there are no gaps, you can compare data
by referring from one row to another row a fixed distance away using the analytic
functions LEAD() and LAG().
For example, if you retrieve month-level data and want to refer to data from
12 months ago, it is convenient to access data 12 rows before the current value.
You cannot reliably use the LEAD()
and LAG() functions when the
number of rows per period (or whatever other dimension is used as the divider)
is inconsistent.
How can you fill in the gaps in the preceding example with
a partitioned outer join?
You can take the sparse data of our query above and do a partitioned
outer join with a dense set of time data. In the query shown below, the original
query is aliased as v and the data retrieved from
the times table is aliased as t. Here you see 22
rows because there are no gaps in the series. The four added rows each have
0 as their sales value set to 0 by using the NVL()
function..
SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales FROM (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name, t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales FROM Sales s, Times t, Products p WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND t.Calendar_Week_Number BETWEEN 20 AND 30 GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number ) v PARTITION BY (v.Product_Name) RIGHT OUTER JOIN (SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year FROM Times WHERE Calendar_Year in (2000, 2001) AND Calendar_Week_Number BETWEEN 20 AND 30 ) t ON (v.week = t.week AND v.Year = t.Year) ORDER BY t.year, t.week;
|
PRODUCT_NAME YEAR WEEK DENSE_SALES --------------- ---------- ---------- ----------- Bounce 2000 20 801 Bounce 2000 21 4062.24 Bounce 2000 22 2043.16 Bounce 2000 23 2731.14 Bounce 2000 24 4419.36 Bounce 2000 25 0 Bounce 2000 26 0 Bounce 2000 27 2297.29 Bounce 2000 28 1443.13 Bounce 2000 29 1927.38 Bounce 2000 30 1927.38 Bounce 2001 20 1483.3 Bounce 2001 21 4184.49 Bounce 2001 22 2609.19 Bounce 2001 23 1416.95 Bounce 2001 24 3149.62 Bounce 2001 25 2645.98 Bounce 2001 26 0 Bounce 2001 27 2125.12 Bounce 2001 28 0 Bounce 2001 29 2467.92 Bounce 2001 30 2620.17
22 rows selected.
|
Note that in the query above a WHERE
condition for weeks between 20 and 30 is placed in the inline view for the time
dimension. This step reduces the number of rows handled by the outer join, which
saves processing time.
Back to Topic List
How do you combine this technique with analytic SQL functions
to obtain cumulative sales for the desired weeks?
| 1. |
Start a SQL *Plus session. Select Start >
Programs > Oracle-OraDB10g_home > Application Development
> SQL Plus.
(Note: This tutorial assumes you have an c:\wkdir
folder. If you do not, you will need to create one and unzip the contents
of outer_j.zip into this folder. While
executing the scripts, paths are specified)

|
| 2. |
Log in as the SH user. Enter SH as the User Name and SH
as the Password. Then click OK.

|
| 3. |
From your SQL * Plus session, execute the following
script.
@c:\wkdir\fg.sql
The fg.sql
script contains the following:
SELECT Product_Name, t.Year, t.Week, Sales, Weekly_ytd_sales
FROM
(SELECT
SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year,
t.Calendar_Week_Number Week,
NVL(SUM(Amount_Sold),0) Sales,
SUM(SUM(Amount_Sold)) OVER
(PARTITION BY p.Prod_Name, t.Calendar_Year
ORDER BY t.Calendar_Week_Number) Weekly_ytd_sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND
s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND
t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number
) v
PARTITION BY (v.Product_Name)
RIGHT OUTER JOIN
(SELECT DISTINCT
Calendar_Week_Number Week,
Calendar_Year Year
FROM Times
WHERE Calendar_Year in (2000, 2001)
) t
ON (v.week = t.week AND v.Year = t.Year)
WHERE t.Week BETWEEN 20 AND 30
ORDER BY 1, 2, 3;

In this query, the weekly
year-to-date sales are calculated alongside the weekly sales. The NULL
values that the partitioned outer join inserts in making the time series
dense are handled in the usual way: the SUM
function treats them as 0s.
|
Back to Topic List
There are queries in which a partitioned outer join will return
rows with NULL values, but you may want those rows to hold the most recent non-NULL
value in the series. That is, if you want to have NULLs replaced with the first
non-NULL value you see as you scan upward in a column.
Inventory tables, which track quantity of units available
for various products, are a common case that needs such output. Inventory tables
are sparse: like sales tables, they need to only store a row for a product when
there is an event. For a sales table the event is a sale, and for the inventory
table, the event is a change in quantity available for a product. If you make
the inventory's time dimension dense, you want to see a quantity value for each
day. The value to output is the most recent non-NULL value. Note that this differs
from the prior example with cumulative sales. In that query, the cumulative
sum calculation treats NULLs as 0s, so it presents correct values. That approach
cannot work with inventory and similar tables because the value to place in
rows with NULLs is not a sum.
Here an example is presented of partitioned outer join with
an inventory table. It replaces NULLs with the nearest non-NULL value.
| 1. |
First, create a small inventory table with two products,
each product having entries for two days. The "bottle" product
has 10 units in stock on April 1 and 8 units on April 6. The "can" product has
15 units in stock on April 1 and 11 units on April 4. Execute the following SQL*Plus script:
@c:\wkdir\ci.sql
The ci.sql
script contains the following:
CREATE TABLE inventory (
time_id DATE,
product VARCHAR2(10),
quant NUMBER);
INSERT INTO inventory VALUES
(TO_DATE('01/04/01', 'DD/MM/YY'), 'bottle', 10);
INSERT INTO inventory VALUES
(TO_DATE('06/04/01', 'DD/MM/YY'), 'bottle', 8);
INSERT INTO inventory VALUES
(TO_DATE('01/04/01', 'DD/MM/YY'), 'can', 15);
INSERT INTO inventory VALUES
(TO_DATE('04/04/01', 'DD/MM/YY'), 'can', 11);

|
| 2. |
Now you will use a partitioned outer join to see the
quantity available for each product on each day of the range April 1 through
April 7. If you use a partitioned outer join to query this table without
considering the rows with NULL
values, the results are misleading. Execute the following SQL*Plus script:
@c:\wkdir\nn.sql
The nn.sql
script contains the following:
SELECT times.time_id, product, quant
FROM inventory
PARTITION BY (product)
RIGHT OUTER JOIN times
ON (times.time_id = inventory.time_id)
WHERE times.time_id BETWEEN TO_DATE('01/04/01', 'DD/MM/YY')
AND TO_DATE('07/04/01', 'DD/MM/YY')
ORDER BY 2,1;

The results above are not what you wantedyou know
that the quantities available for bottle and can in the NULL-value
rows were simply the most recent non-NULL value. For example, on April
2 through 5 for bottle, you want to see the quantity 10.
|
| 3. |
To show the desired results, you want to take advantage
of a new keyword added to the FIRST_VALUE
and LAST_VALUE functions
in Oracle Database 10g. You can specify
IGNORE NULLS in the argument list of either of these functions
and they will return the closest non-NULL value. Execute the following
SQL*Plus script:
@c:\wkdir\nn2.sql
The nn2.sql
script contains the following:
WITH v1 AS
(SELECT time_id
FROM times
WHERE times.time_id BETWEEN
TO_DATE('01/04/01', 'DD/MM/YY')
AND TO_DATE('07/04/01', 'DD/MM/YY'))
SELECT product, time_id, quant quantity,
LAST_VALUE(quant IGNORE NULLS)
OVER (PARTITION BY product ORDER BY time_id)
repeated_quantity
FROM
(SELECT product, v1.time_id, quant
FROM inventory PARTITION BY (product)
RIGHT OUTER JOIN v1
ON (v1.time_id = inventory.time_id))
ORDER BY 1, 2;

|
Back to Topic List
In the next task, you use the outer join feature to compare
values across time periods. Specifically, you will calculate a year-over-year
sales comparison at the week level. The query will return on the same row, for
each product, the year-to-date sales for each week of 2001 with that of 2000.
| 1. |
To improve readability of the query and focus on the
partitioned outer join, use a
WITH clause to start the query. Execute the following SQL*Plus
script:
@c:\wkdir\pp.sql
The pp.sql
script contains the following:
WITH v AS
(SELECT
p.Prod_Name Product_Name,
t.Calendar_Year Year,
t.Calendar_Week_Number Week,
SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND
s.Prod_id = p.Prod_id AND
p.Prod_name in ('Y Box') AND
t.Calendar_Year in (2000,2001) AND
t.Calendar_Week_Number BETWEEN 30 AND 40
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number
)
SELECT substr(Product_Name,1,12) Prod,
Year,
Week,
Sales,
Weekly_ytd_sales,
Weekly_ytd_sales_prior_year
FROM
(SELECT --Start of year_over_year sales
Product_Name, Year, Week, Sales, Weekly_ytd_sales,
LAG(Weekly_ytd_sales, 1) OVER
(PARTITION BY Product_Name, Week ORDER BY Year)
Weekly_ytd_sales_prior_year
FROM
(SELECT --Start of dense_sales
v.Product_Name Product_Name,
t.Year Year,
t.Week Week,
NVL(v.Sales,0) Sales,
SUM(NVL(v.Sales,0)) OVER
(PARTITION BY v.Product_Name, t.Year
ORDER BY t.week) weekly_ytd_sales
FROM v
PARTITION BY (v.Product_Name)
RIGHT OUTER JOIN
(SELECT DISTINCT
Calendar_Week_Number Week,
Calendar_Year Year
FROM Times
WHERE Calendar_Year IN (2000, 2001)
) t
ON (v.week = t.week AND v.Year = t.Year)
) dense_sales
) year_over_year_sales
WHERE Year = 2001 AND
Week BETWEEN 30 AND 40
ORDER BY 1, 2, 3;

In the
FROM clause of the in-line view DENSE_SALES,
a partitioned outer join of aggregate view v
and time view t
is used to fill gaps in the sales data along the time dimension. The output
of the partitioned outer join is then processed by the analytic function
SUM ... OVER to compute
the weekly year-to-date sales (the "weekly_ytd_sales"
column). Thus, the view DENSE_SALES
computes the year-to-date sales data for each week, including those
missing in the aggregate view s.
The in-line view YEAR_OVER_YEAR_SALES
then computes the year ago weekly year-to-date sales using the LAG
function. The LAG function
labeled "weekly_ytd_sales_prior_year"
specifies a PARTITION BY
clause that pairs rows for the same week of years 2000 and 2001 into a
single partition. An offset of 1 is passed to the LAG
function to get the weekly year-to-date sales for the prior year.
The outermost query block
selects data from YEAR_OVER_YEAR_SALES
with the condition yr = 2001, and thus the query returns, for each product,
its weekly year-to-date sales in the specified weeks of years 2001 and
2000.
|
Back to Topic List
Whereas the previous example
showed you a way to create comparisons for a single time level, it is even more
useful to handle multiple time levels in a single query. For example, you can
compare sales versus the prior period at the year, quarter, month, and day levels.
For the next task, you create
a query that performs a year-over-year comparison of year-to-date sales for
all levels of our time hierarchy.
Several steps are needed to
perform this task. The goal is a single query with comparisons at the day, week,
month, quarter, and year levels. You will use a materialized view MV_PROD_TIME
that holds a hierarchical cube of sales aggregated across TIMES
and PRODUCTS. Along with the
materialized view, you will create a view on top of it. Also, you create a view
of the time dimension to use as an edge of the cube. The time edge is a partition
outer joined to the sparse data in the materialized view.
For more information regarding
hierarchical cubes, see the chapter titled "SQL for Aggregation in Data
Warehouses" in the Data Warehousing Reference Guide.
| 1. |
Create the materialized view. Note that the query is
limited to only two products to keep processing time short. Execute the
following SQL*Plus script:
@c:\wkdir\cm1.sql
The cm1.sql
script contains the following:
CREATE MATERIALIZED VIEW mv_prod_time
REFRESH COMPLETE ON DEMAND
AS
SELECT
(CASE
WHEN ((GROUPING(calendar_year)=0 )
AND (GROUPING(calendar_quarter_desc)=1 ))
THEN (TO_CHAR(calendar_year) || '_0')
WHEN ((GROUPING(calendar_quarter_desc)=0 )
AND (GROUPING(calendar_month_desc)=1 ))
THEN (TO_CHAR(calendar_quarter_desc) || '_1')
WHEN ((GROUPING(calendar_month_desc)=0 )
AND (GROUPING(t.time_id)=1 ))
THEN (TO_CHAR(calendar_month_desc) || '_2')
ELSE (TO_CHAR(t.time_id) || '_3') END) Hierarchical_Time,
calendar_year year,
calendar_quarter_desc quarter,
calendar_month_desc month,
t.time_id day,
prod_category cat,
prod_subcategory subcat,
p.prod_id prod,
GROUPING_ID(prod_category, prod_subcategory, p.prod_id,
calendar_year, calendar_quarter_desc,
calendar_month_desc,t.time_id) gid,
GROUPING_ID(prod_category, prod_subcategory, p.prod_id) gid_p,
GROUPING_ID(calendar_year, calendar_quarter_desc,
calendar_month_desc, t.time_id) gid_t,
SUM(amount_sold) s_sold,
COUNT(amount_sold) c_sold,
COUNT(*) cnt
FROM SALES s, TIMES t, PRODUCTS p
WHERE s.time_id = t.time_id AND
p.prod_name in ('Bounce', 'Y Box') AND
s.prod_id = p.prod_id
GROUP BY
ROLLUP(calendar_year, calendar_quarter_desc,
calendar_month_desc, t.time_id),
ROLLUP(prod_category, prod_subcategory, p.prod_id);

Because the materialized view is limited to two products,
it has just over 2200 rows. Note that the Hierarchical_Time
column contains string representations of time from all levels of the
time hierarchy. The CASE
expression used for the Hierarchical_Time
column appends a marker (_0, _1, ...) to each date string to denote the
time level of the value. _0 represents the year level, _1 is quarters,
_2 is months, and _3 is day. Note that the GROUP
BY clause is a concatenated ROLLUP
that specifies the roll-up hierarchy for the time and product dimensions.
The GROUP BY clause determines
the hierarchical cube contents.
|
| 2. |
Create a view CUBE_PROD_TIME
with the same definition as the materialized view MV_PROD_TIME.
Execute the following SQL*Plus script:
@c:\wkdir\cv1.sql
The cv1.sql
script contains the following:
CREATE OR REPLACE VIEW cube_prod_time
AS
SELECT
(CASE
WHEN ((GROUPING(calendar_year)=0 )
AND (GROUPING(calendar_quarter_desc)=1 ))
THEN (TO_CHAR(calendar_year) || '_0')
WHEN ((GROUPING(calendar_quarter_desc)=0 )
AND (GROUPING(calendar_month_desc)=1 ))
THEN (TO_CHAR(calendar_quarter_desc) || '_1')
WHEN ((GROUPING(calendar_month_desc)=0 )
AND (GROUPING(t.time_id)=1 ))
THEN (TO_CHAR(calendar_month_desc) || '_2')
ELSE (TO_CHAR(t.time_id) || '_3') END) Hierarchical_Time,
calendar_year year,
calendar_quarter_desc quarter,
calendar_month_desc month,
t.time_id day,
prod_category cat,
prod_subcategory subcat,
p.prod_id prod,
GROUPING_ID(prod_category, prod_subcategory, p.prod_id,
calendar_year, calendar_quarter_desc, calendar_month_desc,
t.time_id) gid,
GROUPING_ID(prod_category, prod_subcategory, p.prod_id) gid_p,
GROUPING_ID(calendar_year, calendar_quarter_desc,
calendar_month_desc, t.time_id) gid_t,
SUM(amount_sold) s_sold,
COUNT(amount_sold) c_sold,
COUNT(*) cnt
FROM SALES s, TIMES t, PRODUCTS p
WHERE s.time_id = t.time_id AND
p.prod_name IN ('Bounce', 'Y Box') AND
s.prod_id = p.prod_id
GROUP BY
ROLLUP(calendar_year, calendar_quarter_desc,
calendar_month_desc, t.time_id),
ROLLUP(prod_category, prod_subcategory, p.prod_id);

|
| 3. |
You create a view EDGE_TIME
which is a complete set of date values. EDGE_TIME
is the source for filling time gaps with a partitioned outer join.
The HIERARCHICAL_TIME column
in EDGE_TIME will be used
in a partitioned join with the HIERARCHICAL_TIME
column in the CUBE_PROD_TIME
view. Execute the following SQL*Plus script:
@c:\wkdir\cv2.sql
The cv2.sql
script contains the following:
CREATE OR REPLACE VIEW edge_time
AS
SELECT
(CASE
WHEN ((GROUPING(calendar_year)=0 )
AND (GROUPING(calendar_quarter_desc)=1 ))
THEN (TO_CHAR(calendar_year) || '_0')
WHEN ((GROUPING(calendar_quarter_desc)=0 )
AND (GROUPING(calendar_month_desc)=1 ))
THEN (TO_CHAR(calendar_quarter_desc) || '_1')
WHEN ((GROUPING(calendar_month_desc)=0 )
AND (GROUPING(time_id)=1 ))
THEN (TO_CHAR(calendar_month_desc) || '_2')
ELSE (TO_CHAR(time_id) || '_3') END) Hierarchical_Time,
calendar_year yr,
calendar_quarter_number qtr_num,
calendar_quarter_desc qtr,
calendar_month_number mon_num,
calendar_month_desc mon,
time_id - TRUNC(time_id, 'YEAR') + 1 day_num,
time_id day,
GROUPING_ID(calendar_year, calendar_quarter_desc,
calendar_month_desc, time_id) gid_t
FROM TIMES
GROUP BY ROLLUP
(calendar_year,
(calendar_quarter_desc, calendar_quarter_number),
(calendar_month_desc, calendar_month_number), time_id);

|
| 4. |
You now have the required
elements for the comparison query. You can obtain period-to-period comparison
calculations at all time levels. It requires applying analytic functions
to a hierarchical cube with dense data along the time dimension. Some
of the calculations you can achieve for each time level are:
- Sum of sales for prior period
at all levels of time
- Variance in sales over prior
period
- Sum of sales in the same period
a year ago at all levels of time
- Variance in sales over the same
period last year
The following example
performs all four of these calculations. It uses a partitioned outer join
of the CUBE_PROD_TIME and
EDGE_TIME views to create
an in-line view of dense data called DENSE_CUBE_PROD_TIME.
The query then uses the LAG
function in the same way as the prior single-level example. The outer
WHERE clause specifies
time at three levels: the days of August 2001, the entire month, and the
entire third quarter of 2001. Note that the last two rows of the results
contain the month-level and quarter-level aggregations.
Execute the following SQL*Plus script:
@c:\wkdir\mt.sql
The mt.sql
script contains the following:
SELECT
substr(prod,1,4) prod, substr(Hierarchical_Time,1,12) ht,
sales,
sales_prior_period,
sales - sales_prior_period variance_prior_period,
sales_same_period_prior_year,
sales - sales_same_period_prior_year variance_same_period_p_year
FROM
(SELECT cat, subcat, prod, gid_p, gid_t, Hierarchical_Time,
yr, qtr, mon, day, sales,
LAG(sales, 1) OVER (PARTITION BY gid_p, cat, subcat, prod,
gid_t ORDER BY yr, qtr, mon, day)
sales_prior_period,
LAG(sales, 1) OVER (PARTITION BY gid_p, cat, subcat, prod,
gid_t, qtr_num, mon_num, day_num ORDER BY yr)
sales_same_period_prior_year
FROM
(SELECT c.gid, c.cat, c.subcat, c.prod, c.gid_p,
t.gid_t, t.yr, t.qtr, t.qtr_num, t.mon, t.mon_num,
t.day, t.day_num, t.Hierarchical_Time, NVL(s_sold,0) sales
FROM cube_prod_time c
PARTITION BY (gid_p, cat, subcat, prod)
RIGHT OUTER JOIN edge_time t
ON ( c.gid_t = t.gid_t AND c.Hierarchical_Time = t.Hierarchical_Time)
) dense_cube_prod_time
) -- side by side current,prior and prior year sales
WHERE prod IN (139) AND gid_p=0 AND -- 1 product and product level data
( (mon IN ('2001-08' ) AND gid_t IN (0, 1) ) OR -- day and month data
( qtr IN ('2001-03' ) AND gid_t IN (3) ) ) -- quarter level data
ORDER BY day;


|
Back to Topic List
In many OLAP tasks, it is helpful to define custom members
in a dimension. For instance, you might define a specialized time period for
analyses. You can use a partitioned outer join to temporarily add a member to
a dimension. Note that the new SQL MODEL
clause introduced in Oracle Database 10g is suitable for creating more
complex scenarios involving new members in dimensions. See the tutorial titled
"SQL for Modeling" in the Data Warehousing Reference Guide
for more information on this topic.
In this exercise, you define a new member for the TIME
dimension. You create a 13th member of the month level in the TIME
dimension. This 13th month is defined as the summation of the sales for each
product in the first month of each quarter of year 2001. You build this solution
using the views and tables created in the prior example.
| 1. |
Create a view with the new member added to the appropriate
dimension. The view uses a UNION
ALL operation to add the new member. To query using the custom
member, use a CASE expression
and a partitioned outer join. Execute the following SQL*Plus script:
@c:\wkdir\cv3.sql
The cv3.sql
script contains the following:
CREATE OR REPLACE VIEW time_c AS
(SELECT *
FROM edge_time
UNION ALL
SELECT '2001-13_2', 2001, 5, '2001-05', 13, '2001-13', null, null,
8 -- <gid_of_mon>
FROM DUAL);
In the statement shown, the TIME_C
view is defined by performing a UNION
ALL of the EDGE_TIME
view (defined in the prior example) and the user-defined 13th month. UNION
ALL specifies the attributes for a 13th month member by doing a
SELECT from the DUAL
table. Note that the grouping id, column gid_t,
is set to 8, and the quarter number is set to 5.

|
| 2. |
The in-line view of the query shown below performs a
partitioned outer join of CUBE_PROD_TIME
with TIME_C. This step
creates sales data for the 13th month at each level of product aggregation.
In the main query, the analytic function SUM
is used with a CASE expression
to compute the 13th month, which is defined as the summation of the first
month's sales of each quarter. Execute the following SQL*Plus script:
@c:\wkdir\cv4.sql
The cv4.sql
script contains the following:
SELECT * from
(
SELECT substr(cat,1,12) cat, substr(subcat,1,12) subcat,
substr(prod,1,9) prod, mon, mon_num,
SUM(CASE WHEN mon_num IN (1, 4, 7, 10)
THEN s_sold
ELSE NULL
END)
OVER (PARTITION BY gid_p, prod, subcat, cat, yr) sales_month_13
FROM
(SELECT c.gid, c.prod, c.subcat, c.cat, gid_p,
t.gid_t , t.day, t.mon, t.mon_num,
t.qtr, t.yr, NVL(s_sold,0) s_sold
FROM cube_prod_time c
PARTITION BY (gid_p, prod, subcat, cat)
RIGHT OUTER JOIN time_c t ON
(c.gid_t = t.gid_t AND c.Hierarchical_Time = t.Hierarchical_Time)
)
)
WHERE mon_num=13;

The SUM function
used in generating these results had a CASE
statement to limit the data to months 1, 4, 7, and 10 within each year.
Because of the tiny data set, with just two products, the roll-up values
of the results are necessarily repetitions of lower-level aggregations.
For a more realistic set of roll-up values, you can include more products
from the "Game Console" and "Y Box Games" subcategories
in the underlying materialized view.
|
Back to Topic List
In this tutorial, you learned how to:
 |
Fill gaps in data using analytic SQL functions |
 |
Replace NULLS with the nearest non-NULL value |
 |
Perform a period-to-period comparison |
 |
Create a custom member in a dimension |
Back to Topic List
|