| Oracle9i
introduces a new set of server functionality especially beneficial
for the ETL (Extraction, Transformation, and Loading) part of
any Business Intelligence process flow, addressing all the needs
of highly scalable data transformation inside the database.
One of the most exciting new features addressing
the needs for ETL is the SQL statement MERGE. The new SQL
combines the sequence of conditional INSERT and UPDATE commands
in a single atomic statement, depending on the existence of
a record. This operation is commonly known as Upsert functionality.
In a data warehouse environment, tables (typically fact tables)
need to be refreshed periodically with new data arriving from
on-line systems. This new data may contain changes to existing
rows in tables of the warehouse and/or new rows that need
to be inserted. If a row in the new data corresponds to an
item that already exists in the table, an UPDATE is performed;
if the row's primary key does not exist in the table, an INSERT
is performed. Many times, the source system cannot distinguish
between newly inserted or changed information during the extraction;
for very complex transformations, it's sometimes nearly impossible
to know the effect of changed source data. These scenarios
require the determination of insert versus update to be done
during data loading.
Prior to Oracle9i,
these operations were expressed either as a sequence of DMLs
(INSERT/UPDATE) or as PL/SQL loops deciding, for each row,
whether to insert or update the data. Both these techniques
face performance handicaps: the first requires multiple data
scans, and the second operates on a per-record basis. By extending
SQL with a new syntax - the MERGE statement - Oracle9i
overcomes the deficiencies of the old approaches and makes
the implementation of warehousing applications more simple
and intuitive.
The following is an example of the MERGE statement.
The fact table SALES_FACT in a data warehouse for a retailer
needs to be periodically updated with sales data coming from
the on-line systems. If the retailer opens a new store, then
the data for the store needs to be inserted into the SALES_FACT
table.
Oracle9i
Implementation:
In Oracle9i,
the MERGE statement INSERTS and UPDATES the data with a single
SQL statement.
MERGE INTO SALES_FACT D
USING SALES_JUL01 S
ON (D.TIME_ID = S.TIME_ID
AND D.STORE_ID = S.STORE_ID
AND D.REGION_ID = S.REGION_ID)
WHEN MATCHED THEN
UPDATE
SET d_parts = d_parts + s_parts,
d_sales_amt = d_sales_amt + s_sales_amt,
d_tax_amt = d_tax_amt + s_tax_amt,
d_discount = d_discount + s_discount
WHEN NOT MATCHED THEN
INSERT (D.TIME_ID ,D.STORE_ID ,D.REGION_ID,
D.PARTS ,D.SALES_AMT ,D.TAX_AMT ,D.DISCOUNT)
VALUES (
S.TIME_ID ,S.STORE_ID ,S.REGION_ID,
S.PARTS ,S.SALES_AMT ,S.TAX_AMT ,S.DISCOUNT);
Oracle8i
Implementation:
In Oracle8i,
you could choose to implement it as a sequence of DML statements.
UPDATE
(SELECT
S.TIME_ID
,S.STORE_ID ,S.REGION_ID,
S.PARTS s_parts
,S.SALES_AMT s_sales_amt ,S.TAX_AMT s_tax_amt ,S.DISCOUNT
s_discount,
D.PARTS d_parts
,D.SALES_AMT d_sales_amt ,D.TAX_AMT d_tax_amt ,D.DISCOUNT
d_discount
FROM SALES_JUL01 S, SALES_FACT
D
WHERE D.TIME_ID = S.TIME_ID
AND D.STORE_ID = S.STORE_ID
AND D.REGION_ID = S.REGION_ID)
JV
SET d_parts = d_parts
+ s_parts,
d_sales_amt
= d_sales_amt + s_sales_amt,
d_tax_amt
= d_tax_amt + s_tax_amt,
d_discount
= d_discount + s_discount
;
INSERT INTO SALES_FACT (
TIME_ID,STORE_ID ,REGION_ID,
PARTS ,SALES_AMT ,TAX_AMT ,DISCOUNT)
SELECT
S.TIME_ID ,S.STORE_ID
,S.REGION_ID,
S.PARTS ,S.SALES_AMT
,S.TAX_AMT ,S.DISCOUNT
FROM SALES_JUL01 S
WHERE (S.TIME_ID, S.STORE_ID,
S.REGION_ID) NOT IN (
SELECT D.TIME_ID,
D.STORE_ID, D.REGION_ID
FROM SALES_FACT
D
)
;
Alternatively, you could create a procedural
implementation. It would need to check every load record against
the target to find if the record already exists; only then
could it decide whether to insert or update the data.
Both of these approaches suffer from deficiencies
in performance and usability. The new MERGE command overcomes
these deficiencies, processing the conditional INSERT-or-UPDATE
within a single statement. The data is scanned only once,
and the appropriate DML command is issued, either serially
or in parallel.
The new MERGE command brings major performance
benefits by providing an optimized internal feature for the
common Upsert task within ETL processing. Furthermore, it
simplifies the development of transformation processing inside
the database - using Oracle9i
as the transformation engine.
More
Info
Oracle9i
Database Daily Features
|
 |