TECHNOLOGY: Business Intelligence
Lighten the LoadBy Mark Rittman
Oracle Data Integrator 11g makes data loading flexible, restartable, and routine.
Developers working on data integration projects are often required to load numerous database tables in a particular sequence, with parts of the load process carried out in parallel to reduce load times. Ideally such load routines should be configurable—so that, for example, a data warehouse can be reloaded or refreshed with new data—and it should be possible to restart a failed load routine once the reason for the failure has been addressed.
To handle these requirements, the 126.96.36.199 release of Oracle Data Integrator 11g introduces load plans. Load plans—building on the interfaces, packages, procedures, and scenarios already present in Oracle Data Integrator projects—provide the ability to create hierarchical data integration processes that enable conditional execution, parallel execution of integration tasks, and plan restartability after a failure.
Creating Your First Load Plan
So how do load plans work, and how do they differ from packages, the traditional way to sequence integration steps in Oracle Data Integrator 11g? To find out, let’s work through a scenario in which data is sourced from the OE (Order Entry) sample schema that comes with most Oracle Database releases and is loaded into product and customer dimension tables as well as an ORDERS fact table in another schema. If you want to try this new feature yourself, download and install Oracle Data Integrator 188.8.131.52, access a database with the OE sample schema installed, and download and install the load plan project files. Follow the instructions in the zip file for installing the load plan project files.
In the initial version of this article’s Oracle Data Integrator project, a package loads each table in turn via a set of interfaces. Now let’s enhance this load routine, so that (1) the two dimension tables are loaded in parallel before the fact table and (2) the user has the option to load just the fact table, skipping the dimension table load.
To do this, follow these steps:
Now that you’ve created the basic load plan, let’s test it out. Click Save to save your load plan details, and ensure that you have a standalone agent running (because you cannot use the built-in agent that comes with Oracle Data Integrator’s Studio to run load plans). Click the Execute button at the top of the load plan editor, enter 0 as the startup value of the LoadOrdersOnly variable to trigger a full load, and then switch to the Load Plan Executions pane within the Operator navigator to see the outcome of the load plan run.
Double-click the load plan run under the Agent folder on the Load Plan Executions pane. A window opens, showing the actual steps that were executed by this load plan run. In this case, because you passed 0 as the variable value when executing the load plan, the Else part of the plan executed and performed a full load. If you executed the load plan again but this time passed 1 as the LoadOrdersOnly variable value, you would see the When part executed instead.
Exceptions and Plan Restartability
So far you’ve seen the conditional execution part of load plans in action, but what about exceptions and restartability?
Let’s continue this scenario by considering how you might handle a situation in which the load plan tries to process rows for the ORDERS fact table but those orders reference product dimension IDs that don’t exist, a common scenario for data warehouse developers.
To simulate this situation, let’s first disable the constraint on the OE.ORDER_ITEMS table that stops you from entering invalid product ID values into the PROD_ID column. (You might want to back up your OE schema before doing this, so that you can restore it to its original values afterward.)
ALTER TABLE order_items DISABLE CONSTRAINT order_items_ product_id_fk;
INSERT INTO orders VALUES (3000,TO_DATE('31-MAR-2012', 'DD-MON-YYYY'),'direct',118,5,4000, 163,null); INSERT INTO order_items VALUES (3000,1,9999,1,100); INSERT INTO order_items VALUES (3000,1,3134,2,50); COMMIT;
Now execute the load plan again, passing 1 as the LoadOrdersOnly value to trigger a full load. This time the load plan fails at the step where it tries to load the fact table, because the product key lookup fails and Oracle Database raises an error when the load plan subsequently tries to insert a NULL value into the OE_TARGET .FACT_ORDERS.PROD_ID column, which has a NOT NULL constraint on it, as shown in Figure 3.
Mark Rittman is an Oracle ACE Director and cofounder of Rittman Mead, an Oracle Gold Partner based in the U.K., with offices in the U.S., India, and Australia. Rittman has worked with Oracle’s BI, data integration, and data warehousing products for more than 15 years and writes for the Rittman Mead blog at rittmanmead.com/blog.
Send us your comments