By Yuli Vasiliev
Learn how to use the JDBC connectivity layer in Oracle Warehouse Builder 11g Release 2 to manipulate data in JDBC-accessible sources.
Published October 2010
Business Intelligence (BI) can be thought of as the process of extracting business information from the data available at your disposal, regardless of the data formats and ways in which that data can be accessed. So the ability to natively connect to heterogeneous data sources is an important feature for any BI system.
Being a full-featured tool for deriving, generating, and sharing BI metadata and data, Oracle Warehouse Builder can work with a wide range of sources, including heterogeneous database systems, ODBC sources such as MS Excel and MS Access, and even flat files. In practice, though, you often have to deal with several different database systems.
In Oracle Warehouse Builder, you can access such database systems through either Oracle Heterogeneous Services (Gateways) or Code Templates (CTs) that use JDBC as the connection layer. While the former option has been around for a long time already, the latter is new to Oracle Warehouse Builder 11g Release 2 and is more flexible and efficient as it enables you to natively connect to virtually any JDBC-accessible data system, without doing a lot of configuration work.
Along with the code templates, the Oracle Warehouse Builder’s JDBC connectivity layer comprises an open connector, which allows for connecting to heterogeneous data sources, as well as for extracting, transforming, and integrating data from those sources. Code templates are mostly generic since they sit on the JDBC connectivity layer, which takes care of the data source specifics. Besides, Oracle Warehouse Builder uses platform definitions to describe how the native types of a non-Oracle database source map to the Oracle Warehouse Builder's core types.
As you might guess, this architecture simplifies the task of adding new platforms to the list of supported ones, as well as the task of adding a new code template in case you want to add some custom functionality to the current code template library.
For example, suppose you want to add support for MySQL. (As of OWB 11g R2, MySQL is not on the list of supported by default platforms.) All you need to do, though, is download the MySQL JDBC driver to put it into the OWB_HOME/owb/lib/ext directory, and add the platform definition for MySQL via a Tcl script that you can run from the OMB Plus console. The contents of such a script is beyond the scope of this article. However, if you want to look at one, check out this post by David Allan, where you’ll find a detailed example of how you can add support for MySQL to Oracle Warehouse Builder 11g Release 2. Also, there is a whitepaper on OTN called the "OWB Platform and Application Adapter Extensibility Cookbook", which goes into more depth than David’s post.
Thankfully, you won’t need to add the platform definitions for most popular database platforms, since Oracle Warehouse Builder 11g Release 2 comes with them already defined. Thus, the list of platforms supported by default include: DB2, SQL Server, Sybase, and Teradata. However, before you can access any of those databases through the code templates using JDBC, you still have to obtain the JDBC driver from the corresponding vendor and put it into the OWB_HOME/owb/lib/ext directory.
Figure 1 gives a simplified view of the architecture discussed here:
Figure 1: A high-level view at the open connector components used in the OWB 11g R2 environment.
As you can see, the code template library composed of a collection of code templates is a key component of the above architecture. Each code template in turn is comprised of a set of predefined tasks, most of which are normally JDBC tasks such as create work table and load data. Aside from JDBC tasks, though, a code template may also include Jython, Runtime API, OS, and Jacl tasks to fulfill its functionality.
Diagrammatically, this might look like Figure 2:
Figure 2: A code template is comprised of a set of predefined tasks, most of which are normally JDBC tasks.
In most cases, though, you don’t have to go into task-by-task detail of a code template to get a solution for a certain ETL task. Oracle Warehouse Builder 11g Release 2 comes with a set of predefined code templates allowing you to perform different ETL tasks. The predefined code templates are implemented with the code to be run within a J2EE runtime environment. For that purpose, there is the Control Center Agent (CCA) that runs on the OC4J server. To perform the ETL task to which a code template is dedicated, you first must create a CT mapping associated with this code template. Next, you deploy that CT mapping to CCA, which must be already launched, and then execute that mapping to perform the defined task.
The following diagram provides a conceptual depiction of the above execution model:
Figure 3: A simplified view at the execution model for the Code Template technology.
In actual fact, though, things are a bit more complex than the above can tell you. As mentioned, to derive benefit from a code template, you first have to create a CT mapping associated with that code template. Based upon the contents of the code template, Oracle Warehouse Builder generates the mapping code. In particular, it generates the Tcl/Java(Jacl) scripts required to perform the necessary ETL task (Oracle Warehouse Builder uses the Java implementation of Tcl called Jacl). Then, when you execute the mapping, CCA runs those Tcl/Java(Jacl) scripts containing embedded JDBC statements.
Now that you have a grasp of the ideas behind native heterogeneous connectivity used in Oracle Warehouse Builder 11g R2, let's work through a simple example to see things in action. The following example illustrates how you can extract data natively from two SQL Server tables and put it, after joining, into another SQL Server table, using code template mappings and JDBC connectivity. Then, it shows you how to look inside a code template and even customize it as necessary, adding a new JDBC task to it.
Before You Start
As far as Microsoft SQL Server is concerned, the first thing you’ll need to do is download the appropriate JDBC driver from the Microsoft Download Center. For example, if you’re using Microsoft SQL Server 2005, you’ll need the Microsoft SQL Server 2005 JDBC Driver 1.0, a type 4 JDBC driver. Then, you’ll need to extract the sqljdbc.jar file from the downloaded package and put it into the OWB_HOME/owb/lib/ext directory.
The next step is to launch a CCA instance. Although you don’t need the CCA instance at design time, you’ll need it when deploying and executing. If you recall from the discussion in the “A high-level view section” earlier in this article, CCA is the agent that executes code templates in the OC4J server. To recap, code templates are comprised of predefined tasks, most of which are normally JDBC tasks. To start the agent, you must execute the ccastart file available at OWB_HOME/owb/bin/unix or the ccastart.bat file at OWB_HOME/owb/bin/win32 in case you’re on Windows. When launching the agent for the first time, you’ll be prompted to enter a password for the oc4jadmin user. You’ll need that password when creating the template mapping module later in this article.
You also have to make sure that TCP/IP protocol is enabled on the SQL Server instance, which you’re going to connect to. For example, TCP/IP is disabled by default in SQL Server Express. So, you’ll need to enable it through the SQL Server Configuration Manager and then restart the SQL Server Express service. Also make sure any firewall configuration issues are resolved. For more information on this, you might check out "Configure a Windows Firewall for Database Engine Access" or "Configuring the Windows Firewall to Allow SQL Server Access".
Creating the Project Structure
Once you’ve completed the preliminary steps described in the preceding section, you can go ahead and create a JDBC connection to your SQL Server database and then import metadata from data objects of interest, under a project in Warehouse Builder. In particular, you’ll need to create two modules under the databases\SQL Server node, which will be a source module and a target module mapped to the source schema and target schema on the SQL Server database, respectively. Next, you’ll need to create a template mapping module under the Template Mappings node, and then create a CT mapping within that module, laying out and connecting the source and target objects on the mapping canvas.
To summarize, you’ll need to make sure you have the following objects on both the Warehouse Builder and SQL Server side:
On SQL Server:
On Warehouse Builder:
To save space, I won’t go into details about creating the above objects. Detailed instructions can be found in the documentation. Moreover, creating a similar project structure is discussed in detail in Mark Rittman's "Oracle Warehouse Builder 11g Release 2 and Heterogeneous Databases". Here I will focus on creating the CT mapping. To fulfill this task, you can follow the steps below:
Figure 4: Building the Logical View of the CT mapping.
In this example, you’ve used a single execution unit that executes on the SQL Server. Alternatively, you might manually create two execution units for the source and target respectively. Using multiple execution units is more common when moving data between databases over the network, or especially between different platforms. OWB generates code for each execution unit that ultimately executes on the source and the target databases.
If using multiple execution units, you would need to specify a code template for each unit, selecting a load code template for the execution unit containing the source tables and an integration code template for the execution unit containing the target table. In that case, Warehouse Builder would create a staging table in the SQL Server database, first loading the joined data into that staging table and then moving it to the target table.
Deployment and Execution
You’re ready now to deploy the CT mapping created as discussed in the preceding section. To do this, you can right-click the mapping node in Project Navigator, and then select Deploy … in the popup menu. The deployment process may take some time, ending up with no error messages if everything is OK.
Deployment of a CT mapping results in the generation of an EAR file deployed to the Warehouse Builder Control Center Agent. So, you can find those EAR files in the OWB_HOME/owb/jrt/applications directory.
Turning back to our example, the next step is execution of the CT mapping just deployed. To start execution, you can right-click the mapping node in Project Navigator, and then select Start … in the popup menu. Again, it should take some time to complete the tasks included in the code template associated with the CT mapping via the execution unit as discussed in the preceding section.
The screenshot in the figure below illustrates what you might see once the execution has been successfully completed:
Figure 5: The CT mapping execution completed.
If you now check out the target table in the SQL Server database, you should see it’s populated with the data combined from the source tables.
The above example is a good illustration of how easy it is to use code templates to handle standard data management tasks on non-Oracle data sources from within Warehouse Builder. All you had to do was associate the CT mapping with an appropriate built-in code template and then deploy and execute that mapping. Although the execution output might give you a clue about the tasks the code template was comprised of, you actually didn’t have to delve into the code template structure to get the job done.
In practice, though, you may face a situation in which you’ll need to adjust a built-in code template to solve the problem you have. Of course, it would be a bad idea to modify a built-in template directly. Instead, you could import it to your Warehouse Builder project to make a copy of it, and then adjust that copy as needed. Again, this is best understood by example.
Let’s turn back to our example and look at what can be done here if you, for example, need to do all inserts within a single transaction. First, though, let’s look at why you might need it.
Assuming the target table uses the primary keys taken from the source data, you’ll get a violation of primary key constraint error if you try to start a subsequent execution. This behavior is expected, of course. The problem is, though, the default transaction mode in SQL Server is autocommit. What this means in practice is that the change made by every DML operation is immediately committed. In our example, this may lead to a situation when the insertion operation is done only partly, till the first duplicated primary key happens – without the ability to rollback what has been already committed.
Of course, there may be an already existing integration code template, or will be in the next releases, that addresses this problem. However, the purpose here is to illustrate how you can adjust an existing code template to meet your current needs. So, the following steps describe how to import the ICT_SQL_TO_SQL_APPEND code template and then modify it so that it performs the inserts on SQL Server within a single transaction:
In Project Navigator, right-click the Code Templates node and then select New Code Template Folder to create a new CT folder.
Expand the newly created CT folder node and then right-click the Integration node.
In the popup menu, select Import->Code Template to launch the Import Code Template wizard.
On the Select Files screen of the wizard, click the Browse button and select the OWB_HOME/owb/misc/CodeTemplates folder. Then, move the KM_IKM_SQL_to_SQL_Append.xml file from the Directory Files box to the Files to Import box.
On the Name Code Templates screen of the wizard, provide the name for the code template. For example, it could be ICT_ SQL_TO_SQLSERVER_APPEND. (This name makes it clear that SQL Server is the target for this CT – just saying “SQL_TO_SQL” is how generic code templates are typically named, meaning to work on any SQL database.)
Complete the wizard. As a result, ICT_SQL_TO_SQLSERVER_APPEND should appear under the Code Templates/CODE_TEMPLATE_FOLDER/Integration node in Project Navigator.
Double-click the newly created ICT_ SQL_TO_SQLSERVER_APPEND node to open the template structure in the Code Template Editor.
In the Code Template Editor, you can pull apart the icons associated with the template tasks to understand the process flow behind the template. Or you can select the autolayout option.
In the Code Template Editor, delete the arrow connecting the 4_LOCK_JOURNALIZED_TABLE and 5_INSERT_NEW_ROWS tasks.
Drag and drop a new JDBC task from the Component Palette to the Code Template Editor, placing it between the 4_LOCK_JOURNALIZED_TABLE and 5_INSERT_NEW_ROWS tasks.
Connect the 4_LOCK_JOURNALIZED_TABLE task with the newly created JDBC_TASK. To do this, place the mouse cursor upon the 4_LOCK_JOURNALIZED_TABLE icon, press and hold the left mouse button, and then draw a line to the JDBC_TASK icon.
In the same way, connect JDBC_TASK with 5_INSERT_NEW_ROWS.
Select the JDBC_TASK icon and move on to the Task Editor located below the Code Template Editor.
In the Task Editor, click the Target tab and type in the following code in the editor below:
Figure 6: Adding a new JDBC task to a code template.
Select the File->Save All menu to save the changes made.
Now you can include the newly created code template in a CT mapping and then deploy and execute that mapping as discussed in the preceding section. As you might guess, the newly created code template is SQL Server specific, because there may be no BEGIN TRANSACTION statement in other databases. At least, there is no such statement in Oracle Database.
You can change the code template target platform property to SQLServer since it is using SQLServer-specific commands to target only the SQLServer platform – then this code template will not appear as an option in the mapping execution view when targeting Oracle, for example.
The above was a simple example of customizing an existing code template – you just added a new JDBC task that would perform a single statement. In other situations, you might need to deal with Jython code that also includes JDBC statements. Incidentally, to familiarize yourself with the structure of a built-in code template, you don’t have to import it. Instead, you could go to the Globals Navigator, and under Public Code Templates, look in the BUILT_IN_CT folder.
If you create custom code templates that you want to include in all projects in a workspace, you can create new code template folders under Public Code Templates and import or copy-and-paste them there.
Figure 7: Including custom code templates in all projects in a workspace.
The necessity of extracting business information from heterogeneous sources is a fact of life in most BI systems. In this article, you looked at how the Code Templates technology that uses JDBC as the connection layer can be used to achieve that purpose in Oracle Warehouse Builder 11g Release 2, for non-Oracle databases. You looked not only at how to use built-in code templates but also how to adjust them if necessary. As getting some experience with code templates, you can take it one step further and create a new code template from scratch.
Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, business intelligence (BI), databases, and service-oriented architecture (SOA). He is the author of Oracle Business Intelligence: An introduction to Business Analysis and Reporting (Packt, 2010) as well as a series of other books on the Oracle technology.