DBA: Data Warehousing & Integration
by Casimir Saternos
Use open source technology to broadcast SQL to several Oracle databases and create an Excel workbook with a separate sheet for each database result set.
Microsoft Excel spreadsheets are ubiquitous in today's business environment. Although Excel can read and display delimited files, the .xls proprietary format provides additional expressive features, including font formatting and multiple sheets per spreadsheet.
Many tools—commercial as well as open source—provide a mechanism for producing reports that Excel can read. For example,
you could use Oracle Reports to produce a spreadsheet or SQL*Plus to spool a comma-delimited file. Access to multiple
datasources is possible through some of the available tools. However, the formatting of the report is tied closely to
the results of a particular query, and these results sometimes differ based upon the datasource. (For instance, the query
SELECT * FROM emp will not necessarily return the same number or types of columns from tables in two different databases.) There are a variety of reasons you could be interested in accessing several different datasources and collating the results in a single spreadsheet:
Depending on the structure and use of particular databases, there are additional reasons why you might produce this type of spreadsheet, but these reasons are dependent upon functional requirements and definitions of particular databases.
In this article, I will describe how to use the Apache Jakarta POI open source project to produce a spreadsheet from several different datasources. Each sheet in the workbook will display the results returned from a given Oracle database. The requirements you will aim to achieve in the current project are as follows:
You could add numerous requirements for a production system. Conspicuously absent from the above list is any mention of password encryption. There is a reasonable attempt to deal with SQL datatypes, but a more robust solution would include handling for special datatypes such as BLOBs (Binary Large Objects), CLOBs (Character Large Objects), and LONGs. There is no user-friendly GUI—just an XML configuration file and a script to call that sets the environment and executes a Java class. Only a single query is executed, so it is expected that it is SQL that is not only syntactically valid but is a query that will return a results set based upon each database's objects. And logging is minimal. Nevertheless, the requirements above will result in a program that is relatively easy to use and that can quickly produce a document containing data from numerous databases. This resulting document can subsequently be formatted using Excel if desired. Because data from different sources is included in the spreadsheet we can easily compare the data extracted from the various databases. In addition, the main mechanisms—the POI application-programming interface (API) calls—that you use will be clearly discerned rather than being obscured by the additional code and resources needed to create a full-featured solution.
Testing and Requirements
The software resulting from this article was written and tested with Java runtime environment (JRE) 1.4.2 on a PC running Windows 2000. (See the sidebar for issues pertaining to JRE 1.5.) Because Java is designed to run on various platforms, it should run successfully on any machine with the appropriate JRE. This version allows you to utilize XML processing without accessing another external API (such as Xerces and Xalan).
JRE 1.5 Workaround
When using this software with jdk1.5.0_04, an exception is thrown (java.lang.NoClassDefFoundError: org/apache/xpath/XPathAPI).
This is because the org.apache.xpath.XPathAPI class was moved in JRE 1.5. There are two ways to solve this problem:
The software is designed to run client-side and access a remote Oracle Server. However, it is important to note that Oracle Database (since version 8) has included a Java Virtual Machine allowing the use of Java objects within the database. This feature provides the flexibility to develop server-side applications that are tightly integrated with the database environment. Using JDBC you can make connections to any remote databases that can be accessed using the JDBC Driver. This approach allows the Oracle Database to serve as more than simply a database; it operates instead as a platform for processing data retrieved from a variety of datasources. The software presented in this article will not take advantage of this functionality, but keep it in mind when adapting the techniques introduced here to your particular requirements.
Initial Configuration
If you have several JREs on your system, you will need to ensure that your PATH environmental variable is set so that the 1.4.2 or later JRE is called rather than an earlier version.
The information in the batch file included with the software download has been adapted for a Linux environment and tested on Red Hat Linux as well. Rather than editing and executing run.bat, modify run.sh appropriately. The spreadsheet can be displayed in a Linux environment using OpenOffice.org's Calc spreadsheet program.
Essentially, the environmental setup relates to the setting of two environmental variables: PATH and CLASSPATH. First, in order to make use of the APIs that provide database access and Excel spreadsheet functionality, the CLASSPATH variable must be set appropriately. The run.bat file sets this variable so that the appropriate Java Archives (.jar files) are accessible. You will need to modify the path to classes12.jar (which contains the Oracle JDBC Driver) so that it points to the correct path on your system. This file is installed with many different Oracle products. If you do not have a copy, you can download it from the URL mentioned above. If your CLASSPATH does not contain this jar, the following exception will occur when the program attempts to load the JDBC Driver:
Driver not found: oracle.jdbc.driver.OracleDriver
java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
The POI jar (poi-2.5.1-final-20040804.jar) is found in the lib folder with the download. If your CLASSPATH does not include this file, the following exception will be thrown when the program attempts to create a workbook object:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/poi/hssf/usermodel/HSSFWorkbook
The broadcast.jar contains the files that are used to pull together the functionality of these APIs and construct a spreadsheet based upon the configured connections and SQL query. The configuration file (config.xml) is expected to reside in the same directory as broadcast.jar is being executed. If it is not, an exception (with the appropriate file system path) will occur:
java.io.FileNotFoundException: C:\config.xml
(The system cannot find the file specified)
XML Configuration
The Java API for XML processing was added to the Java 2 Platform as of version 1.4. This API allows for the processing of XML documents through a standardized set of Java Platform APIs. Therefore, no separate XML processing packages need to be included for the current project. A simple configuration file allows a user to configure multiple database connections and a single SQL query.
XML is an excellent choice for configuration files because it allows the use of human-readable configuration data that can be easily accessed through a standard processing method. Because the amount of data stored in a configuration file is rather limited, there is no concern related to "file bloat" that can occur when XML tags are used to mark up large data sets. In addition, the hierarchical organization of our configuration data benefits from the organization allowed for through nesting entities in XML. Each connection node contains individual pieces of data that, used together, can make a database connection.
Here is an example of the configuration file:
<?xml version="1.0" standalone="yes" ?>
<broadcast>
<connection-config>
<database>
<server-name>server001</server-name>|
<port>1521</port>
<sid>ORCL</sid>
<username>sysdba</username>
<password>my1manager</password>
</database>
<database>
<server-name>server002</server-name>|
<port>1521</port>
<sid>PROD</sid>
<username>sysdba</username>
<password>my1manager</password>
</database>
</connection-config>
<sql-statement>select * from v$parameter</sql-statement>
</broadcast>
The XMLConfigVO class contains the code related to reading and parsing this XML document.
When the no argument constructor is called, the class reads in a file named config.xml and creates a new
DocumentBuilderFactory instance. We are not using DTD (Document Type Definition) or schema validation in this example. The factory produces a Document object that can be parsed or otherwise manipulated as needed.
Next, the nodes associated with the //broadcast/connection-config/database XPath expression are selected.
These nodes contain the elements required to make a database connection: the server name port, Oracle SID, username, and password.
Finally, the SQL statement is obtained by querying using the //broadcast/sql-statement XPath expression.
At this point, all of the data can be retrieved from the XMLConfigVO through
accessors (getter methods with the form getXXXX()).
Using JDBC
Java Database Connectivity (JDBC) technology is a Java API that provides connectivity to Oracle databases to retrieve the data that will be used to populate the spreadsheet. Specifically, the JDBC API makes it possible to establish a connection with the Oracle database, send SQL statements, and process the results. Here the program is only utilizing a portion of the API, as there is no requirement to execute DML or set transactions. Although the full range of database types is supported, you will limit the processing you do in this area to a scope that maps clearly to Excel spreadsheets.
The Oracle JDBC Driver is not dependent on the tnsnames.ora or other standard Oracle client connection information. As long as the JRE is present and the Oracle JDBC Driver classes are in the CLASSPATH (and the database is available over the network), a Java class has the resources needed to make a connection to an Oracle database.
The classes12.jar contains the Oracle-specific JDBC drivers and classes that this project requires. It must be included in the CLASSPATH in order for a user to run the program. You can usually find the file in an <ORACLE_HOME>/jdbc/lib directory, or you can download it from OTN.
The OracleConnectionVO class contains information and processing related to JDBC connections.
The oracle.jdbc.driver.OracleDriver will be a URL in the form
jdbc:oracle:thin:@<serverName>:<Oracle Listener Port>:<Oracle SID> to make the connection.
Jakarata POI
As I explained in a previous article, the Jakarta POI FileSystem APIs implement the OLE 2 Compound Document format in pure Java, and HSSF APIs allow for the reading and writing of Excel files using Java. In that article, you used the API to read Excel files. The current project will be write an Excel file. (POI also provides functionality that lets you open an existing spreadsheet and manipulate it). If you are interested in finding out more about what POI can do, consult the "Busy Developers' Guide to HSSF Features", which provides code snippets that demonstrate the functionality that developers frequently require.
To create a workbook object, call the following (in the BroadcastDriver class):
HSSFWorkbook wb = new HSSFWorkbook();
POISheetGenerator is the class where the call is made to create a sheet and add it to the workbook:
HSSFSheet sheet = wb.createSheet(sheetname);
A font style of bold for column headings is configured as follows:
HSSFFont boldFont = wb.createFont();
boldFont.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);
HSSFCellStyle boldStyle = wb.createCellStyle();
boldStyle.setFont(boldFont);
The cell style defined above is applied to a particular cell as you iterate through the columns associated with the result set:
row.getCell((short) x).setCellStyle(boldStyle);
To write the workbook to disk, call the following:
OutputStream = new FileOutputStream("output.xls");
wb.write(out);
out.close();
broadcast.jar
This jar contains the classes that comprise the project itself:
BroadcastDriver. This class contains the main method for the application.
You create a new HSSFWorkbook() object that will contain all of the sheets.
Next, you create an XMLConfigVO object that reads the connection and SQL query from the config.xml file.
Using the information contained in this object, the application iterates through each OracleConnectionVO
and sets the JDBC driver, JDBC Database URL, database username, database password, SQL query, and the name of the sheet
(in the form <server name>-<Oracle SID> ). A sheet populated with the data is added to the workbook.
When all connections have been iterated through, the workbook is written to the file system. OracleConnectionVO. This class holds the values of a JDBC Driver, JDBC Database URL, database username,
database password, and SQL query. XMLConfigVO. This class creates a DocumentBuilderFactory
to parse the config.xml document. XPath queries are used to retrieve NodeLists containing the required values:
NodeList nodelist = org.apache.xpath.XPathAPI.selectNodeList(doc, xpath);
getNodeValue(). POISheetGenerator. The populateSheet method of this class does most of the work. After some initial setup that will format column headers in bold, the database connection is opened and the query is executed.
The metadata is retrieved that is needed to determine the number of columns that will be iterated through. The metadata also contains the column names that will be used as the column headers in the sheet being processed. Next, the result set is iterated through. The processing of various datatypes is rather rudimentary in this example. For each object returned, the program attempts to populate the cell as a number (specifically a Java Long). If a
NumberFormatException is thrown (because the value cannot be formatted as a number), the
toString is called on the returned object. The toString function is defined in the
Object object, which is the base of the Java hierarchy, so it is guaranteed to be a valid method regardless of the object returned.
However, the actual value returned by toString can vary as to how meaningful or relevant it is for a given object.Potential Applications
The following are some sample queries that can be included in the config.xml to return the kind of data that a DBA or application developer might want to compare between databases.
The V$DATABASE table contains general information about the database that may be of interest when comparing database configurations or addressing backup and recovery issues. Even though the number and types of fields vary between versions of Oracle, the following query will successfully execute and the results appropriate to the table in question will be displayed:
SELECT * FROM v$database
If a DBA were interested in documenting version information for a number of databases, there might be an interest in the data in the V$VERSION table as well. The following query might be used in this case:
V$VERSION table as well. The following query might be used in this case:
SELECT 'Name: '||name "Database Information" FROM v$database
UNION ALL
SELECT 'DBID: '||dbid FROM v$database
UNION ALL
SELECT * FROM v$version
The files that are of primary interest for backup and recovery operations are control files, redo log files, and the database datafiles themselves. You can retrieve the location of these files as follows:
SELECT * FROM
(
SELECT ' ' "Tablespace" ,Name "File Name" FROM V$CONTROLFILE
union
SELECT ' ', Member FROM V$LOGFILE
union
SELECT Tablespace_Name, File_Name FROM DBA_DATA_FILES
UNION
SELECT ' ' ,'****AS OF: ' ||to_char(sysdate, 'MM/DD/YYYY HH24:MI:SS')||'****' FROM dual
)
ORDER BY 1
Oracle Recovery Manager (RMAN) is commonly used to automate backup and recovery operations. Information related to RMAN that is stored in the database might be of interest when auditing or standardizing backup activities. Queries that might be constructed related to RMAN backups include:
SELECT * FROM V$BACKUP_SET
or
SELECT * FROM V$RMAN_CONFIGURATION
Oracle Applications DBAs have a number of unique challenges related to managing their environments. Those with a number of instances under their care might be interested in comparing the patches that have been applied:
SELECT * FROM APPLSYS.AD_APPLIED_PATCHES ORDER BY PATCH_NAME
These are but a few examples of the possibilities. Any SQL query that is syntactically valid and has corresponding objects in a given database can be executed and the results displayed in a spreadsheet. The examples given here are of interest to a technical audience. You could probably imagine business uses as well that would access application-specific database objects and queries.
Conclusion
Many resources in the Java community allow the rapid creation of programs that benefit from proven APIs. Oracle's JDBC Driver provides a convenient method to access databases using the Java programming language, and the Jakarta programmers have created POI, which makes creating Excel spreadsheets with multiple-sheet formatting and other features a straightforward exercise. Besides providing insight into these technologies, the software described in this artice provides a simple tool that can automate a few more of the day-to-day activities of the overworked DBA.
Casimir Saternos is an Oracle Certified DBA, IBM Certified Enterprise Developer and Sun Certified Java Programmer based in Allentown, Pa.