Create an Excel Report from Several Oracle Databases Using Apache Jakarta POI

DBA: Data Warehousing & Integration

by Casimir Saternos

Use open source technology to broadcast SQL to several Oracle databases and create an Excel workbook with a separate sheet for each database result set.

Microsoft Excel spreadsheets are ubiquitous in today's business environment. Although Excel can read and display delimited files, the .xls proprietary format provides additional expressive features, including font formatting and multiple sheets per spreadsheet.

Many tools—commercial as well as open source—provide a mechanism for producing reports that Excel can read. For example, you could use Oracle Reports to produce a spreadsheet or SQL*Plus to spool a comma-delimited file. Access to multiple datasources is possible through some of the available tools. However, the formatting of the report is tied closely to the results of a particular query, and these results sometimes differ based upon the datasource. (For instance, the query SELECT * FROM emp will not necessarily return the same number or types of columns from tables in two different databases.) There are a variety of reasons you could be interested in accessing several different datasources and collating the results in a single spreadsheet:

  • If you are a DBA, you may want to compare the configuration of several different databases.
  • If you are an application developer, you could be assigned a task to check the version of PL/SQL objects in several different databases.
  • During the process of migrating data from one database to another, a DBA may need to estimate the percent of the job completed.
  • When debugging an application running against a test database, a programmer may need to compare data as it exists in the production environment.

Depending on the structure and use of particular databases, there are additional reasons why you might produce this type of spreadsheet, but these reasons are dependent upon functional requirements and definitions of particular databases.

In this article, I will describe how to use the Apache Jakarta POI open source project to produce a spreadsheet from several different datasources. Each sheet in the workbook will display the results returned from a given Oracle database. The requirements you will aim to achieve in the current project are as follows:

  • The software will be run to create a spreadsheet containing data from one or more Oracle databases.
  • A user will define a single arbitrary SQL query to be processed by each configured database.
  • A user will define one or more database connections.
  • The database connections and the SQL query defined by the user will be included in a configuration file.
  • The configuration file will be in XML format.
  • One sheet will be created to display the results from each database.
  • Each sheet will be named based upon the server name and Oracle System Identifier (otherwise known as the Oracle SID).
  • The spreadsheet will display column headers (based upon the database column) in bold.
  • The software will be written in Java.
  • Databases will be accessed using JDBC.
  • The spreadsheet will be produced using Jakarta POI.

You could add numerous requirements for a production system. Conspicuously absent from the above list is any mention of password encryption. There is a reasonable attempt to deal with SQL datatypes, but a more robust solution would include handling for special datatypes such as BLOBs (Binary Large Objects), CLOBs (Character Large Objects), and LONGs. There is no user-friendly GUI—just an XML configuration file and a script to call that sets the environment and executes a Java class. Only a single query is executed, so it is expected that it is SQL that is not only syntactically valid but is a query that will return a results set based upon each database's objects. And logging is minimal. Nevertheless, the requirements above will result in a program that is relatively easy to use and that can quickly produce a document containing data from numerous databases. This resulting document can subsequently be formatted using Excel if desired. Because data from different sources is included in the spreadsheet we can easily compare the data extracted from the various databases. In addition, the main mechanisms—the POI application-programming interface (API) calls—that you use will be clearly discerned rather than being obscured by the additional code and resources needed to create a full-featured solution.

Testing and Requirements

The software resulting from this article was written and tested with Java runtime environment (JRE) 1.4.2 on a PC running Windows 2000. (See the sidebar for issues pertaining to JRE 1.5.) Because Java is designed to run on various platforms, it should run successfully on any machine with the appropriate JRE. This version allows you to utilize XML processing without accessing another external API (such as Xerces and Xalan).

JRE 1.5 Workaround

When using this software with jdk1.5.0_04, an exception is thrown (java.lang.NoClassDefFoundError: org/apache/xpath/XPathAPI). This is because the org.apache.xpath.XPathAPI class was moved in JRE 1.5. There are two ways to solve this problem:

  • 1. Utilize a jar that containts the XPathAPI class (xalan-2.4.1.jar for instance). Add this jar to the CLASSPATH. The same type of approach can be used with older JREs that do not include the XPathAPI class. The disadvantage is that you are including functionality that is already available in the JRE.
  • 2. Alter the source code to remove the package qualification. The reference to org.apache.xpath.XPathAPI is changed to simply XPathAPI. (It is now found in com.sun.org.apache.xpath.internal.) This approach requires a separate jar to download, and is the option implemented for JRE 1.5 here.

The software is designed to run client-side and access a remote Oracle Server. However, it is important to note that Oracle Database (since version 8) has included a Java Virtual Machine allowing the use of Java objects within the database. This feature provides the flexibility to develop server-side applications that are tightly integrated with the database environment. Using JDBC you can make connections to any remote databases that can be accessed using the JDBC Driver. This approach allows the Oracle Database to serve as more than simply a database; it operates instead as a platform for processing data retrieved from a variety of datasources. The software presented in this article will not take advantage of this functionality, but keep it in mind when adapting the techniques introduced here to your particular requirements.

Initial Configuration

If you have several JREs on your system, you will need to ensure that your PATH environmental variable is set so that the 1.4.2 or later JRE is called rather than an earlier version.

The information in the batch file included with the software download has been adapted for a Linux environment and tested on Red Hat Linux as well. Rather than editing and executing run.bat, modify run.sh appropriately. The spreadsheet can be displayed in a Linux environment using OpenOffice.org's Calc spreadsheet program.

Essentially, the environmental setup relates to the setting of two environmental variables: PATH and CLASSPATH. First, in order to make use of the APIs that provide database access and Excel spreadsheet functionality, the CLASSPATH variable must be set appropriately. The run.bat file sets this variable so that the appropriate Java Archives (.jar files) are accessible. You will need to modify the path to classes12.jar (which contains the Oracle JDBC Driver) so that it points to the correct path on your system. This file is installed with many different Oracle products. If you do not have a copy, you can download it from the URL mentioned above. If your CLASSPATH does not contain this jar, the following exception will occur when the program attempts to load the JDBC Driver:


 
 
 Driver not found: oracle.jdbc.driver.OracleDriver 
 java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
 
 

The POI jar (poi-2.5.1-final-20040804.jar) is found in the lib folder with the download. If your CLASSPATH does not include this file, the following exception will be thrown when the program attempts to create a workbook object:


 
 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/poi/hssf/usermodel/HSSFWorkbook
 

The broadcast.jar contains the files that are used to pull together the functionality of these APIs and construct a spreadsheet based upon the configured connections and SQL query. The configuration file (config.xml) is expected to reside in the same directory as broadcast.jar is being executed. If it is not, an exception (with the appropriate file system path) will occur:


 
 
 java.io.FileNotFoundException: C:\config.xml 
 (The system cannot find the file specified)
 
 

XML Configuration

The Java API for XML processing was added to the Java 2 Platform as of version 1.4. This API allows for the processing of XML documents through a standardized set of Java Platform APIs. Therefore, no separate XML processing packages need to be included for the current project. A simple configuration file allows a user to configure multiple database connections and a single SQL query.

XML is an excellent choice for configuration files because it allows the use of human-readable configuration data that can be easily accessed through a standard processing method. Because the amount of data stored in a configuration file is rather limited, there is no concern related to "file bloat" that can occur when XML tags are used to mark up large data sets. In addition, the hierarchical organization of our configuration data benefits from the organization allowed for through nesting entities in XML. Each connection node contains individual pieces of data that, used together, can make a database connection.

Here is an example of the configuration file:


 
 
 <?xml version="1.0" standalone="yes" ?> 
 <broadcast> 
 <connection-config> 
 <database> 
 <server-name>server001</server-name>| 
 <port>1521</port> 
 <sid>ORCL</sid> 
 <username>sysdba</username> 
 <password>my1manager</password> 
 </database> 
 <database> 
 <server-name>server002</server-name>| 
 <port>1521</port> 
 <sid>PROD</sid> 
 <username>sysdba</username> 
 <password>my1manager</password> 
 </database> 
 </connection-config> 
 <sql-statement>select * from v$parameter</sql-statement>
 </broadcast>
 

The XMLConfigVO class contains the code related to reading and parsing this XML document. When the no argument constructor is called, the class reads in a file named config.xml and creates a new DocumentBuilderFactory instance. We are not using DTD (Document Type Definition) or schema validation in this example. The factory produces a Document object that can be parsed or otherwise manipulated as needed.

Next, the nodes associated with the //broadcast/connection-config/database XPath expression are selected. These nodes contain the elements required to make a database connection: the server name port, Oracle SID, username, and password. Finally, the SQL statement is obtained by querying using the //broadcast/sql-statement XPath expression. At this point, all of the data can be retrieved from the XMLConfigVO through accessors (getter methods with the form getXXXX()).

Using JDBC

Java Database Connectivity (JDBC) technology is a Java API that provides connectivity to Oracle databases to retrieve the data that will be used to populate the spreadsheet. Specifically, the JDBC API makes it possible to establish a connection with the Oracle database, send SQL statements, and process the results. Here the program is only utilizing a portion of the API, as there is no requirement to execute DML or set transactions. Although the full range of database types is supported, you will limit the processing you do in this area to a scope that maps clearly to Excel spreadsheets.

The Oracle JDBC Driver is not dependent on the tnsnames.ora or other standard Oracle client connection information. As long as the JRE is present and the Oracle JDBC Driver classes are in the CLASSPATH (and the database is available over the network), a Java class has the resources needed to make a connection to an Oracle database.

The classes12.jar contains the Oracle-specific JDBC drivers and classes that this project requires. It must be included in the CLASSPATH in order for a user to run the program. You can usually find the file in an <ORACLE_HOME>/jdbc/lib directory, or you can download it from OTN.

The OracleConnectionVO class contains information and processing related to JDBC connections. The oracle.jdbc.driver.OracleDriver will be a URL in the form jdbc:oracle:thin:@<serverName>:<Oracle Listener Port>:<Oracle SID> to make the connection.

Jakarata POI

As I explained in a previous article, the Jakarta POI FileSystem APIs implement the OLE 2 Compound Document format in pure Java, and HSSF APIs allow for the reading and writing of Excel files using Java. In that article, you used the API to read Excel files. The current project will be write an Excel file. (POI also provides functionality that lets you open an existing spreadsheet and manipulate it). If you are interested in finding out more about what POI can do, consult the "Busy Developers' Guide to HSSF Features", which provides code snippets that demonstrate the functionality that developers frequently require.

To create a workbook object, call the following (in the BroadcastDriver class):


 
 HSSFWorkbook wb = new HSSFWorkbook(); 
 
 

POISheetGenerator is the class where the call is made to create a sheet and add it to the workbook:


 
 
 HSSFSheet sheet = wb.createSheet(sheetname); 
 
 

A font style of bold for column headings is configured as follows:


 
 
 HSSFFont boldFont = wb.createFont();
 boldFont.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);
 HSSFCellStyle boldStyle = wb.createCellStyle();
 boldStyle.setFont(boldFont);
 

The cell style defined above is applied to a particular cell as you iterate through the columns associated with the result set:


 
 row.getCell((short) x).setCellStyle(boldStyle);
 
 

To write the workbook to disk, call the following:


 
 
 OutputStream = new FileOutputStream("output.xls");
 wb.write(out);
 out.close();
 

broadcast.jar

This jar contains the classes that comprise the project itself:

  • BroadcastDriver. This class contains the main method for the application. You create a new HSSFWorkbook() object that will contain all of the sheets. Next, you create an XMLConfigVO object that reads the connection and SQL query from the config.xml file. Using the information contained in this object, the application iterates through each OracleConnectionVO and sets the JDBC driver, JDBC Database URL, database username, database password, SQL query, and the name of the sheet (in the form <server name>-<Oracle SID> ). A sheet populated with the data is added to the workbook. When all connections have been iterated through, the workbook is written to the file system.
  • OracleConnectionVO. This class holds the values of a JDBC Driver, JDBC Database URL, database username, database password, and SQL query.
  • XMLConfigVO. This class creates a DocumentBuilderFactory to parse the config.xml document. XPath queries are used to retrieve NodeLists containing the required values:
    
     
     
     NodeList nodelist = org.apache.xpath.XPathAPI.selectNodeList(doc, xpath);
     
     
    When the appropriate node has been located, you can retrieve its value by calling getNodeValue().
  • POISheetGenerator. The populateSheet method of this class does most of the work. After some initial setup that will format column headers in bold, the database connection is opened and the query is executed.

    The metadata is retrieved that is needed to determine the number of columns that will be iterated through. The metadata also contains the column names that will be used as the column headers in the sheet being processed. Next, the result set is iterated through. The processing of various datatypes is rather rudimentary in this example. For each object returned, the program attempts to populate the cell as a number (specifically a Java Long). If a NumberFormatException is thrown (because the value cannot be formatted as a number), the toString is called on the returned object. The toString function is defined in the Object object, which is the base of the Java hierarchy, so it is guaranteed to be a valid method regardless of the object returned. However, the actual value returned by toString can vary as to how meaningful or relevant it is for a given object.

Potential Applications


The following are some sample queries that can be included in the config.xml to return the kind of data that a DBA or application developer might want to compare between databases.

The V$DATABASE table contains general information about the database that may be of interest when comparing database configurations or addressing backup and recovery issues. Even though the number and types of fields vary between versions of Oracle, the following query will successfully execute and the results appropriate to the table in question will be displayed:


 
 SELECT * FROM v$database
 

If a DBA were interested in documenting version information for a number of databases, there might be an interest in the data in the V$VERSION table as well. The following query might be used in this case:


 
 V$VERSION table as well. The following query might be used in this case:
 SELECT 'Name: '||name "Database Information" FROM v$database 
 UNION ALL 
 SELECT 'DBID: '||dbid FROM v$database 
 UNION ALL 
 SELECT * FROM v$version
 
 

The files that are of primary interest for backup and recovery operations are control files, redo log files, and the database datafiles themselves. You can retrieve the location of these files as follows:


 
 SELECT * FROM 
 (
 SELECT ' ' "Tablespace" ,Name "File Name" FROM V$CONTROLFILE
 union
 SELECT ' ', Member FROM V$LOGFILE
 union
 SELECT Tablespace_Name, File_Name FROM DBA_DATA_FILES
 UNION 
 SELECT ' ' ,'****AS OF: ' ||to_char(sysdate, 'MM/DD/YYYY HH24:MI:SS')||'****' FROM dual
 )
 ORDER BY 1
 
 

Oracle Recovery Manager (RMAN) is commonly used to automate backup and recovery operations. Information related to RMAN that is stored in the database might be of interest when auditing or standardizing backup activities. Queries that might be constructed related to RMAN backups include:


 
 SELECT * FROM V$BACKUP_SET 
 or
 SELECT * FROM V$RMAN_CONFIGURATION
 
 

Oracle Applications DBAs have a number of unique challenges related to managing their environments. Those with a number of instances under their care might be interested in comparing the patches that have been applied:


 
 
 SELECT * FROM APPLSYS.AD_APPLIED_PATCHES ORDER BY PATCH_NAME
 
 

These are but a few examples of the possibilities. Any SQL query that is syntactically valid and has corresponding objects in a given database can be executed and the results displayed in a spreadsheet. The examples given here are of interest to a technical audience. You could probably imagine business uses as well that would access application-specific database objects and queries.

Conclusion

Many resources in the Java community allow the rapid creation of programs that benefit from proven APIs. Oracle's JDBC Driver provides a convenient method to access databases using the Java programming language, and the Jakarta programmers have created POI, which makes creating Excel spreadsheets with multiple-sheet formatting and other features a straightforward exercise. Besides providing insight into these technologies, the software described in this artice provides a simple tool that can automate a few more of the day-to-day activities of the overworked DBA.

Casimir Saternos is an Oracle Certified DBA, IBM Certified Enterprise Developer and Sun Certified Java Programmer based in Allentown, Pa.