|
XML
Making Smart PL/SQL Scanning of XML Documents
By Steven Feuerstein
In the conclusion of this two-part series, a PL/SQL expert again scans XML and maps its benefits to PL/SQL.
In the July/August issue of Oracle Magazine, I introduced the XML Parser for PL/SQL and demonstrated the steps for parsing XML documents into "information sets" or document trees and then scanning those trees to extract data. That technique roughly corresponds to full table scans, which are not desirable when looking for information in a large data set.
The XML Path Language (XPath), however, allows developers to select subsets of information from XML documents. In this article, I show how to use XPath syntax inside a PL/SQL program to extract data and load it into Oracle database tables.
Requirements
To process XML with PL/SQL, install the XML Parsers for both Java and PL/SQL in your database (Oracle9i or Oracle8i Release 2 or 3). To do this, visit the Oracle Technology Network XML page: otn.oracle.com/tech/xml/. Download and unzip the XDKs (for example, dk_plsql_9_
0_1_0_0.zip, to the directory /plsql_xml) for your platform; then do the following:
1. Go to the /plsql_xml/lib/java subdirectory and run the following commands, in order:
loadjava -verbose resolve -user
<user>/<password> xmlparserv2.jar
and
loadjava -verbose -resolve -user
<user>/<password> xmlpsql.jar
where is the name of the schema that will own the Java classes and <password> is the password for the account.
2. Load the XML Parser for PL/SQL packages by connecting to SQL*Plus and running the xml load.sql file located in the lib/sql directory.
We also use the same sample XML data as in my previous article: documentation of corrections
or errata for my six books on the Oracle PL/SQL language published by O'Reilly and Associates.
XML as Tree
XPath allows you to easily specify and obtain values from an XML document by providing a syntax that reflects the hierarchical structures of that document. The first step in working with an XML document is to parse that document; the output of the parse step is an information set, or infoset, that makes XML data available in a tree structure. Listing 1 shows the XML document for book errata; Figure 1 shows the infoset representation of that XML document.
The infoset consists of a root node ("/"), element nodes, attribute nodes, and text or value nodes. XPath gives you an easy way to zoom right in on a specific node and even allows you to specify predicate or conditional logic to filter out nodes that are not of interest.
FIGURE 1: The Infoset for Errata
To use XPath in PL/SQL, I call the XSLprocessor.selectNodes function, passing it an XML document and the XPath string that specifies the data of interest. Here is a small function that encapsulates the selectNodes function and hides some of the complexities:
FUNCTION selected_nodes (
doc_in IN xmldom.domdocument,
xpath_in IN VARCHAR2
)
RETURN xmldom.domnodelist
IS
retval xmldom.domnodelist;
BEGIN
retval :=
xslprocessor.selectnodes (
xmldom.makenode (doc_in),
xpath_in
);
RETURN retval;
END;
Let's look at some XPath string examples and the data they return. To locate all books in my document for which there are errata, the XPath is:
/errata/book
To skip book-specific information and go directly to the errata entries, the XPath is:
/errata/book/entry
Now suppose I want to obtain the ISBN numbers of all books for which an errata entry was reported by slaksmi@
consulting.com. This XPath does it for me:
/errata/book[entry/Reported_by=
'slaksmi@consulting.com']/@isbn
Here, I specify a filter within square brackets [Expr] and then request that the XPath string return the value of the attribute "isbn". You can think of this filter as an XPath WHERE clause. Attributed are indicated with the use of the "@" prefix.
From XML to Database
Now let's use XPath strings directly inside a PL/SQL program to extract errata information and load it into Oracle tables. Listing 3 shows the CREATE TABLE statements for the two tables needed to store errata.
My loadErrata procedure accepts the name of the file containing the XML document; its executable section is short and to the point.
BEGIN
parse_document (xmlfile, doc);
load_book (doc, l_isbn);
load_errata (doc, l_isbn);
cleanup (doc);
END;
First, I parse the document, returning the infoset. Next, I load the header book information, returning the primary key (ISBN number) for use in loading the errata data. Finally, I clean up, saving changes and releasing memory for the infoset.
Listing 2
Let's look at the code required to load book information, shown in Listing 2. Here's an explanation of the steps by line number:
- Line 10. Use the selected_nodes function to retrieve all the book-level nodes in the XML document ("/errata/book"). This function returns a list of nodes.
- Line 12. Extract the first (index = 0 in XML) entry (node) in the list. In a more general implementation, I would use a numeric FOR loop to access each book in the XML document. See load_errata for an example of such a loop.
- Lines 14-16. Use the XSLprocessor.valueof function to retrieve the value associated with the specified node subelement. By using the "@" prefix, I obtain the value of an attribute. Without @, I can use this same function to retrieve the text values of subnodes in the tree.
- Lines 18-21. Insert the extracted XML data directly into an Oracle database table.
Integrating XML and PL/SQL
Oracle has moved to provide integration with standard XML components through both Java and PL/SQL interfaces. As a result, PL/SQL developers can immediately integrate XML documents and XML-based applications into an Oracle PL/SQL code base.
Steven Feuerstein (steven@stevenfeuerstein.com) is the author of six books on PL/SQL, including Oracle PL/SQL Best Practices and Oracle PL/SQL Programming Guide to Oracle8i Features. Feuerstein is a senior technology advisor for Quest Software and a contributor to RevealNet's Active PL/SQL Knowledge Base.
|