Technology XML
Make XML Native and Relative
By Jonathan Gennick
Oracle XML DB provides native format and relational database access.
XML is fast becoming the language of choice for data interchange between businesses. However, most businesses store their data in relational databases such as Oracle9i Database. So how do you bridge the gap between the hierarchical, document-centric world of XML and the tabular, set-oriented world of relational databases? Do you store your XML documents as files on a file system? Do you shred, pull your XML documents apart, and store the data relationally? Choosing between these two approaches involves weighing the trade-offs based on how you use the data. But what if you didn't have to choose? What if you could take both approaches simultaneously? You can, using a new Oracle9i Database Release 2 feature known as the XML DB Repository.
The Repository Explained
Oracle XML DB is neither a separate product nor a separate option that you must install. Oracle XML DB refers to the collection of XML features and technologies built directly into Oracle9i Database. A key feature is the XML DB Repository. This repository enables you to store XML documents directly in Oracle9i Database Release 2. Once your XML documents are in the repository, you can access your XML data in either an XML-centric or a relational-centric manner.
To store XML data in your database, you simply write an XML document file using FTP, HTTP, or WebDAVall industry-standard protocols. Getting XML data out of your database can be as simple as executing a SQL query or reading a file using one of those same protocols.
Setting the Scene
Imagine that you're in the business of marketing CDs produced by independent artists. You need to exchange information with the major music-store chains, online sites, and the artists themselves. You've just developed the XML document format shown in Listing 1 for describing the contents of a CD, and now you want to leverage the XML DB Repository to store that information in your database. You want easy access to the data from SQL and easy access to the native XML documents. In short, you want the data to be relational and hierarchical. In this article, I'm your DBA, and it's my job to make that happen.
Registering the XML Schema
My first step is to register your XML schema with the XML DB Repository. When I register an XML schema, the repository creates object types and object tables capable of holding instances of that schema. The following call to dbms_xmlschema.registerURI, which I execute from SQL*Plus, retrieves the XML schema shown in Listing 2 from http://gennick.com/cd.xsd and registers it:
BEGIN
dbms_xmlschema.registerURI(
'cd.xsd',
'http://gennick.com/cd.xsd');
END;
/
Note: In addition to CREATE privileges for all the various schema object types, I also need ALTER SESSION and QUERY REWRITE privileges in order to register a schema and create the examples in this article.
Listing 3 shows some of the structures and objects created as a result of registering the CD schema. An XML table named CD331_TAB was created to hold instances of the schema: each CD document in the repository will be represented by one row in this table. I can get a list of such XML tables by querying
the USER_XML_TABLES data dictionary
view. In this case, I simply queried the view before and after registering the schema and looked for the new table name. Each row in CD331_TAB will
contain one instance of type CD327_T, which was created to correspond to our XML schema. The top-level fields in our XML document are represented as attributes of the CD327_T type, and the attribute names match the XML field names. For example, the Title field in the object
type corresponds directly to the Title element in the XML schema. The Songs field corresponds to the Songs element. Songs is a complex element in the XML schema, and as such it's mapped to yet another object type, "Songs328_T". If I issued the SQL*Plus command DESCRIBE "Songs328_T" and continued to drill down into the definition of the Songs field, I'd see that the collection of songs was ultimately implemented as a VARRAY in which each element represented one song.
I can control the object and type names that Oracle9i Database generates when I register a schema; I can also control the specific datatypes used to store my XML data. I do this by annotating the XML schema, using attributes defined by XML DB Repository and part of the oraxdb namespace. Oracle9i Database generates these attributes for me when I don't supply them, and I can easily view what Oracle9i Database generates by looking at the version of the schema stored in the repository. Figure 1 illustrates how conveniently you can access repository data, this time via HTTP, using a standard Web browser. Figure 1 shows part of the CD schema in my repository, and you can see the schema annotations, which are all prefaced by "oraxdb". Note that the URL refers to port 8080, which is the default HTTP port used by the repository.
By default, all objects created when registering a schema will be owned by the user registering the schema. In this case, I own the table and type in Listing 3 and all the other types associated with the CD schema. Because I've registered the schema, any XML files I save to the repository that are instances of the CD schema will be shredded and stored in the CD331_TAB table. The schema and registration are specific to me. CD files saved
by other users will not be stored in my
table. You do have the option, using an
optional parameter to dbms_xmlschema.register Schema, to create a global schema that affects all users, so that any user can save a CD document to the table.
Creating an XML Folder
If I'm going to store CD XML documents in the XML DB Repository, I need a folder in which to put them. To create one, I log in as the SYSTEM user and execute the PL/SQL block found in Listing 4. The call to dbms_xdb.createfolder creates a top-level folder named /CD. The PL/SQL block then uses the dbms_xdb.setAcl procedure to create an access control list (ACL) granting all folder privileges to the owner, which is SYSTEM, and read privileges to all other users. The next step is to issue an UPDATE statement against the repository's RESOURCE_VIEW in order to change ownership of the folder from SYSTEM to GENNICK. It's important to commit after creating a folder; it won't be visible to other sessions until you do. I can now connect as GENNICK using FTP or WebDAV and deposit XML files into the /CD folder.
Saving an XML Document
Once I register the schema and create a folder to hold my XML documents, saving a document to the repository is
as easy as copying a file. Listing 5
shows an FTP session that copies the file LegendsOfTheGreatLakes.xml, shown in Listing 1, to the repository. Port 2100, used in the FTP open command, is the default port used by the repository for FTP sessions. Note that rather than use FTP, I could just as easily have used Windows Copy & Paste, using WebDAV and a Windows Web folder.
Using the RESOURCE_VIEW
An important view that you should be aware of is the view named RESOURCE_VIEW. The RESOURCE_VIEW returns one row for each document or folder in the repository to which you have access. For example, you can get a list of all XML documents under the /CD folder by executing the query shown here:
SELECT any_path
FROM resource_view
WHERE under_path(res,'/CD')=1
AND extractValue(res,
'/Resource/ContentType')='text/xml';
ANY_PATH
-------------------------------
/CD/Gospel/NothingLess.xml
/CD/LegendsOfTheGreatLakes.xml
The new UNDER_PATH function shown above allows you to test whether a given repository resource falls somewhere under a folder (or path) that you specify. In this case, my use of the function restricts query results to resources in the /CD folder and subfolders under /CD. Path-based queries against RESOURCE_VIEW are made efficient by a hierarchical domain index created on the underlying table. This index is part of the repository; you don't need to create it.
|
|
FIGURE 1: Part of the sample CD schema in XML DB Repository
|
The RES column in the resource view does not represent the resource itself, but only the metadata for the resource. Applying the new extractValue function to the RES column examines the content type of each resource. Thus the query results are further restricted to paths pointing to XML documents. The '/Resource/ContentType' syntax represents XPath notation. XPath is a standard notation for specifying parts of an XML document; you'll use it a lot in queries against XML data.
Given a repository path, you can use the new XDBUriType object type to retrieve all or part of the underlying XML document. Listing 6 shows two queries. The first query is an extension to that shown above, adding the use of XDBUriType to retrieve all XML documents under the /CD folder. The second query in Listing 6 is a further refinement that appends standard XPath syntax to the end of the URL in order to extract just the CD titles.
Relational Access to Repository Data
It's also possible to access XML data in the repository by going straight to the underlying table. The underlying table that I created when I registered the CD schema is CD331_TAB. You can write queries directly against this table, but those queries must be XML-aware. To facilitate access to XML data from reporting tools designed for use with relational data, you can create a view such as the one shown in Listing 7. In addition to a view, Listing 7 also creates an index on artist name. The view and index allow me to efficiently execute standard relational queries such as the following:
SELECT title
FROM cd_master
WHERE artist='Carl Behrend';
Updating XML Data
Unfortunately, because all the columns in the cd_master view are based on SQL functions, the view is not updateable. However, it is possible to update XML data in the repository; I just need to update the underlying table created when the schema was registered, as in the following:
UPDATE CD331_TAB cd
SET VALUE(cd) = updateXML(
value(cd),
'/CD/Website/text()',
'http://greatlakeslegends.com/
legends.htm');
Note the use of XPath syntax in this new updateXML function. The path '/CD/Website/text()' specifies that I want to update the text in the CD document's Website field. My third argument to updateXML specifies the new value for that text. This is an in-place update, making it a very efficient operation. XML DB Repository does not need to reconstruct the entire XML document being changed. Because the schema is registered, XML DB Repository is able to rewrite this query in such a way that only the Website attribute in the underlying object structure is touched.
Where Next?
With the XML DB Repository, you can store XML documents in the database and access those documents using standard internet protocols. At the same time, you can access the same XML documents, or just parts of those documents, using standard relational queries. You don't have XML data and relational data; you just have data, period. "XML" and "relational" are merely different paradigms for looking at your data. By separating data from paradigm, Oracle9i protects one of your most important assetsyour datafrom the shifting winds of paradigm changes.
Jonathan Gennick (Jonathan@Gennick.com) is an experienced Oracle DBA and an Oracle Certified Professional. He currently makes his living as a writer and recently completed work on the Oracle SQL*Plus Pocket Reference, Second Edition (O'Reilly & Associates, 2002).