Technology XML
XML: To CLOB or Object?
By Sean Dillon
With XML DB, choose when to store CLOBs or objects.
In the January/February 2003 issue of Oracle Magazine, Jonathan Gennick wrote about XML DB and explained some of its capabilities offered in Oracle9i Database Release 2. Gennick referred to the two different storage architectures of the new datatype, XMLType: CLOB (Character Large Object) and structured (or object-relational). In Oracle9i Database Release 1, the only option for storing XML data was CLOB-based storage. In Release 2 of Oracle9i Database, you now have the option to store XML object-relationally, according to the structure of your XML Schema.
So why choose one storage type over the other? As always, the answer to this question is, "It depends." Oracle outlines some preliminary guidelines for what type of features you might be looking for in your XML storage in the Oracle9i XML Database Developer's GuideOracle XML DB. Chapter four, "Using XMLType," contains a section titled "Pros and Cons of XML Storage Options in Oracle XML DB." The section includes the information shown in Table 1.
Table 1 provides some general guidelines as to which storage mechanism you should use, depending on your application's behavior. In this article, I will quantify some of the guidelines offered in Table 1. I'll also take a look at some of the things you can and can't do, depending on the storage architecture you select.
Sample Data to Work With
In order to analyze the characteristics of XML DB and to look at the capabilities realized through the two different storage architectures, I'll need some sample data. I am using two sets of sample XML documents, each of which will give me a different storage architecture once I put the data in the database:
Document Set 1: XML Schema-based XML using object types to store the data object-relationally.
Document Set 2: Non-XML Schema-based XML storage, which defaults to CLOB-based storage.
You can download
this sample data at
asktom.oracle.com/~sdillon/xml/samplexml.zip.
Register the XML Schema
I need to register the XML Schema with the database in order to create the SQL types and tables that will store my XML documents' contents. There are a variety of ways to register an XML Schema in Oracle Database, but for my example I will use a BFILE.
I extract the samplexml.zip file into my /tmp directory, which extracts the XML Schema and all of the sample data I need for my tests to the /tmp/xml directory (with associated subdirectories). I'll use the following code to read the XML Schema from the file system and subsequently register it with XML DB:
SQL> create directory xmldir as '/tmp/xml';
Directory created.
SQL> declare
2 l_bfile bfile;
3 begin
4 l_bfile := bfilename(
'XMLDIR', 'po.xsd');
5 dbms_lob.open(l_bfile);
6 dbms_xmlschema.registerschema(
7 'http://asktom.oracle.com/
~sdillon/xsd/purchaseOrder.xsd',
8 l_bfile);
9 dbms_lob.close(l_bfile);
10 end;
11 /
PL/SQL procedure successfully completed.
Now the XML Schema is registered, but more important, the table and objects used to store XML conforming to this XML Schema are created. Take a look at the details of the XML Schema. I can describe the objects that were created during the XML Schema registration process as shown in Listing 1.
If I compare the elements of the PurchaseOrderType to the SQL object type XDBPO_TYPE, I see that each element is mapped to either a scalar value or another SQL object type (in the case of an element that is a complex type). This shows that Oracle generated a SQL object type based on the structure of the XML Schema! The next logical question is, How did Oracle know what to create to store my XML?
XML DB Schema Attributes
In Oracle9i XML DB, XML Schemas are used not only to validate XML documents but also to dictate how the database stores the contents of those documents. When I registered my XML Schema, Oracle created the PURCHASEORDER table, which is responsible for storing the content of my XML documents. XML DB Schema attributes help administrators and developers decide how that storage will be structured. I can do things such as specify names of SQL object types or names of tables where the XML content will be loaded. For instance, I can set
the following attributes:
DefaultTable. This specifies the name of the table into which the XML will be stored.
SQLName. This specifies the name of the attribute within the object that maps to this XML element.
SQLType. This specifies the name of
the SQL type corresponding to this XML element.
SQLSchema. This specifies the name of the database user that will own the SQLType.
MaintainDOM. If this attribute is true, the XML storage will maintain DOM fidelity. This means things such as comments, processing instructions, and namespace declarations are stored, as well as the order of the elements in the document. This guarantees that the returned XML documents are identical to the original XML documents for DOM traversals.
XML DB annotations enable overrides of the certain XML element attribute values and mappings between XML Schema datatypes and SQL datatypes. I used a number of these annotations in my purchaseOrder.xsd file. Below are some of the attributes used that will create annotations and instruct Oracle9i Database on how to store the data:
n <xs:schema xmlns:xs="http://www.w3. org/2001/XMLSchema"
xmlns:xdb="http:// xmlns:oracle.com/xdb" version="1.0"
xdb:storeVarrayAsTable="true">
This is the namespace declaration for the XML DB attributes I will add. Notice the storeVarrayAsTable attribute. This is set to true because I want collections of elements to be stored in what is known as an Ordered Collection in the table.
n <xs:element name="PurchaseOrder" type="PurchaseOrderType"
xdb:defaultTable="PURCHASEORDER"/>
XML documents validating against purchaseOrder.xsd will be stored in the PURCHASEORDER table.
n <xs:complexType name=
"PurchaseOrderType" xdb:SQLType= "XDBPO_TYPE">
SQLType="XDBPO_TYPE" tells Oracle9i Database to specify the SQL type name to be "XDBPO_TYPE."
n <xs:element name="ActionsType" xdb:SQLName="ACTIONS">
This element is a child of PurchaseOrderType. Therefore, this declaration states that the actions element in the XDBPO_TYPE SQL type will be named "ACTIONS."
You can find a complete list of the available attributes that you can specify in XML Schemas, along with their descriptions, in Chapter 5 of the Oracle9i XML Database Developer's GuideOracle XML DB.
Keep in mind that annotations specified in these XML Schemas are optional. There are default values for most of the attributes, and for those that don't have default values (SQL object names, default table names, and so on), values are determined when the XML Schema is registered. Annotations are meant to give administrators and developers granular control over how XML content is stored. More important, the XML DB Schema annotations are nonintrusive. This means you can add one or more XML DB annotations to your XML Schema and your XML documents do not have to change.
Preparing to Load the XML
I use the same procedures to load the CLOB-based storage and object-relational storage folders into my repository. I use a simple Java stored procedure to read the filenames from the directory where the files live in the operating system, and then I use BFILES to read the XML documents from the file system and insert them into Oracle Database. Once I have the content of the file via the BFILE mechanism, I insert the filename and file content into tables. Before I can run the insert test, I use the code in Listing 2 to create new directories from which to read the XML documents, create a CLOB using BFILE, create a temporary table, create a Java stored procedure to get and insert the files into the test tables, and create the two test tables for the two different XML content storage modelsnon-XML Schema and XML Schema.
Loading the XML
Now that I have created the two test tables for loading my XML content, I will load my sample data. I'll use simple PL/SQL blocks to accomplish this. The first table I'll load is the XML Schema-based table, which should show me the worst-case load performance scenario:
SQL> set timing on
SQL> begin
2 -- load our temporary table w/ the
filenames in the clobData dir
3 get_dir_list('/tmp/xml/strucData');
4 for xmlfiles in (select *
5 from dir_list)
6 loop
7 insert into structured_xml(
filename, po)
8 values (xmlfiles.filename,
9 xmltype.createxml
(getdocument('STRUCXML',
10 xmlfiles.filename)));
11 end loop;
12 end;
13 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:09.48
Six hundred seventy-eight XML documents were loaded in approximately nine and a half seconds. Not too bad at all. This is a fairly complex transaction, because each XML document that was loaded had to be parsed, loaded into a SQL object, and inserted into the PURCHASEORDER object table. Now that I know what it takes to load the structured storage, I'll load the CLOB-based storage:
SQL> begin
2 -- load our temporary table w/ the
filenames in the clobData dir
3 get_dir_list('/tmp/xml/clobData');
4 for xmlfiles in (select *
5 from dir_list)
6 loop
7 insert into clob_xml
(filename, po)
8 values (xmlfiles.filename,
9 xmltype.createxml(
getdocument('CLOBXML',
10 xmlfiles.filename)));
11 end loop;
12 end;
13 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.06
This took just over six seconds. This is about 66 percent of the structured storage load time, meaning I can get about 150 percent of the work done in the same amount of time. Now, timing alone is not sufficient to really understand the impact of these loads, so I used SQL Trace and TKPROF to get some detailed statistics about what was happening behind the scenes during these loads. Listing 3 shows a summary of the results of these tests for both storage-type loads.
From the numbers in Listing 3, I see that loading CLOB-based storage is much less resource-intensive. The nonrecursive statements are comparable for both storage types, but the recursive statements (SQL that executes behind the scenes, on your behalf) are worlds apart. This is because, for structured storage, Oracle Database had to load up objects and object tables. This loading just requires more work.
If your application is getting ("ingesting") a large amount of XML and there are no other requirements to sway your decision on how the XML should be stored, the obvious answer is to use CLOB-based XML storage.
Querying the XML
At this point, I have two tables with identical sets of XML loaded. The nice thing about this is that the two tables use different storage architectures, so I can perform some query tests to see how Oracle Database queries perform on the different types of storage.
I will test different types of queries to get an appreciation for the most appropriate storage architecture, given an application's requirements. I'll perform some XPath-based[ queries, and then I'll query the tables for entire XML documents.
The first query is XPath-based:
select extractValue
(po,'/PurchaseOrder/Reference')
from <table_name>
where existsNode(po,'/PurchaseOrder/
LineItems/LineItem[@ItemNumber="1" and
Part/@Id="715515009058"]') = 1
/
Effectively, this query asks for the reference numbers for all the purchase orders that have a line item ordering a particular part. In Listing 4, I'll run this query once for each table.
Even though the access times aren't that long, Listing 4 shows that the CLOB-based storage took 1.79 seconds, whereas the structured storage took .53 seconds. Even more significant is the number of physical reads that took place. The CLOB-based storage query used 757 physical reads instead of the much more attractive 0 physical reads in the structured-based storage. This is because the CREATE TABLE clob_xml statement did not specify any storage parameters for the PO (XMLType) column. By default, Oracle Database does not cache LOBs in the buffer cache and , in order to read the CLOB_XML data, the query must perform direct reads. Depending on your application's behavior, you may or may not want to cache your LOBs in your buffer cache.
Conclusion
Ultimately, the decision to store your XML as CLOBs or as structured storage is really a question of the application's needs. I run a set of tests just like I've done in this article, which tells me definitively which one will be better
for me in my application. I encourage you to do the same.
Sean Dillon (sean.dillon@oracle.com) is a principal technologist with Oracle. Dillon specializes in core database, XML, and Web services technologies.
|
Table 1: Pros and cons of XML storage options in Oracle XML DB
|
| Feature | LOB Storage | Structured Storage (object-relational storage) |
| Database schema flexibility | Very flexible when schemas change. | Limited flexibility for schema
changes. Similar to the ALTER TABLE restrictions. |
| Data integrity and accuracy | Maintains the original XML byte for byteimportant in some applications. | Trailing new lines, whitespace within tags, and data format for nonstring datatypes is lost. But maintains Document Object Model (DOM) fidelity. |
| Performance | Mediocre performance for data manipulation language (DML). | Excellent DML performance. |
| Access to SQL | Some accessibility to SQL features. | Good accessibility to existing SQL features, such as constraints and indexes. |
| Space needed | Can consume considerable space. | Needs less space than CLOB storage, in particular when used with an Oracle XML DB-registered XML Schema. |