TECHNOLOGY: XML Exchange
Which Storage XML?
By Sean Dillon
When to use CLOBs or objects to store your XML
The introduction of the Extensible Markup Language (XML) raised questions. How would we share XML if my documents looked different from yours? How could we make XML documents human-readable? How could we store these documents and expect to have any kind of performance when we queried the data in reporting tools or applications?
These questions have been answered over and over again. XML schemas and document type definitions (DTDs) define common formats that businesses can agree on to share XML documents. XML's Extensible Stylesheet Language (XSL) is a standard for transforming XML into Web pages, Portable Document Format (PDF) documents, Scalable Vector Graphics (SVG) charts, and XML documents of a different format. As for how to store XML documents, Oracle Database 10g has answered that question in a way that nothing else in the industry can.
In this column, I'll look at the various ways to store XML documents in Oracle Database 10g.
XML as a CLOB
Before the XMLType datatype was released, in Oracle9i Database Release 1, people were storing their data in VARCHAR2 or character large object ( CLOB ) columns. Because free-form text data was traditionally stored in VARCHAR2 s or CLOB s, it seemed a logical fit for XML as well. For Oracle8i and earlier database releases, these datatypes are fine for storing XML data, because there is no other option out of the box. Once you upgrade to Oracle9i or beyond , however, XMLType is the order of the day.
With the first release of Oracle9i Database, when you stored an instance of the XMLType datatype, the underlying storage model was a CLOB value. The entire document was read and stored as-is in a CLOB object in the database. The CLOB object recorded white space, processing instructions, the XML prolog—every aspect of the document—byte for byte. Oracle9i Database Release 2 introduced structured storage, or object-relational storage, to the XMLType datatype.
This leads us to an obvious question: Why not just continue to use CLOB columns? Because using structured storage brings important features with it. When you store an instance of XMLType in Oracle Database 10g, you have access to XPath querying capability, XSL transformations, XML schema validation, well-formedness checking, and robust searching capability through the use of Oracle Text.
Even today, however, you should consider both CLOB and structured storage architectures when you are designing an XML solution, because the behavior of your application and how you interact with XML will have a huge impact on the performance of your system.
When an XMLType instance is created, an XML syntax check is performed—regardless of the storage architecture—to ensure that the XML document is well formed. You can see the difference between inserting 1,000 CLOB s and 1,000 CLOB -based XMLType values, which, in turn, must be checked for well-formedness during each INSERT :
SQL> create table xmlclobs ( 2 xmlclobs clob ); Table created. SQL> create table xmldocs ( 2 x xmltype ); Table created. SQL> declare 2 l_start timestamp; 3 l_elapsed interval 4 day to second(4); 5 begin 6 l_start := systimestamp; 7 for i in 1 .. 1000 loop 8 insert into xmlclobs 9 values ('<?xml version="1.0"?> 10 <blah/>'); 11 end loop; 12 l_elapsed := systimestamp - l_start; 13 dbms_output.put_line( 14 'CLOB:'||to_char(l_elapsed)); 15 16 l_start := systimestamp; 17 for i in 1 .. 1000 loop 18 insert into xmldocs 19 values (xmltype( 20 '<?xmlversion="1.0"?> 21 <blah/>')); 22 end loop; 23 l_elapsed := systimestamp - l_start; 24 dbms_output.put_line( 25 'XMLType:'||to_char(l_elapsed)); 26 end; 27 / CLOB:+00 00:00:00.1300 XMLType:+00 00:00:02.9440 PL/SQL procedure successfully completed.
This code shows the difference in insertion times; there's an investment in using the XMLType datatype. Based on the results of the INSERT comparison above, if you are going to store XML and plan never to use it, you might choose to use CLOB s instead of XMLType for storage, due to the processing requirements of the well-formedness check. Additionally, if you can't be sure your documents are well formed, you may choose to store your documents as CLOB s. If you plan to ever use XML-related functionality with these documents, however, use XMLType.
Here's a quick overview of the types of applications that can benefit from using CLOB -based XMLType storage:
XML Using Object-Relational Storage
Object-relational storage, instead of storing the entire XML document, byte for byte, in a single CLOB value, breaks the XML document down into scalar values for storage in object attributes in object-relational tables. Consider an XML PersonType schema that contains some simple-typed elements such as FirstName, LastName, and DateOfBirth. These values, when stored in an object-relational XMLType column, might be stored in an object table named PERSON_TYPE with the attributes FIRST_NAME, LAST_NAME , and DATE_OF_BIRTH . The document is shredded upon insertion and reconstituted when you ask the database to return the XML document.
To use object-relational storage, your XML documents must be based on an XML schema that has been registered with the Oracle database. The registration tells the database what kind of object tables to create behind the scenes to store the scalar values contained in XML documents. Because XML schemas are responsible for dictating the format for XML documents, the Oracle database can derive which tables and attributes it needs to create. To register the XML schema, use the DBMS_XMLSCHEMA.REGISTER_SCHEMA procedure. Once the registration process is complete, you will have object tables that closely model the XML schema consumed by the REGISTER_SCHEMA procedure. For example, consider the Person .xsd XML schema.
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <xsd:element name="Person" type="PersonType"/> <xsd:complexType name="PersonType"> <xsd:sequence> <xsd:element name="FirstName"> <xsd:simpleType> <xsd:restriction base= "xsd:string"> <xsd:maxLength value="30"/> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="LastName"> <xsd:simpleType> <xsd:restriction base= "xsd:string"> <xsd:maxLength value="50"/> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="DateOfBirth" type="xsd:dateTime"/> </xsd:sequence> </xsd:complexType> </xsd:schema>
This particular schema, when registered with an Oracle database, resulted in a table named "Person162_TAB" and an object type named "PersonType161_T" (Oracle uses an internal algorithm for determining object names during the XML Schema registration process). Listing 1, describes the object-relational table and the object type.
Code Listing 1: Describing the sample object-relational table and object type
SQL> desc "Person162_TAB" Name Null? Type --------------------------------------------------- -------- ------ TABLE of SYS.XMLTYPE(XMLSchema "http://asktom.oracle.com/~sdillon/People.xsd" Element "Person") STORAGE Object-relational TYPE "PersonType161_T" SQL> desc "PersonType161_T" PersonType161_T is NOT FINAL Name Null? Type ---------------------------- ------- ------------------- SYS_XDBPD$ XDB.XDB$RAW_LIST_T FirstName VARCHAR2(30 CHAR) LastName VARCHAR2(50 CHAR) DateOfBirth TIMESTAMP(6)
Listing 1 shows the database storage of the three elements in the XML schema's PersonType complex type as two VARCHAR2 types and a TIMESTAMP type.
This storage architecture is far different from CLOB-based XML storage. I mentioned earlier that the well-formedness check is a processing requirement for loading an XML document into a CLOB -based XMLType column or table. Now consider object-relational-based XMLType columns or tables. Not only does the database need to check the document for well-formedness, but it also has to parse the entire document and load the values into object-relational tables. INSERT performance definitely takes a hit, as shown in Listing 2, which compares a CLOB -based XMLType INSERT and an object-relational XMLType INSERT.
Code Listing 2: INSERT that compares CLOB and object-relational performance
SQL> 'create table xmlclobs ( 2 xmlclobs xmltype ); Table created. SQL> create table xmldocs of xmltype 2 xmlschema "http://asktom.oracle.com/~sdillon/People.xsd" 3 element "Person"; Table created. SQL> declare 2 l_xml xmltype; 3 l_start timestamp; 4 l_elapsed interval 5 day to second(4); 6 begin 7 l_xml := xmltype( 8 '<?xml version="1.0"?> 9 <Person 10 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 11 xsi:noNamespaceSchemaLocation= 12 "http://asktom.oracle.com/~sdillon/People.xsd"> 13 <FirstName>Sean</FirstName> 14 <LastName>Dillon</LastName> 15 <DateOfBirth>1971-05-18</DateOfBirth> 16 </Person>'); 17 18 l_start := systimestamp; 19 for i in 1 .. 1000 loop 20 insert into xmlclobs 21 values ( l_xml ); 22 end loop; 23 l_elapsed := systimestamp - l_start; 24 dbms_output.put_line( 25 'CLOB:'||to_char(l_elapsed)); 26 27 l_start := systimestamp; 28 for i in 1 .. 1000 loop 29 insert into xmldocs 30 values ( l_xml ); 31 end loop; 32 l_elapsed := systimestamp - l_start; 33 dbms_output.put_line( 34 'XMLType:'||to_char(l_elapsed)); 35 end; 36 / CLOB:+00 00:00:00.1700 XMLType:+00 00:00:01.9930 PL/SQL procedure successfully completed.
In this case, the inserts occur after the well-formedness check has taken place on lines 7 through 16 of Listing 2. Therefore, the CLOB -based XML insert is very fast, whereas the object-relational XML insert takes considerably longer, because the document must be broken out into the object-relational table.
CLOB-based XML storage stores the entire document byte for byte. Object-relational storage, on the other hand, enjoys some efficiency in the area of storage size and access to scalar values of the document. Consider the element and attribute names in an XML document. Consider all the markup! In a CLOB, the element name usually appears at least twice, in an open tag and a close tag. Typically, element and attribute names appear much more than twice, because elements and attributes are commonly repeated in XML documents. With object-relational storage, the element and attribute names become the object attribute name, and Oracle's data dictionary stores that name only once. The object-relational approach means no more storage of brackets or of excessive quotation marks. All these bytes are saved, because only the scalar values require storage in the object tables.
Data access is where object-relational storage shines. When you query an XML column or table, the database must decide how it will retrieve the data from the physical data blocks. To query against CLOB-based XML storage (unless queried values exist in indexes), entire documents must be read into memory and a document object model (DOM) must be traversed to find the values to include in a result set. Object-relational storage, however, uses a process called query rewrite, which lets the Oracle optimizer translate an XPath expression into an object-relational query that accesses individual rows of the object tables and lets it read only the attributes necessary for the query. Therefore, object-relational storage tends to enable much better performance in data manipulation language (DML)-heavy applications (when the Oracle optimizer can use query rewrite).
The following types of applications benefit from the use of object-relational XML storage:
Applications Are Storage-Agnostic
No matter which storage model you use, your applications do not have to change to perform INSERT, UPDATE, DELETE, and query operations on your stored XML content. If you write an application against CLOB-based XML storage, it will work exactly the same against object-relational XML storage, although the performance may not be the same.
This benefit comes only as a consequence of using supported data access mechanisms when querying your XML documents. If an application uses extract statements to resolve XPath expressions, that shields the application from changes to the underlying storage architecture. An option for users who store data object-relationally, however, is to retrieve the data directly from the underlying storage objects. In the object-relational storage case earlier in this article, instead of issuing an extract function with the XPath expression '/Person/FirstName/text()', you might directly query the FirstName attribute of the Person162_TAB table, as shown in Listing 3.
Code Listing 3: Querying via XPath and directly
SQL> select extractValue(OBJECT_VALUE, '/Person/FirstName') 2 from xmldocs 3 where rownum < 4; EXTRACTVALUE(OBJECT_VALUE,'/PE -------------------------------------- Sean Sean Sean SQL> SQL> select x."XMLDATA"."FirstName" 2 from xmldocs x 3 where rownum < 4; XMLDATA.FirstName ----------------------- Sean Sean Sean
Although applications can be storage-agnostic, there is a caveat. With object-relational storage, you can access object attributes directly, with potential performance gains, because you won't have to rely on the optimizer's ability to rewrite your query for better efficiency. Note that even as you realize this performance benefit, accessing XML data this way locks your applications into using object-relational storage. If you change your storage from object-relational to CLOB-based, however, these attributes would no longer exist. Even worse, if your XML schema changes over time, these attribute names may change, meaning that you would have to rework your applications to correct these queries.
This is a trade-off people make knowingly. Typically, you should not migrate an application that performs well in object-relational storage to CLOB-based storage, and vice versa.
In this column, I presented the two storage architectures you can use with Oracle's XMLType datatype. There are compelling reasons for both; ultimately the behavior of the application and how it interacts with the XML data will decide which storage is best for you.
The guidelines I provided should be treated as such: guidelines to understand but not hard-and-fast rules. In most cases, you'll want to test each storage model to make sure you're getting the best performance and scalability you can, based on your application and how it interacts with the XML data you are storing.
Sean Dillon (email@example.com) is a principal technologist at Oracle and the lead author of Beginning Oracle Programming (APress). Dillon specializes in core database, XML, and Web services technologies.