DBA: Content Management
   DOWNLOAD
 Oracle Database 11g
   TAGS
xml, 11g, All

Managing Complex XML Data in Oracle XML DB 11g


by V.J. Jain

Learn how to manage complex XML with Oracle XML DB 11g, including how to change schema online

Published December 2007

Over the past few years, XML has emerged as the new standard for data transmission, and its use is becoming more prevalent as companies continue to adopt XML-based solutions. As more organizations begin to enforce XML standards for all data transmission, increasingly complex XML formats are emerging. These complex formats can include multiple namespaces, thousands of elements, and recursive definitions. As the XML documents produced from these formats grow in size and complexity, managing this content has become increasingly challenging, with limited information available on how to address this challenge.

In this article, you will learn how to use the XML DB feature in Oracle Database 11g to manage complex XML content as well as its advantages over commercial ETL products. You will see an example of a complex XML schema that demonstrates the following:

  • Registration of a complex XML schema
  • Insertion of XML files into the database
  • Retrieval of XML data via relational query
  • In-Place Evolution for XML schema modifications
Furthermore, you will get an overview of strategies for maximizing the performance and throughput of Oracle XML DB solutions and practical applications of complex XML formats.

Oracle XML DB Background

Oracle XML DB is a feature of Oracle Database that provides a powerful tool for managing XML content, including storage, manipulation, and retrieval. It offers different storage options to meet the unique requirements of different XML formats. These options include unstructured, binary, and structured storage:

  • Unstructured (character large object, or CLOB). By treating the document as one large object and storing it in the database, this method allows for the best insertion times. However, this storage method also consumes the most space and has the worst performance for relational access to the data. This is an impractical solution for managing large, complex XML documents if relational access is required. Unstructured storage can be a practical solution if disk space is not an issue and the objective is to archive the documents in their original format.
  • Binary storage. This option, new in Oracle Database 11g, stores data in a postparse binary format designed specifically for XML data. This option has several advantages over unstructured storage, in that it is XML-schema-aware, allowing better disk space efficiency and query performance. Although this option offers incredible performance compared to that of unstructured storage, it does not have the same query performance as structured storage. Binary storage is a good option whenever its performance for relational access is acceptable. Because this storage option is easy to use, it is worthwhile to evaluate it prior to opting for structured storage.
  • Structured storage. Also known as schema-based storage, this option uses an object-relational model to store XML documents in the database. This storage option is the most efficient in terms of disk space and relational access. It also has the highest overhead during file insertion and requires additional preparation for schema registration. Structured storage is the best option when optimal relational access is a requirement. For handling very complex and large files with a requirement for efficient relational access, this storage option is usually the best choice.
The perception of the size and complexity of an XML document can differ greatly, depending on the organization. On one hand, for online transaction processing (OLTP) databases using XML for their electronic data interchange (EDI) or other transactional data exchange, a file with several thousand lines might be regarded as a very large file. On the other hand, a multiterabyte data warehouse might regularly process XML documents measured in gigabytes and not consider a file to be large unless it contains millions of lines. The same concept holds for the perceived complexity of an XML document.

For purposes of this article, a document is considered "complex" if it has the following properties:

  • It is single-rooted, with multiple namespaces.
  • It has flexible XML definitions, allowing for great variations while maintaining validity.
  • It has recursive or circular/cyclical references.
  • It has nonstatic XML schemas.
In this article, XML documents are considered "large" if they are single-rooted and are greater than 20MB. These properties introduce certain scalability and management considerations that must be addressed for a robust enterprise solution.

There are no golden rules for choosing the best storage option. Based on the file structure, performance objectives, available resources, and expected volume of data, the best option will vary. If you cannot decide which storage option is the best for your particular requirement, it is worthwhile to try the different formats and determine which is optimal for your specific needs. Generally speaking, if you are dealing with large documents and require relational access, unstructured storage is not acceptable from a performance or resource perspective. Binary XML may be the optimal solution if the query performance is acceptable for the business use or if the business demands the option with the least maintenance time. However, if relational access is the primary objective and users need fast access to any data contained in the XML document, structured storage is most likely the best option.

Maximizing Throughput with Structured Storage

Although there is an overhead cost for inserting files when you are using the structured storage option, you can reduce this cost by splitting large documents into smaller pieces. For example, if there is a 700MB single-rooted XML file, it may be possible to split it into 10 smaller files that are valid with the XML schema. Inserting 10 different 70MB files is much faster than inserting a single 700MB file and achieves the same end result. The level of concurrency should be determined by the available processing power of the database. This strategy takes advantage of the database concurrency and maximizes throughput.

Another consideration when using Oracle XML DB is that the insertion time for a single XML file is generally limited by a single CPU's speed. In other words, having multiple processors does not help the throughput of a single document. For example, consider a situation in which a complex, single-rooted 700MB document needs to be inserted by use of schema-based storage. The total time to insert this file might be 10 minutes with a 1.35GHz processor, whether the database has 12 or 72 CPUs (assuming that at least one CPU is available in each scenario). To increase the throughput of a single XML document, you'll likely need to use faster CPUs. Inserting this same file when you have a single 3.4GHz processor might take 4 minutes. Conversely, when you're dealing with many XML files, having multiple processors can improve the concurrency of insertions, which will improve the throughput on a broader scale. For example, if the 700MB file is split into 10 different 70MB documents, the insertion time for each 70MB document may be less than a minute if you use the database with the 1.35GHz processors. If the CPUs are available, the 10 documents can be inserted concurrently into the database. This strategy results in a total insertion time of about a minute to insert the entire 700MB document.

Unfortunately, calculating these comparisons is not an exact science. Each database may perform differently, depending on several factors, including the operating system, available processing power, memory, and schema definition. The best way to optimize performance in your environment is to implement these strategies and determine the benefits of each of them.

Oracle XML DB Versus Commercial Off-the-Shelf (COTS) ETL Products

Several commercial extract, transform, and load (ETL) tools are available for loading data from files into the database. These tools usually feature a simple front end with drag-and-drop capability and can hide the complexity of the actual process. When you are creating an XML loading process in a commercial ETL tool, it is necessary to define the fields that need to be extracted. If a certain XPath is not specified, the data will not be collected from the document. In the case of complex XML documents, the advantage of Oracle XML DB is that it takes a database-centric view of documents and shreds each document into an object-relational model. When the file is successfully inserted into the database, any data contained within that file is readily accessible without reparsing.

One of the best benefits of Oracle XML DB is that it is a standard feature of Oracle Database and does not require additional licensing. But if licensing fees were not a concern for an organization, why would it use Oracle XML DB for its XML content management instead of a tool from a company with ETL as its primary focus? To understand the answer, it is necessary to understand how Oracle XML DB technology meets the unique requirements of managing very complex XML.

The Challenges of Complex XML Content

When XML content is flexible, frequently changing, recursive, and very large, a developer will undoubtedly encounter certain challenges that might be absent from simpler formats. One possibility with a complex XML document is that its XML schema might be very large and allow a great amount of flexibility while remaining valid. The use of flexible XML schemas is a common strategy for supporting industrywide standards while accommodating company-specific requirements.

For example, consider an XML schema that is adopted as a standard for three companies. In addition to the shared or standard elements, this XML schema will need to include all of the company-specific definitions for each of the companies. To support this requirement, the XML schema is designed with a great deal of flexibility by use of generic container elements that reference all possible company-specific elements and allow most element references to occur zero or more times. As a result, documents with completely different elements are valid with the same XML schema. From a development perspective, this flexibility makes it difficult to write XML parsers to extract the necessary data, because the occurrence of elements is difficult to predict. Because COTS ETL tools and custom parsers require a specific XPath for extracting data, every possible XPath would need to be checked to guarantee the capture of all data from the document. This is an impractical solution, because there is an exponential number of possible XPaths. Furthermore, if recursive or cyclical references exist in the schema, there is an infinite number. When uncaptured data has been detected, the only resolution is to reparse the entire file, a costly operation that should be avoided if possible.

Consider the following XML schema (note the cyclical references and the flexibility of the schema resulting from generic elements):

startData.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns:bc="startData.xsd" xmlns:xn="standardData.xsd" xmlns:es="CompanySpecific.1.0.xsd"
                              
xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xdb="http://xmlns.oracle.com/xdb" targetNamespace="startData.xsd" elementFormDefault="qualified" attributeFormDefault="unqualified"> <import namespace="standardData.xsd" schemaLocation="standardData.xsd"/> <import namespace="CompanySpecific.1.0.xsd" schemaLocation="CompanySpecific.1.0.xsd"/> <element name="rootElement" xdb:defaultTable="XML_DEFAULT"> <complexType> <sequence> <element name="fileHeader"> <complexType> <attribute name="fileFormat" type="string" use="required"/> <attribute name="companyName" type="string" use="optional"/> </complexType> </element> <element name="fileData" maxOccurs="unbounded"> <complexType> <choice> <element ref="xn:childContainer" maxOccurs="unbounded"/> <element ref="xn:salesInformation" maxOccurs="unbounded"/> </choice> </complexType> </element> <element name="fileFooter"> <complexType> <attribute name="timeStamp" type="string" use="required"/> </complexType> </element> </sequence> </complexType> </element> </schema>
standardData.xsd
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xn="standardData.xsd" targetNamespace="standardData.xsd" 
  elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:cs="CompanySpecific.1.0.xsd" >
<import namespace="CompanySpecific.1.0.xsd" schemaLocation="CompanySpecific.1.0.xsd"/>
<element name="childContainer">      
<complexType>
        <sequence>
                <element name="attributes" minOccurs="0">
                <complexType>
                <all>
                        <element name="childLabel" minOccurs="0"/>
                        <element name="childType" minOccurs="0"/>
                </all>
                </complexType>
                </element>
                <choice minOccurs="0" maxOccurs="unbounded">
                        <element ref="xn:childContainer"/>
                        <element ref="xn:CompanySpecificContainer"/>
                        <element ref="xn:salesInformation"/>
                </choice>
        </sequence>
</complexType>
</element>
<element name="salesInformation">    
<complexType>
        <sequence>
                <element name="storeNumber" type="string" minOccurs="0"/>
                <element name="orderNumber" type="string" minOccurs="0"/>
                <element name="salesDate" type="string" minOccurs="0"/>
<element name="product" type="string" minOccurs="0"/>
<element name="quantity" type="string" minOccurs="0"/>
<element name="price" type="string" minOccurs="0"/>
<choice minOccurs="0" maxOccurs="unbounded">
                        <element ref="xn:childContainer"/>
                </choice>
        </sequence>
</complexType>
</element>
<element name="CompanySpecificContainer">
<complexType>
        <sequence>
                <element name="attributes" minOccurs="0">
                <complexType>
                <all>
                        <element name="csDataType" minOccurs="0"/>
                        <element name="csDataFormat" minOccurs="0"/>
                        <element ref="cs:csStore" minOccurs="0"/>
<element ref="cs:csProduct" minOccurs="0"/>
                </all>
                </complexType>
                </element>
                <choice minOccurs="0" maxOccurs="unbounded">
                        <element ref="xn:CompanySpecificContainer"/>
                </choice>
        </sequence>
</complexType>
</element>
</schema>
CompanySpecific.1.0.xsd
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:cs="CompanySpecific.1.0.xsd" 
  targetNamespace="CompanySpecific.1.0.xsd" elementFormDefault="qualified" attributeFormDefault="unqualified">
<element name="csStore">
    <complexType>
      <sequence>
        <element name="label" type="string" minOccurs="0"/>
        <element name="name" type="string" minOccurs="0"/>
        <element name="value" type="string" minOccurs="0"/>
      </sequence>
    </complexType>
  </element>
<element name="csProduct">
    <complexType>
      <sequence>
        <element name="label" type="string" minOccurs="0"/>
        <element name="name" type="string" minOccurs="0"/>
        <element name="value" type="string" minOccurs="0"/>
      </sequence>
    </complexType>
  </element>
</schema>
This XML schema could be used as an industry standard for transmitting sales and inventory data. Note that in the standardData.xsd namespace, both the childContainer element and the companySpecificContainer element are optionally self-referencing. In this particular definition, this design enables each individual company to decide the granularity of its data using the parent/child relationship. This schema also gives each company the option of including inventory data, sales data, or both. It further enables each company to include zero or more stores, products, and sales, based on its individual needs but within the same flexible format.

For example, if a hypothetical company ABC wanted to include inventory and sales data from multiple stores, it could use a collection of CompanySpecificContainer elements to identify each store (parent) and a collection of CompanySpecificContainer elements to identify the products (children) for each store. One valid document for ABC could be

CompanyABC.xml
<?xml version="1.0"?>
<rootElement xmlns="startData.xsd" xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<fileHeader fileFormat="v1.0" companyName="CompanyABC"/>
<fileData>
<xn:childContainer>
        <xn:attributes>
                <xn:childLabel>Store 0001 Inventory</xn:childLabel>
<xn:childType>Stock</xn:childType>
</xn:attributes>
        <xn:CompanySpecificContainer>
                <xn:attributes>
<xn:csDataType>csStore</xn:csDataType>                                    
<xn:csDataFormat>CompanySpecific.1.0</xn:csDataFormat>
<cs:csStore>
                        <cs:label>Store 0001</cs:label>
                        <cs:name>Product 1</cs:name>
                        <cs:value>In Stock</cs:value>
</cs:csStore>
</xn:attributes>
        <xn:CompanySpecificContainer>        
                        <xn:attributes>
<xn:csDataType>csProduct</xn:csDataType>                          
<xn:csDataFormat>CompanySpecific.1.0</xn:csDataFormat>
<cs:csProduct>
                                <cs:label>Product 1</cs:label>
                                <cs:name>Quantity</cs:name>
                                <cs:value>10</cs:value>
</cs:csProduct>
</xn:attributes>
                </xn:CompanySpecificContainer>
</xn:CompanySpecificContainer>
<xn:salesInformation>
        <xn:storeNumber>Store 0001</xn:storeNumber>
<xn:orderNumber>12345</xn:orderNumber>
        <xn:salesDate>20-SEP-2007</xn:salesDate>
<xn:product>Product 1</xn:product>
<xn:quantity>1</xn:quantity>
<xn:price>110.00</xn:price>
</xn:salesInformation>
</xn:childContainer>
</fileData>
<fileFooter timeStamp="20-SEP-2007"/>
</rootElement>

However, if another company, XYZ, had only a single store and chose to include only standard sales data in its files, it could omit the childContainer element and include a collection of salesInformation elements. A valid document from XYZ describing only standard sales data could be the following:



CompanyXYZ.xml

<?xml version="1.0"?>
<rootElement xmlns="startData.xsd" xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"
                              
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <fileHeader fileFormat="v1.0" companyName="CompanyXYZ"/> <fileData> <xn:salesInformation> <xn:storeNumber>Store 0001</xn:storeNumber> <xn:orderNumber>12345</xn:orderNumber> <xn:salesDate>20-SEP-2007</xn:salesDate> <xn:product>Product 1</xn:product> <xn:quantity>1</xn:quantity> <xn:price>110.00</xn:price> </xn:salesInformation> </fileData> <fileFooter timeStamp="20-SEP-2007"/> </rootElement>
These documents are very different, but both are valid in the XML schema. This design facilitates the use of industrywide standards across different companies, by using a single, flexible format in the XML schema. Furthermore, the use of recursive references for generic container elements enables each company to decide how much detail to include while maintaining compliance with the XML schema. For example, one XPath for extracting quantity-on-hand data from CompanyABC.xml would be
'/rootElement/fileData/xn:childContainer/xn:CompanySpecificContainer/
  xn:CompanySpecificContainer/xn:attributes/cs:csProduct/cs:value'
However, if this company wanted to include store data with a finer granularity, it could add another child element for substores, such as

'/rootElement/fileData/xn:childContainer/xn:CompanySpecificContainer/xn:CompanySpecificContainer/
  xn:CompanySpecificContainer/xn:attributes/cs:csProduct/cs:value'
The design of this XML schema enables the files to be rich with content that is determined by each individual company. By using the generic container elements and optional references, each document can begin to resemble a database of its own. Due to the unpredictability of each file, managing this XML with custom code or a COTS ETL tool would be an intense development effort that would likely require continuous maintenance and support. If any of the companies using this XML schema included a new XPath, the extraction code would need to be modified to capture this data. Any files parsed without the updated XPaths would need to be completely reprocessed. This example demonstrates the substantial challenges of complex XML schemas and the difficulties for developers tasked with managing such content.

The Oracle XML DB Solution

This problem with XPath mapping, as demonstrated in the above example, does not exist with Oracle XML DB, because the entire file is stored in the database. As soon as the document has been inserted, the content from the document is immediately available for query. Regardless of how flexible the XML schema might be, any data contained in the document can be accessed with the appropriate XPath. This provides an unmatched advantage that maximizes availability and minimizes the cost of maintenance.

To implement the XML DB solution, start by registering the XML schema (the definition files and documents are in a directory called XML_TEST):

begin
DBMS_XMLSCHEMA.REGISTERSCHEMA(
        schemaurl => 'CompanySpecific.1.0.xsd',
        schemadoc => BFILENAME ('XML_TEST','CompanySpecific.1.0.xsd')
        );
DBMS_XMLSCHEMA.REGISTERSCHEMA(
        schemaurl => 'standardData.xsd',
        schemadoc => BFILENAME ('XML_TEST','standardData.xsd')
        );
DBMS_XMLSCHEMA.REGISTERSCHEMA(
        schemaurl => 'startData.xsd',
        schemadoc => BFILENAME ('XML_TEST','startData.xsd')
        );                              
end;    
/
Now that you have created the object-relational structure for our XML schema, you are ready to insert XML files into the default table (specified in the annotation of the root element).
insert into XML_DEFAULT values (XMLTYPE(BFILENAME ('XML_TEST','CompanyABC.xml'),nls_charset_id('AL32UTF8')));
/
The file is successfully inserted in a fraction of a second. Immediately the data is available for relational access. Here is an example of a query access that describes the current inventory:

SELECT extractValue(object_value,'/rootElement/fileFooter/@timeStamp')  start_date, 
extractValue(object_value,'/rootElement/fileHeader/@companyName') companyName, 
extractValue(object_value,'/rootElement/fileHeader/@fileFormat') fileFormat,
extractValue(value(b),'/xn:childContainer/xn:attributes/xn:childLabel', 
   'xmlns:xn="standardData.xsd"') childLabel,
extractValue(value(b),'/xn:childContainer/xn:attributes/xn:childType', 
   'xmlns:xn="standardData.xsd"') childType,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:attributes/xn:csDataType', 'xmlns:xn="standardData.xsd"') csDataType,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:attributes/xn:csDataFormat', 
   'xmlns:xn="standardData.xsd"') csDataFormat,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:attributes/cs:csStore/cs:label', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreLabel,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:attributes/cs:csStore/cs:name', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreName,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:attributes/cs:csStore/cs:value', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreValue,
extractValue(value(c),'/xn:CompanySpecificContainer/xn:CompanySpecificContainer/xn:attributes/xn:csDataType', 
   'xmlns:xn="standardData.xsd"') csDataTypeL2,
extractValue(value(d),'/xn:CompanySpecificContainer/xn:attributes/xn:csDataFormat', 
   'xmlns:xn="standardData.xsd"') csDataFormatL2,
extractValue(value(d),'/xn:CompanySpecificContainer/xn:attributes/cs:csProduct/cs:label', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductLabel,
extractValue(value(d),'/xn:CompanySpecificContainer/xn:attributes/cs:csProduct/cs:name', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductName,
extractValue(value(d),'/xn:CompanySpecificContainer/xn:attributes/cs:csProduct/cs:value', 
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductValue
from 
XML_DEFAULT a
,TABLE(XMLSequence(Extract(object_value, '/rootElement/fileData/xn:childContainer', 
   'xmlns:xn="standardData.xsd" xmlns="startData.xsd"'))) b
,TABLE(XMLSequence(Extract(value(b), '/xn:childContainer/xn:CompanySpecificContainer', 
   'xmlns:xn="standardData.xsd"'))) c
,TABLE(XMLSequence(Extract(value(c), '/xn:CompanySpecificContainer/xn:CompanySpecificContainer', 
   'xmlns:xn="standardData.xsd"'))) d
/

...
                              
snipped...
                            
This solution allows for the management of any documents that conform to the XML schema definition. Oracle XML DB's database-centric view of XML documents enables an XML document to be content-rich while providing a layer of abstraction from the complexities of development that would be unavoidable with any other ETL technology.

This example demonstrates the unmatched XML storage-and-retrieval capabilities of Oracle XML DB, but a complex XML schema is also characterized as having frequent changes to the XML schema definition. How does Oracle XML DB facilitate changes to the XML schema?

Managing XML Schema Changes

In the previous example, schema flexibility was provided by use of generic container elements and optional references to company-specific elements. Although this design offers substantial flexibility within the XML schema, there is a possibility for changes in the CompanySpecific.1.0.xsd namespace if individual companies choose to include additional data in their files.

Changes to the schema definitions have always been problematic for XML-based solutions. This is expected, because an XML schema is supposed to be used to validate an XML document. If the document contains elements that are not defined in the schema, the document is not valid. In the past, this has been a weakness of Oracle XML DB, because changes to registered XML schemas required a complex and costly operation using the copyEvolve procedure. This procedure locks the resources, creates temporary tables, moves all the data from the XML table for this schema into the temporary tables, applies the schema changes, and then moves the data back to the XML table. The more data in the XML table, the longer and more expensive this operation. Depending on the size of the table, it might take hours to complete a schema evolution to add a single element or attribute. With respect to complex XML, this procedure was not an adequate solution for managing schema changes.

To address this limitation, Oracle has introduced in Oracle XML DB 11g a new procedure called inPlaceEvolve, which allows the same schema modifications to be an online operation that does not require data movement. Instead, this procedure modifies the database objects created during the schema registration while leaving the related data in place. This enhancement is crucial in addressing the frequent definition changes common with complex XML schemas.

Using our previous example, consider a situation in which an additional element was added to the csProduct definition:

CompanySpecific.1.1.xsd
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:cs="CompanySpecific.1.0.xsd" 
 targetNamespace="CompanySpecific.1.0.xsd" elementFormDefault="qualified"
                              
attributeFormDefault="unqualified"> <element name="csStore"> <complexType> <sequence> <element name="label" type="string" minOccurs="0"/> <element name="name" type="string" minOccurs="0"/> <element name="value" type="string" minOccurs="0"/> </sequence> </complexType> </element> <element name="csProduct"> <complexType> <sequence> <element name="label" type="string" minOccurs="0"/> <element name="name" type="string" minOccurs="0"/> <element name="value" type="string" minOccurs="0"/> <element name="class" type="string" minOccurs="0"/> </sequence> </complexType> </element> </schema>

With Oracle XML DB 11g, this modification is fast and requires minimal resources. The new procedure requires the schema URL and an XMLType document (XMLDiff) that conforms to the xdiff XML schema. The XMLDiff document is a specially formatted document that reflects the changes in the XML schema. Instead of your having to manually create the XMLDiff document, it can be done automatically with the xmldiff function in Oracle Database.

First, create a new schema file with the updated XML schema definition CompanySpecific.1.1.xsd. Then use the Oracle Database xmldiff function with the old schema as the first parameter and the new schema as the second parameter:

var oldSchemaDoc clob;
var newSchemaDoc clob;
 begin
     :oldSchemaDoc := xmltype( bfilename ( 'XML_TEST', 'CompanySpecific.1.0.xsd') ,
                              
nls_charset_id('AL32UTF8') ).getClobVal(); :newSchemaDoc := xmltype( bfilename ( 'XML_TEST', 'CompanySpecific.1.1.xsd') ,
nls_charset_id('AL32UTF8') ).getClobVal(); end; /
You can view the XMLDiff document by using
select xmldiff(xmltype(:oldSchemaDoc),xmltype(:newSchemaDoc)).getClobVal() from dual;
/
For simplicity, use a CLOB variable to store the XMLDiff document.
var diffXMLDoc clob;
begin
      select xmldiff(xmltype(:oldSchemaDoc),xmltype(:newSchemaDoc)).getClobVal() into :diffXMLDoc from dual;
end;
/ 
Now that the XMLDiff document is available, the inPlaceEvolve procedure can be called to modify the XML schema online.
SQL> alter session set events = '31150 trace name context forever, level 0x200000';

Session altered.

SQL> 
SQL> BEGIN
  2  DBMS_XMLSCHEMA.inPlaceEvolve('CompanySpecific.1.0.xsd',  xmltype(:diffXMLDoc) );
  3  END;
  4  /

PL/SQL procedure successfully completed.
An examination of the trace file shows that the database type that represents the modified complex element was altered to include the new element:
change to ct  sqltype = csProduct1046_T
 ------------ QMTS Executing SQL ------------ 
ALTER TYPE "XMLTEST"."csProduct1046_T" ADD ATTRIBUTE "class" VARCHAR2(4000 CHAR) CASCADE NOT INCLUDING TABLE DATA 
/
 --------------------------------------------
The introduction of this new feature makes it possible to efficiently manage changes to XML schemas, even with a large volume of documents, without requiring a window of unavailability during the schema evolution. This is a major enhancement that improves the feasibility of using Oracle XML DB for large enterprise solutions.

Practical Applications

Oracle XML DB 11g offers a complete, efficient content management solution that is far more useful and practical than any of the alternatives. By taking a database-centric view of each document, Oracle XML DB provides an innovative and powerful method for storing and retrieving XML content. The various storage options provide the ability to efficiently manage all documents, regardless of size and complexity. The new feature for efficiently modifying XML schema definitions makes Oracle XML DB an ideal solution for frequently changing content. These features meet the challenges of managing complex XML content and provide a solution that deserves serious consideration by all organizations.

Although the use of complex XML described in this article might appear to be quite complicated, it is nevertheless emerging as the standard in several industries. For example, in the telecommunications industry, wireless companies are already using XML schemas similar to the ones demonstrated in this article for distributing transmission metrics (see 3GPP). Considering the great value and flexibility that complex XML offers and Oracle XML DB's unprecedented ability to efficiently manage this content, it is likely that this use of XML will become more prevalent as more decision-makers become aware of the full potential of Oracle XML DB 11g.


V.J. Jain is a principal Oracle Applications and Oracle Database consultant at Varun Jain Inc. Additional material by him can be found at Oracle-Developer.com.

Send us your comments