Using Binary XML and XMLIndex to Aggregate and Query Unstructured XML Data Sources

Purpose

This tutorial shows you how to use binary XML storage model, XMLIndex, XQuery, SQL/XML, and many other salient capabilities of Oracle XML DB to aggregate and query unstructured XML data sources.

Time to Complete

Approximately 1 hour and 30 minutes

Topics

This tutorial covers the following topics:

 Overview
 Prerequisites
 Setting Environment Variables.
 Creating a new user and grant necessary privileges
 Creating a table to store information about RSS feeds
 Creating a table to store RSS news items
 Creating an XMLIndex on News Items
 Creating an RSS View with Aggregated and Normalized News Items from Diverse News Feeds
 Reading the Normalized RSS Feed with a Firefox 2.0 Browser
 Cleaning up the machine
 Summary

Viewing Screenshots

 Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.

Overview

Oracle Database 11g XML DB introduces a new binary XML storage model, a new XMLIndex index, and many other salient new capabilities for high performance storage and retrieval of structured and unstructured XML documents. This tutorial uses a real world use case of an RSS feed aggregator to demonstrate these versatile and high performance capabilities.

Binary XML Storage Model:
Binary XML is a new storage model for abstract data type XMLType, joining the existing native storage models of structured (object-relational) and unstructured ( CLOB) storage. Binary XML storage provides more efficient database storage, updating, indexing, and fragment extraction than unstructured storage. It can provide better query performance than unstructured storage-it does not suffer from the XML parsing bottleneck (it is a post-parse persistence model). Like structured storage, binary XML storage is aware of XML Schema data types and can take advantage of native database data types. Like unstructured storage, no data conversion is needed during database insertion or retrieval. Like structured storage, binary XML storage allows for piecewise updates. Because binary XML data can also be used outside the database, it can serve as an efficient XML exchange medium, and you can offload work from the database to increase overall performance in many cases. Like unstructured storage, binary XML data is kept in document order. Like structured storage, data and metadata can, using binary storage, be separated at the database level, for efficiency. Like unstructured storage, however, binary storage allows for intermingled data and metadata, which lets instance structures vary. Binary XML storage allows for very complex and variable data, which in the structured-storage model could necessitate using many database tables and joins. Unlike the other XMLType storage models, you can use binary storage for XML schema-based data even if the XML schema is not known beforehand, and you can store multiple XML schemas in the same table and query across common elements.

XMLIndex Indexing for Binary XML and Unstructured XML Storage Models:
B-Tree indexes can be used advantageously with structured storage. They provide sharp focus by targeting the underlying objects directly. They are generally ineffective, however, in addressing the detailed structure (elements and attributes) of an XML document stored in a binary XML or a CLOB instance. That is the special domain of XMLIndex: binary XML and unstructured storage models. Unlike a B-Tree index, which you define for a specific column that represents an individual XML element or attribute, an XMLIndex index is very general: indexing with XMLIndex applies to all possible XPath expressions for your XML data. An XMLIndex index presents the following advantages over other indexing methods:

An XMLIndex index can be used for SQL/XML functions XMLExists(), XMLTable(), and XMLQuery(), and it is effective in any part of a query; it is not limited to use in a WHERE clause. This is not the case for any of the other kinds of indexes you might use with XML data.
XMLIndex can thus speed access to SELECT list data and FROM list data, making it useful for XML fragment extraction, in particular. Function-based indexes and CTXXPath indexes
You need no prior knowledge of the XPath expressions that will be used in queries. XMLIndex is completely general. This is not the case for function-based indexes.
You can use an XMLIndex index with either XML schema-based or non-schema-based data. It can be used with binary XML and unstructured storage models. B-Tree indexing is appropriate only for schema-based data stored object-relationally (structured storage); it is ineffective for XML schema-based data stored in a binary XML or a CLOB instance.
You can use an XMLIndex index for searches with XPath expressions that target collections, that is, nodes that occur multiple times within a document. This is not the case for functional indexes.

Oracle Database-Native XQuery:
Since XQuery is now a W3C standard, the IT community has started adopting the business uses of XML and XQuery. As the innovation leader in commercial database technology, Oracle Database 11g provides a full-featured native XQuery engine integrated with the traditional Oracle database server. On the SQL side, the SQL/XML standard has defined a way to encapsulate XML in SQL and to integrate the querying of XML using XQuery. This is being accomplished by introducing new SQL functions: XMLQuery(), XMLTable(), XMLExists(), and XMLCast() , which operate on XML and SQL values using XQuery. Oracle Database 11g enables XQuery support in the database server through these SQL standard functions. A new XQUERY command has also been implemented in SQL*Plus to allow users to enter XQuery expressions on the command line. With standards-based implementation of XQuery in Oracle Database 11g, application developers can use their favorite APIs (e.g., JDBC, ODP.NET, and web service) to access Oracle Database XQuery capabilities.

Benefits of Oracle XQuery:
Using SQL/XML XQuery functions along with indexing schemes for structured, unstructured, and binary XML storage models, XML DB can perform uniform XML queries across different storage models with orders of magnitude performance improvement over DOM-based functional evaluation of XML queries. Furthermore, XML queries can be seamlessly merged with SQL relational queries to handle all query scenarios. Finally, the XML query capabilities of Oracle XML DB are built on the solid foundation of industry's best relational database that is highly reliable, available, scalable, and secure. In short, the XML DB query capabilities in Oracle Database 11g provide the most comprehensive and efficient functionality for versatile, scalable, concurrent, and high performance XML applications.

Back to Topic List

Prerequisites

Before you perform this tutorial, you should:

1. Install Oracle Database 11g
2.

Download and install the latest Firefox browser (version 2.0 or higher) from here

Note: Due to a missing DLL file Firefox 3.0 for Linux gets installed but does not open.

3.

Download and install SQL Developer from OTN .

4.

Download and unzip the binxml.zip file into your working directory (i.e.wkdir)

Back to Topic List

Setting Environment Variables

If you are executing this OBE on Linux, set environment variables by performing the following:

1. Open a terminal window

 

2.

Set the Environment variable: LD_LIBRARY_PATH.For example, if your ORACLE_HOME is /u01/app/oracle/product/11.1.0/db_1 perform the following:

ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=$ORACLE_BASE/product/11.1.0/db_1
ORACLE_SID=orcl
PATH=$ORACLE_HOME/bin:$PATH
LD_LIBRARY_PATH=$ORACLE_HOME/lib

 

Back to Topic List

Creating a new user and grant necessary privileges

Using the following steps you would create a new user and use the new security mechanism in Oracle Database 11g to grant the new user access to internet.

1.

Connect as user system.

  1. Open SQL Developer.
  2. In the Connections tab select New Connection.

  1. Create a new connection system with the following details:

    Connection Name: system
    Username: system
    Password: oracle
    Check the Save Password checkbox
    Role: default

    Test and connect to system

 

2.

Open and execute the script 1_1_CreateUser.sql by clicking the Execute script button. In the Select connection window, select system and click OK.

 

3. Execute 1_2_httpAccessPriv.sql to create a new Internet access control list (ACL) and assigned to the new user. In the Select connection window, select system and click OK.

 

Back to Topic List

Creating a table to store information about RSS feeds

An RSS (Real Simple Syndication) feed is an URL pointing to an XML document in one of the many variants of RSS formats. The XML document of an RSS feed contains news items. In this section, you will create a table to store information about subscribed RSS feeds.

1.

Execute the script 2_1_CreateFeedsTable.sql to create a new rssfeeds table. In the Select connection window, select system and click OK.

2.

Open and execute the script 2_2_InsertFeeds.sql to insert a list of news feeds into the rssfeeds table. In the Select connection window, select system and click OK.

 

3.

Open and execute the script 2_3_VerifyFeeds.sql to verify that a list of news feeds have been inserted. In the Select connection window, select system and click OK.

 

Back to Topic List

Creating a table to store RSS news items

Different RSS formats used different XML structures to represent news items. This type of highly variable XML structures are best stored with the binary XML storage model. By taking advantage of the binary XML storage model along with the SECUREFILE LOB storage format, you can gain improved performance while reducing storage space.

1.

Execute the script 3_1_CreateRSSItemsTable.sql to create a new newsItemsTable. In the Select connection window, select system and click OK.

Notice the creation of “virtual columns” to enforce unique constraints. Also notice how Oracle XML DB can handle different namespaces defined by different RSS formats.

2.

Open 3_2_InsertRSSItems.sql. Depending on whether you are accessing the Internet via a proxy server, modify the script to change the parameter of the utl_http.set_proxy() call. In the Select connection window, select system and click OK.

 

3.

You can now execute 3_2_InsertRSSItems.sql to insert news items into the newsItemsTable. In the Select connection window, select system and click OK.

Notice how the script uses the XMLTable() function to extract news items for insertion into the newItemsTable.

 

4.

With a unique constraint on the virtual column of news item title, Oracle XML DB can enforce the uniqueness of a news item title. You can execute 3_3_InsertTestItems.sql to insert news items with duplicate item titles to confirm the existence of a unique constraint on news item title. In the Select connection window, select system and click OK.

 

Back to Topic List

Creating an XMLIndex on News Items

With news items stored with binary XML storage model, creating an XMLIndex on news item titles will provide much improved query performance. To allow full text searches on news item titles, a secondary text index can be created on the path table of XMLIndex.

1.

As user RSS, execute the script 4_CreateIndexes.sql to create an XMLIndex using a subset of XPath pointing to news item titles. In the Select connection window, select system and click OK.

A secondary text index is also created. Notice how Oracle XML DB handles multiple namespaces used by different RSS formats.

Back to Topic List

Creating an RSS View with Aggregated and Normalized News Items from Diverse News Feeds

The main function of a RSS (Real Simple Syndication) aggregator is to aggregate and normalize news items from multiple feeds using diverse RSS formats. In this section, you will see how you can create an RSS view normalizing diverse RSS formats.

1.

Execute the script 5_1_CreateRSSViews.sql to create a relational view newsItemsRelView which normalizes news items using diverse RSS formats. In the Select connection window, select system and click OK.

Notice the UNION ALL clause in the SQL statement for creating this view. Also notice the usage of ora:contains() XQuery function for full text searches on news item titles. An XMLType view newsItemsRSSView is also created to complete the normalization of diverse news feed RSS formats into a single RSS 2.0 format.

 

2.

Open and execute 5_2_ExplainRSSViews.sql to see if the SQL/XML queries in the normalized views result in optimized query execution where the XMLIndex and the secondary text index are used. In the Select connection window, select system and click OK.

 

3.

Open and execute 5_3_CreateRSSResource.sql to create an Oracle XMLDB repository resource to represent a normalized RSS feed with aggregated and normalized news items from diverse RSS feeds. In the Select connection window, select system and click OK.

 

Back to Topic List

Reading the Normalized RSS Feed with a Firefox 2.0 Browser

There are several news readers capable of processing RSS 2.0 format. This section uses Firefox 2.0 browser to read the normalized RSS feed.

1.

Create a Sys connection with following details.

Connection Name: sys
Username: sys
Password: Password for sys
Check the Save Password checkbox
Role: sysdba

Test and connect to sys

 

2.

Right-click on sys connection and select Open SQL Worksheet. Execute the following commands by clicking the Run Script icon. In the Select connection window, select sys and click OK.

exec dbms_xdb.sethttpport(8080);

select dbms_xdb.gethttpport() from dual;

 

3.

You can open a Firefox 2.0 browser and point to the URL http://localhost:8080/home/RSS/news2007.xml . When prompted enter the following details.

Username: system
Password: oracle

You will see a list of news items aggregated from multiple news feeds.

Note :

  • The screen shot will be different from what a user gets due to changes in the news.
  • Firefox 2.0 has built-in capability to automatically transform files in RSS format. If you do not have Firefox 2.0, the output will be displayed in form of an XML document.

Back to Topic List

Cleanup

1.

As user system, execute the script cleanup.sql.

Back to Topic List

Summary

In this tutorial, you learned about the following:

Back to Topic List

 Place the cursor over this icon to hide all screenshots.

Left Curve
Popular Downloads
Right Curve
Untitled Document