This tutorial shows you how to use binary XML storage model, XMLIndex, XQuery, SQL/XML, and many other salient capabilities of Oracle XML DB to aggregate and query unstructured XML data sources.
Note: This OBE was tested on production software however some of the screenshots may still refer to the beta software.
Approximately 1 hour and 30 minutes
This tutorial covers the following topics:
Place
the cursor over this icon to load and view all
the screenshots for this tutorial. (Caution: This action loads all
screenshots simultaneously, so response time may be slow depending on your Internet
connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
Oracle Database 11g XML DB introduces a new binary XML storage model, a new XMLIndex index, and many other salient new capabilities for high performance storage and retrieval of structured and unstructured XML documents. This tutorial uses a real world use case of an RSS feed aggregator to demonstrate these versatile and high performance capabilities.
Binary XML Storage Model:
Binary XML is a new storage model for abstract data type
XMLType, joining the existing
native storage models of structured (object-relational) and unstructured (CLOB)
storage. Binary XML storage provides more efficient database storage, updating,
indexing, and fragment extraction than unstructured storage. It can provide
better query performance than unstructured storage-it does not suffer from the
XML parsing bottleneck (it is a post-parse persistence model). Like structured
storage, binary XML storage is aware of XML Schema data types and can take advantage
of native database data types. Like unstructured storage, no data conversion
is needed during database insertion or retrieval. Like structured storage, binary
XML storage allows for piecewise updates. Because binary XML data can also be
used outside the database, it can serve as an efficient XML exchange medium,
and you can offload work from the database to increase overall performance in
many cases. Like unstructured storage, binary XML data is kept in document order.
Like structured storage, data and metadata can, using binary storage, be separated
at the database level, for efficiency. Like unstructured storage, however, binary
storage allows for intermingled data and metadata, which lets instance structures
vary. Binary XML storage allows for very complex and variable data, which in
the structured-storage model could necessitate using many database tables and
joins. Unlike the other XMLType
storage models, you can use binary storage for XML schema-based data even if
the XML schema is not known beforehand, and you can store multiple XML schemas
in the same table and query across common elements.
XMLIndex Indexing for Binary XML and Unstructured XML Storage
Models:
B-Tree indexes can be used advantageously with structured storage. They provide
sharp focus by targeting the underlying objects directly. They are generally
ineffective, however, in addressing the detailed structure (elements and attributes)
of an XML document stored in a binary XML or a CLOB
instance. That is the special domain of XMLIndex: binary XML and unstructured
storage models. Unlike a B-Tree index, which you define for a specific column
that represents an individual XML element or attribute, an XMLIndex
index is very general: indexing with XMLIndex
applies to all possible XPath expressions for your XML data. An XMLIndex
index presents the following advantages over other indexing methods:
An XMLIndex index can be used for SQL/XML
functions XMLExists(), XMLTable(),
and XMLQuery(), and it is effective
in any part of a query; it is not limited to use in a WHERE
clause. This is not the case for any of the other kinds of indexes you might
use with XML data.
XMLIndex can thus speed access to SELECT
list data and FROM list data,
making it useful for XML fragment extraction, in particular. Function-based
indexes and CTXXPath indexes
You need no prior knowledge of the XPath expressions that will be used in queries.
XMLIndex is completely general. This is not the case for function-based indexes.
You can use an XMLIndex index
with either XML schema-based or non-schema-based data. It can be used with binary
XML and unstructured storage models. B-Tree indexing is appropriate only for
schema-based data stored object-relationally (structured storage); it is ineffective
for XML schema-based data stored in a binary XML or a CLOB
instance.
You can use an XMLIndex index
for searches with XPath expressions that target collections, that is, nodes
that occur multiple times within a document. This is not the case for functional
indexes.
Oracle Database-Native XQuery:
Since XQuery is now a W3C standard, the IT community has started adopting the
business uses of XML and XQuery. As the innovation leader in commercial database
technology, Oracle Database 11g provides a full-featured native XQuery engine
integrated with the traditional Oracle database server. On the SQL side, the
SQL/XML standard has defined a way to encapsulate XML in SQL and to integrate
the querying of XML using XQuery. This is being accomplished by introducing
new SQL functions: XMLQuery(),
XMLTable(), XMLExists(),
and XMLCast() , which operate
on XML and SQL values using XQuery. Oracle Database 11g enables XQuery support
in the database server through these SQL standard functions. A new XQUERY command
has also been implemented in SQL*Plus to allow users to enter XQuery expressions
on the command line. With standards-based implementation of XQuery in Oracle
Database 11g, application developers can use their favorite APIs (e.g., JDBC,
ODP.NET, and web service) to access Oracle Database XQuery capabilities.
Benefits of Oracle XQuery:
Using SQL/XML XQuery functions along with indexing schemes for structured, unstructured,
and binary XML storage models, XML DB can perform uniform XML queries across
different storage models with orders of magnitude performance improvement over
DOM-based functional evaluation of XML queries. Furthermore, XML queries can
be seamlessly merged with SQL relational queries to handle all query scenarios.
Finally, the XML query capabilities of Oracle XML DB are built on the solid
foundation of industry's best relational database that is highly reliable, available,
scalable, and secure. In short, the XML DB query capabilities in Oracle Database
11g provide the most comprehensive and efficient functionality for versatile,
scalable, concurrent, and high performance XML applications.
Before you perform this tutorial, you should:
| 1. | Install Oracle Database 11g | |
| 2. |
Download and install the latest Firefox browser (version 2.0 or higher) from here Note: Due to a missing DLL file Firefox 3.0 for Linux gets installed but does not open. |
|
| 3. |
Download and install SQL Developer from OTN . |
|
| 4. |
Download and unzip the binxml.zip file into your working directory (i.e.wkdir) |
|
If you are executing this OBE on Linux, set environment variables by performing the following:
| 1. | Open a terminal window
|
|
| 2. |
Set the Environment variable: LD_LIBRARY_PATH.For example, if your ORACLE_HOME is /u01/app/oracle/product/11.1.0/db_1 perform the following: ORACLE_BASE=/u01/app/oracle
|
|
Using the following steps you would create a new user and use the new security mechanism in Oracle Database 11g to grant the new user access to internet.
| 1. |
Connect as user system.
|
|
| 2. |
Open and execute the script 1_1_CreateUser.sql by clicking the Execute script button. In the Select connection window, select system and click OK.
|
|
| 3. | Execute 1_2_httpAccessPriv.sql
to create a new Internet access control list (ACL) and assigned to the new
user. In the Select connection window, select system and click
OK.
|
|
An RSS (Real Simple Syndication) feed is an URL pointing to an XML document in one of the many variants of RSS formats. The XML document of an RSS feed contains news items. In this section, you will create a table to store information about subscribed RSS feeds.
| 1. |
Execute the script 2_1_CreateFeedsTable.sql to create a new rssfeeds table. In the Select connection window, select system and click OK.
|
|
| 2. |
Open and execute the script 2_2_InsertFeeds.sql to insert a list of news feeds into the rssfeeds table. In the Select connection window, select system and click OK.
|
|
| 3. |
Open and execute the script 2_3_VerifyFeeds.sql to verify that a list of news feeds have been inserted. In the Select connection window, select system and click OK.
|
|
Different RSS formats used different XML structures to represent news items. This type of highly variable XML structures are best stored with the binary XML storage model. By taking advantage of the binary XML storage model along with the SECUREFILE LOB storage format, you can gain improved performance while reducing storage space.
| 1. |
Execute the script 3_1_CreateRSSItemsTable.sql to create a new newsItemsTable. In the Select connection window, select system and click OK.
Notice the creation of virtual columns to enforce unique constraints. Also notice how Oracle XML DB can handle different namespaces defined by different RSS formats. |
|
| 2. |
Open 3_2_InsertRSSItems.sql. Depending on whether you are accessing the Internet via a proxy server, modify the script to change the parameter of the utl_http.set_proxy() call. In the Select connection window, select system and click OK.
|
|
| 3. |
You can now execute 3_2_InsertRSSItems.sql to insert news items into the newsItemsTable. In the Select connection window, select system and click OK.
Notice how the script uses the XMLTable() function to extract news items for insertion into the newItemsTable.
|
|
| 4. |
With a unique constraint on the virtual column of news item title, Oracle XML DB can enforce the uniqueness of a news item title. You can execute 3_3_InsertTestItems.sql to insert news items with duplicate item titles to confirm the existence of a unique constraint on news item title. In the Select connection window, select system and click OK.
|
|
With news items stored with binary XML storage model, creating an XMLIndex on news item titles will provide much improved query performance. To allow full text searches on news item titles, a secondary text index can be created on the path table of XMLIndex.
| 1. |
As user RSS, execute the script 4_CreateIndexes.sql to create an XMLIndex using a subset of XPath pointing to news item titles. In the Select connection window, select system and click OK.
A secondary text index is also created. Notice how Oracle XML DB handles multiple namespaces used by different RSS formats. |
|
The main function of a RSS (Real Simple Syndication) aggregator is to aggregate and normalize news items from multiple feeds using diverse RSS formats. In this section, you will see how you can create an RSS view normalizing diverse RSS formats.
| 1. |
Execute the script 5_1_CreateRSSViews.sql to create a relational view newsItemsRelView which normalizes news items using diverse RSS formats. In the Select connection window, select system and click OK.
Notice the UNION ALL clause in the SQL statement for creating this view. Also notice the usage of ora:contains() XQuery function for full text searches on news item titles. An XMLType view newsItemsRSSView is also created to complete the normalization of diverse news feed RSS formats into a single RSS 2.0 format.
|
|
| 2. |
Open and execute 5_2_ExplainRSSViews.sql to see if the SQL/XML queries in the normalized views result in optimized query execution where the XMLIndex and the secondary text index are used. In the Select connection window, select system and click OK.
|
|
| 3. |
Open and execute 5_3_CreateRSSResource.sql to create an Oracle XMLDB repository resource to represent a normalized RSS feed with aggregated and normalized news items from diverse RSS feeds. In the Select connection window, select system and click OK.
|
|
There are several news readers capable of processing RSS 2.0 format. This section uses Firefox 2.0 browser to read the normalized RSS feed.
| 1. |
Create a Sys connection with following details. Connection Name: sys Test and connect to sys
|
|
| 2. |
Right-click on sys connection and select Open SQL Worksheet. Execute the following commands by clicking the Run Script icon. In the Select connection window, select sys and click OK. exec dbms_xdb.sethttpport(8080); select dbms_xdb.gethttpport() from dual;
|
|
| 3. |
You can open a Firefox 2.0 browser and point to the URL http://localhost:8080/home/RSS/news2007.xml. When prompted enter the following details. Username: system
You will see a list of news items aggregated from multiple news feeds.
Note :
|
|
| 1. |
As user system, execute the script cleanup.sql.
|
|
In this tutorial, you learned about the following: