This tutorial covers how to use binary XML storage model, XMLIndex, XQuery, SQL/XML, and many other salient capabilities of Oracle XML DB to aggregate and query unstructured XML data from diverse Web 2.0 data sources.
Approximately 40 minutes.
Facebook, Twitter, YouTube, blogs, news, and other social media feeds are available as either ATOM 1.0 or RSS (Really Simple Syndication) 2.0 semistructured XML data. This tutorial covers how Oracle XML DB can store, normalize, index, query, search, transform, and aggregate ATOM 1.0 and RSS 2.0 feeds from diverse data sources. It covers the following topics:
Note: For additional infomation on RSS and ATOM, see the following web sites:
The following is a list of software requirements:
Before starting this tutorial, you should first complete the following steps:
. |
Install Oracle Database 11g Release 2 (11.2.0.2.0) Enterprise Edition for linux |
|---|---|
. |
Install Oracle SQL Developer |
. |
Download the files.zip file which contains the script files used in this OBE to your local drive |
. |
Run the setup.sql script that is contained in the files.zip file. This setup file creates the required tablesspaces, verifies that your database character set is AL32UTF8, and sets the required ports. |
. |
Check your Firefox browser version 3.5 (linux installations) or Windows Internet Explorer 8 (for MS-Windows installations). |
Oracle Database 11g Release 2 XML DB supports a binary XML storage model, structured and unstructured component of XMLIndex index, partitioning of binary XML storage model, and locally managed structured and unstructured XMLIndex. This hands-on lab uses the real world use case of an RSS feed aggregator to demonstrate these versatile and high performance capabilities.
To create a new user and use the new security mechanism in Oracle Database 11g Release 2 to grant the new user access to internet, perform the following steps:
. |
Click the SQL Developer icon on your desktop to start SQL Developer.
The SQL Developer window is displayed.
|
|---|---|
. |
Create a new database connection as user "system" by performing the following steps. a. In the Connections tab, right-click the Connections node, and then select New Connection from the pop-up menu.
The New/Select Database Connection window is displayed.
b. Enter the following details in the New/Select Database Connection window. Connection Name: System
c. Click Test. If the connection has been set correctly, the Status in the lower left corner of the window is Success.
d. Click Save to save the newly created database connection, and then click Connect. The new System database connection is displayed in the Connections tab.
e. Click the Files tab, and then double-click the 1_1_CreateUser.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area. Notes:
f. Click the Run Script (F5) icon to run the script that creates rss user and grants some privileges to the new user.
The Select Connection window is displayed. Select the System connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Open and execute the 1_2_httpAccessPriv.sql script as user system to create a new Internet access control list (ACL) and assign it to the new rss user. a. Click the Files tab, and then double-click the 1_2_httpAccessPriv.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
The Select Connection window is displayed. Select the System connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Open and execute the setup.sql script as user system to do the following:
a. Click the Files tab, and then double-click the setup.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the System connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Create a new database connection as user rss by performing the following steps. a. In the Connections tab, right-click the Connections node, and then select New Connection from the pop-up menu.
The New/Select Database Connection window is displayed.
b. Enter the following details in the New/Select Database Connection window. Connection Name: rss
c. Click Test. If the connection has been set correctly, the Status in the lower left corner of the window is Success.
d. Click Save to save the newly created database connection, and then click Connect. The new rss database connection is displayed in the Connections tab.
|
|---|---|
. |
Open and execute the 2_1_CreateFeedsTable.sql script as user rss to create a new rssfeeds table. a. Click the Files tab, and then double-click the 2_1_CreateFeedsTable.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Open and execute the 2_2_InsertFeeds.sql script as user rss to insert a list of RSS feeds into the rssfeeds table. a. Click the Files tab, and then double-click the2_2_InsertFeeds.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
Note: The 2_2_InsertFeeds.sql script contains several sample feeds which are commented out. In this tutorial, we are only inserting the Oracle Blogs feed that are found at the end of this script. At the time of developing this tutorial, most of the feeds worked. If you try some of the sample feeds and they don't work, that indicates that the feed is no longer available.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Open and execute the 2_3_VerifyFeeds.sql script as user rss to verify that a list of RSS feeds has been inserted. a. Click the Files tab, and then double-click the 2_3_VerifyFeeds.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Highlight the select statement, and then click the Run Statement icon to run the statement.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Query Result tab.
|
Different RSS formats use different XML structures to represent feed items. This type of highly varied XML structures is best stored with the binary XML storage model. By taking advantage of the binary XML storage model along with the SECUREFILE LOB storage format, you can gain improved performance while reducing storage space.
. |
Open and execute the 3_1_CreateRSSItemsTable.sql script as user rss to create the newsItemsTable table. Note the following:
a. Click the Files tab, and then double-click the 3_1_CreateRSSItemsTable.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
Each row in a partitioned table is unambiguously assigned to a single partition. The partitioning key consists of one or more columns that determine the partition where each row is stored. Oracle automatically directs insert, update, and delete operations to the appropriate partition with the partitioning key. Each partition has a VALUES LESS THAN clause, which specifies a non-inclusive upper bound for the partitions. Any values of the partitioning key equal to or higher than this literal are added to the next higher partition. All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN clause of the previous partition. A MAXVALUE literal can be defined for the highest partition. MAXVALUE represents a virtual infinite value that sorts higher than any other possible value for the partitioning key, including the NULL value. In the preceding code example, range partitions are created by using a date-based virtual column partitioning key, vc_date. Any inserted rows with a date that is less than May 31, 2011 will be inserted in the items_stale partition in the PT01 tablespace. All other inserted rows will be inserted into the items_fresh partition in the PT02 tablespace. b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
|---|---|
. |
Open the 3_2_InsertRSSItems.sql script. Depending on whether you are accessing the Internet via a proxy server, modify the script to change the parameter of the utl_http.set_proxy() call. a. Click the Files tab, and then double-click the 3_2_InsertRSSItems.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Edit the script and select one of the following which you need to access RSS feeds on the Internet:
Note: In the above example, we used the CALL utl_http.set_proxy('www-proxy.us.oracle.com:80') statement. Substitute the proxy server with your appropriate proxy server. |
. |
You can now execute the 3_2_InsertRSSItems.sql script to insert RSS items into the newsItems table. Notice how the script uses the XMLTable() SQL/XML function to extract RSS items for insertion into the table newsItemstable. a. If the 3_2_InsertRSSItems.sql script is not displayed in the SQL Worksheet area, click the Files tab, and then double-click the 3_2_InsertRSSItems.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
. |
Expand Connections > rss > Tables, and then click the NEWSITEMSTABLE table. The table's information is displayed on the right payne.
Click the Partitions sub-tab and then click the Actions drop-down list, and then select Statistics > Gather Statistics ... . The Gather Statistics window is displayed.
Specify 100% statistics sampling by selecting 100 from the Sample Percent drop-down list, and then click Apply. A Confirmation window is displayed. Click OK.
Click the Refresh button to see the collected statistics for the partitions of NEWSITEMSTABLE. Notice how each partition has its corresponding name, number of rows, number of blocks, and the range partitioning key information.
|
When RSS items are stored in the binary XML storage format, creating an XMLIndex Unstructured Component on RSS items will provide much improved query performance. To allow full text searches on RSS item titles, a secondary text index can be created on the path table of an unstructured XMLIndex. Detailed information on this topic can be found in a best practices white paper for optimal XML query performance.
. |
As user rss , execute the 4_1_CreateUnstructXMLIndex.sql script to create an XMLIndex Unstructured Component index using a subset of XPath pointing to news item titles. A secondary Oracle Text Context index is also created. Notice how Oracle XML DB handles multiple namespaces used by different RSS formats. Also notice the LOCAL keyword specified for creating locally managed unstructured XMLIndex. a. Click the Files tab, and then double-click the 4_1_CreateUnstructXMLIndex.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
|---|---|
. |
Expand Connections > rss > Tables and then click the NEWSITEM_BIX_PATH_TABLE table to open the table details on the right pane. NEWSITEM_BIX_PATH_TABLE is the main storage table for the XMLIndex Unstructured Component.
The table information is displayed.
|
. |
Click the Partitions tab and the Refresh button to see how locally managed Unstructured XMLIndex index data have been stored in their corresponding local partitions.
|
The main function of a RSS (Real Simple Syndication) aggregator is to aggregate and normalize RSS items from multiple feeds using diverse RSS formats. In this section, you will see how you can create an RSS view normalizing diverse RSS formats.
. |
As user rss, execute the 5_1_CreateUnstructRSSViews.sql script to create two relational views newsItemsRelView_1 and newsItemsRelView_2, which normalize RSS items using diverse RSS formats. Notice the following:
a. Click the Files tab, and then double-click the 5_1_CreateUnstructRSSViews.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
The output is displayed in the Script Output tab.
|
|---|---|
. |
You can check out the query execution plan for the query that generates a normalized RSS feed from newsItemsRelView_1. In the same script file 5_1_CreateUnstructRSSViews.sql, select the CASE 1 SQL/XML query, and then click the Autotrace… button to see the execution plan in the window pane below. Notice in the normalized query result in optimized query execution where the Unstructured XMLIndex and the secondary text index are used. Notice the TABLE ACCESS BY LOCAL INDEX ROWID and the TABLE ACCESS BY GLOBAL INDEX ROWID operations associated with the locally managed unstructured XMLIndex and the globally managed full-text index respectively.
|
. |
You can do the same for newsItemsRelView_2. In the same script file 5_1_CreateUnstructRSSViews.sql, select the CASE 2 SQL/XML query, and then click the Autotrace… button to see the execution plan in the window pane below. Notice PARTITION RANGE SINGLE operation that accessed only partition 2 of the NEWSITEMSTABLE.
|
. |
Open and execute the 5_4_CreateRSSResouce.sql script to create two Oracle XML DB repository resources representing two normalized RSS feeds with aggregated and normalized RSS items. a. Click the Files tab, and then double-click the 5_4_CreateRSSResouce.sql script from the location where you saved the contents of the downloaded files.zip file . The contents of the script file is displayed in the SQL Worksheet area.
b. Click the Run Script (F5) icon to run the script.
c. The Select Connection window is displayed. Select the rss connection, and then click OK.
The output is displayed in the Script Output tab.
|
There are several news readers capable of processing RSS 2.0 format. You can use either a Firefox 3.5 or a Windows Internet Explorer 8 (MS-Windows installations only) browser to read the normalized RSS feed.
. |
In order to read the RSS content in Mozilla Firefox, perform the following steps: a. From the Edit menu in Mozilla Firefox, select Preferences. The Firefox Properties window is displayed as follows:
b. Select the Applications icon at the top of the Firefox Properties window, select Preview in Firefox for the Web Feed content type, and then click Close.
|
|---|---|
. |
Start your Mozilla Firefox Web browser, and enter the following url:
The two XML documents that are contained in the RSS folder are displayed.
|
. |
Click one of the XML document links such as the first one. A list of the RSS items aggregated from the multiple RSS feeds is displayed. If prompted, enter rss/rss for the User Name and Password.
You can click on a label to display the source.
|
In this tutorial, you learned about the following:
Credits
![]()
|
Copyright © 2011, Oracle and/or its affiliates. All rights reserved |