As Published In
Oracle Magazine
September/October 2005

TECHNOLOGY: XML


XQuery Flowers

By Jonathan Gennick

Use XQuery to retrieve and transform both XML and relational data.

Query is an XML query language developed and standardized by the World Wide Web Consortium (W3C). XQuery's purpose is to find, retrieve, and rearrange data viewed through the lens of XML. XQuery is the syntax, and XML expressions are what you write using that syntax. Oracle's implementation of XQuery makes its debut in Oracle Database 10g Release 2, enabling you to draw data from both XML documents and relational tables, and to then use that data in generating new XML documents and relational rowsets.

A Small Scenario

Suppose you're a database developer and you help maintain a database for the tourist industry. Figure 1 shows the data you have to work with:

 

figure 1
Figure 1: Data you have to work with

 

  • Chamber-of-commerce data in a traditional, relational table ( CHAMBER_OF_COMMERCE ).

  • Information about tourist attractions in XML documents that you access via the Oracle XML DB Repository. Each document represents one attraction.

  • Census data as documents in a table ( COUNTY_CENSUS ) in an XMLType column. Each row contains an XML document with population counts for all cities in a given county.

Your manager has just asked you to generate an XML document combining census data and information for tourists, organized by county. You need to 

  1. Generate a set of county nodes from the CHAMBER_OF_COMMERCE table

  2. Retrieve the attraction data for each county from XML documents in the Oracle XML DB Repository

  3. Summarize the population counts from XML documents in the COUNTY_CENSUS table

Thankfully, you've just upgraded to Oracle Database 10g Release 2, and XQuery is at your disposal. You can view all your data, even the chamber-of- commerce data, as XML documents. You can express your query using a syntax that lets you "think in XML." Manipulating and rearranging your input XML documents to generate an entirely new type of XML document is an almost trivial exercise. XQuery makes an otherwise challenging task quite easy indeed.

Understanding an XQuery

Counties have attractions, and counties have cities with populations. When joining together many-to-one data sources, I always find it helpful to begin at the top. Listing 1 shows a simple XML query against the CHAMBER_OF_COMMERCE table.

Code Listing 1: Demonstrating the ora:view function 

SELECT XMLQuery(
   'for $c in ora:view("CHAMBER_OF_COMMERCE")
    return $c' RETURNING CONTENT)
FROM dual;

<ROW>
   <COC_ID>1</COC_ID>
   <COC_NAME>Alger County Chamber of Commerce</COC_NAME>
   <COC_PHONE>(906) 387-2138</COC_PHONE>
   <COC_COUNTY>Alger</COC_COUNTY>
</ROW>


The XMLQuery function used in Listing 1 is part of Oracle's XQuery application-programming interface (API). It's a SQL/XML function that transmits an XQuery statement to the database and returns a single XML document as a result. The XML document will be an XMLType value. The for and return clauses in the first argument to XMLQuery represent XQuery syntax. The W3C XQuery standard requires that XQuery keywords be lowercase, something to be careful of if you're used to uppercasing your SQL keywords.

Listing 1's XQuery statement invokes the ora:view function built into Oracle's implementation of XQuery. The function transforms the rows from the CHAMBER_OF_COMMERCE table into a sequence of XML documents. Each document is composed of a single <ROW> element representing one row from the table. Elements corresponding to the column values for the row are nested under that <ROW> element.

The for clause in Listing 1 iterates over the incoming XML documents. During each iteration, the variable $c refers to the current document. No transformation of the XML takes place. The return clause executes for each iteration and simply returns each input document, unchanged, as an element in the output document. The <ROW> element shown in Listing 1 reflects the output from ora:view .

You won't be able to read much about XQuery without encountering the acronym FLWOR. This acronym is pronounced "flower," and it refers to the basic for-let-where-order by-return expression syntax that is at the core of working with XQuery. Listing 1 uses for and return.

Transforming Your XML

Listing 2 takes the evolving solution a bit further by returning a single XML <counties> document encompassing multiple <county> elements, one per county. The query follows the same FLWOR pattern as in Listing 1, but more of the FLWOR structure is used this time: 

  • The for loop is the same as before.

  • The let clause extracts three values using XPath notation.

  • The order by sorts the results by county name.

  • The return generates a <county> node for each county.

Code Listing 2: The transforming power of XQuery 

SELECT XMLQuery(
   '<counties>
    {for $c in ora:view("CHAMBER_OF_COMMERCE")
     let $coc_county := $c/ROW/COC_COUNTY/text(),
         $coc_name := $c/ROW/COC_NAME,
         $coc_phone := $c/ROW/COC_PHONE/text()
     order by $coc_county
     return
        <county>
           <name>{$coc_county}</name>
           <chamber phone="{$coc_phone}">{$coc_name/text()}</chamber>
        </county>}
    </counties>' RETURNING CONTENT)
FROM dual;

-
<counties>
   <county>
      <name>Alger</name>
      <chamber phone="(906) 387-2138">Alger County Chamber of Commerce</chamber>
   </county>
   <county>
   <name>Houghton</name>
   <chamber phone="(906) 337-4579">Keweenaw Convention & Visitors Bureau</chamber>
   </county>


Listing 2 begins to show the transformative power of XQuery syntax. An XQuery statement is actually a mixture of tags that you want in your output together with XQuery expressions that supply values for elements and attributes within those tags. Look carefully at the use of curly braces in Listing 2. Each set of { } denotes a nested XQuery expression, and you can nest expressions to any degree. The outermost expression in Listing 2 is <counties>... </counties> and represents the XML document you wish to generate. The FLWOR structure of Listing 2 is a nested expression that generates all the <county> elements within that document. Within each <county> element, you'll find more nested expressions— {$coc_county}, {$coc_phone} , and {$coc_name/text()} —that plug values from the input <ROW> element into the output <county> elements.

Listing 2 uses let to assign sequences of values from the input <ROW> elements to variables that you can conveniently reference in the return clause when designing your output document. XPath syntax is important here. XQuery supports XPath 2.0 syntax for specifying sequences of values to extract from an XML element. For example, Listing 2 uses the following syntax to extract the name of each county from the input documents generated by the call to ora:view

$c/ROW/COC_COUNTY/text()


Think of this syntax as a path navigating through a document. The path begins with $c , which represents an XML document returned from ora:view . County names are text values of the <COC_COUNTY> elements that in turn fall under <ROW> elements, hence the path $c/ROW/COC_COUNTY . Appending /text() causes the expression to return county names rather than <COUNTY> elements.

Strictly speaking, the result of the preceding XPath expression is an XML sequence representing the concatenation of all county names in a document returned by ora:view . However, there's only ever one county name in any one of those documents, so you'll always get a sequence of one value, a singleton sequence, which you can treat as just a single value.

Notice that Listing 2 does not apply the text() function to the $c/ROW/COC_NAME element in the let clause, but rather in the return clause. You have a lot of freedom in XQuery to extract elements and their values when and where you need them.

You don't need to use variables such as $coc_county in Listing 2, but you'll find it convenient to use variables and thus avoid having to reference long XPath expressions over and over within your return clause. In addition, your variable assignments give you a single place to change your XPath expressions, should that become necessary.

Your First Join

Now that you can successfully build an XML document with a list of counties, it's time to include each county's list of attractions. Listing 3 shows one possible solution to the problem. A nested FLWOR expression joins each occurrence of a <COUNTY> element in the output document to the attraction documents found in the Oracle XML DB Repository for that county.

Code Listing 3: Joining data from two documents 

SELECT XMLQuery(
   '<counties>
    {for $c in ora:view("CHAMBER_OF_COMMERCE")
     let $coc_county := $c/ROW/COC_COUNTY/text(),
         $coc_name := $c/ROW/COC_NAME,
         $coc_phone := $c/ROW/COC_PHONE/text()
     order by $coc_county
     return
        <county>
           <name>{$coc_county}</name>
           <chamber phone="{$coc_phone}">{$coc_name/text()}</chamber>
           <attractions>
              {for $a in collection("/public")
               where $coc_county = $a/attraction/county/text()
               return $a
              }
           </attractions>
        </county>}
    </counties>' RETURNING CONTENT)
FROM dual;

<counties>
  <county>
    <name>Alger</name>
    <chamber phone="(906) 387-2138">Alger County Chamber of Commerce</chamber>
    <attractions>
      <attraction url="http://www.fs.fed.us/r9/forests/hiawatha/recreation/hiking/
grand-island-trail/index.php">
        <name>Grand Island National Recreation Area</name>
        <phones>
          <phone>
             <name>Munising Ranger District</name>
             <number>(906) 387-2512</number>
          </phone>
          <phone>
             <name>Passenger Ferry Service</name>
             <number>(906) 387-3503</number>
          </phone>
        </phones>
        <county>Alger</county>
      </attraction>
      <attraction url="http://www.nps.gov/piro/">
        <name>Pictured Rocks National Lakeshore</name>
...
      </attraction>
    </attractions>
  </county>
...
</counties>


You can see that Listing 3 is very similar to Listing 2, but with two additions: 

  • An <attractions> node under <county>

  • A nested FLWOR expression within <attractions>

The nested FLWOR expression generates the list of attractions for each county and shows one use for the "W" component of FLWOR. The logic works as follows: 

  1. Oracle's implementation of XQuery's built-in collection() function returns a sequence of documents from the specified Oracle XML DB Repository directory (/public in this case)
  2. Each document in /public consists of a single <attraction> node

  3. The for clause iterates over each of those documents

  4. The where clause passes only those attractions for the current county

  5. The return clause simply places each entire <attraction> node under <attractions>

Be aware that the collection() function returns all documents in the directory that you specify. Those documents should all be of the same type, because there's no mechanism for specifying the type of document over which to iterate. If you've created the example data on your own system in order to follow along with this article, make sure that /public contains no other documents or place the example attraction documents in some other directory and modify the article listings accordingly.

The where clause in Listing 3 is the key to properly joining the XML documents generated from the CHAMBER_OF_COMMERCE table with those found in the /public directory (in the Oracle XML DB Repository). The for loop iterating over collection("/public") will actually bring back all attraction documents in the /public directory, and it will do that for each county from the CHAMBER_OF_COMMERCE table. You don't want to list the attractions under every county. Using where , you retain only those attractions that fall within a given county, discarding the others.

A Different Sort of Join

You're almost done! All that's left is to fold in the census data for each county. Remember from Figure 1 that 

  • Your census data is stored in an XMLType column.

  • You have one census document (one row) per county.

  • Each census document lists populations for all cities within the county.

There are two different ways to get at XML data held in an XMLType column. Which you choose depends on whether you want to generate one document as a result of your XQuery statement or whether you want to generate one document for each row in the XMLType column. I'll show both mechanisms; the first generates a single document with information from all counties. Listing 4 will do the job.

Code Listing 4: Aggregation and another way to join 

SELECT XMLQuery(
   '<counties>
    {for $coc in ora:view("CHAMBER_OF_COMMERCE"),
         $cc in ora:view("COUNTY_CENSUS")
     let $coc_county := $coc/ROW/COC_COUNTY/text(),
         $coc_name := $coc/ROW/COC_NAME,
         $coc_phone := $coc/ROW/COC_PHONE/text(),
         $cc_county := $cc/ROW/CC_COUNTY/text(),
         $cc_populations := $cc/ROW/CC_CITY_POPULATIONS/cities/city/population
     where $coc_county = $cc_county
     order by $coc_county
     return
        <county population ="{xs:unsignedInt(sum($cc_populations))}">
           <name>{$coc_county}</name>
           <chamber phone="{$coc_phone}">{$coc_name/text()}</chamber>
           <attractions>
              {for $a in collection("/public")
               where $coc_county = $a/attraction/county/text()
               return $a
              }
           </attractions>
        </county>}
    </counties>' RETURNING CONTENT)
FROM dual;

<counties>
...
  <county population="30923">
    <name>Marquette</name>
    <chamber phone="(906) 226-6591">Marquette Area Chamber of Commerce</chamber>
    <attractions>
      <attraction>
        <name>McCormick Wilderness</name>
        <phones>
          <phone>
            <name>Kenton Ranger District</name>
            <number>(906) 852-3500</number>
          </phone>
        </phones>
        <county>Marquette</county>
      </attraction>
      <attraction>
        <name>National Ski Hall of Fame</name>
        <phones>
          <phone>
            <name>Voice</name>
            <number>(906) 485-6323</number>
          </phone>
        </phones>
        <county>Marquette</county>
      </attraction>
    </attractions>
  </county>
</counties>


Listing 4 builds upon Listing 3 and comes with several new aspects that need explaining. I'll begin with the join, which this time is expressed using a comma-delimited syntax in the outer for loop that any SQL programmer will almost immediately understand: 

{for $coc in ora:view("CHAMBER_OF_COMMERCE"),
     $cc in ora:view("COUNTY_CENSUS")


The for loop now lists two sources of XML data, separated by commas. The result is all possible combinations of documents from the two sources. The where clause later in the FLWOR expression restricts the results to only those combinations for which both the CHAMBER_OF_COMMERCE and COUNTY_CENSUS nodes are for the same county.

Why is the Listing 4 join done differently from the Listing 3 join? The reason is that you want the population to be an attribute of <county>. To that end, the FLWOR expression that brings in the census data has to either precede or encompass the <county> tag. In Listing 4, I chose the former option.

Let's look at those population values. The document for Marquette County is as follows: 

Marquette
<?xml version="1.0"?>
<cities>
  <city>
    <name>Marquette</name>
    <population>19661</population>
  </city>
  <city>
    <name>Ishpeming</name>
    <population>6686</population>
  </city>
  <city>
    <name>Negaunee</name>
    <population>4576</population>
  </city>
</cities>


In Listing 4, you'll find a line in the let clause to extract the population from the current COUNTY_CENSUS document:


$cc_populations := $cc/ROW/CC_CITY_POPULATIONS/cities/city/population


This simple line illustrates something very important that you must keep in mind whenever using XPath to extract data from an XML document. XPath does not necessarily return single values. XPath expressions return sequences. In the case of Marquette County, there are three <population> nodes, so the sequence will consist of three values: 

19661, 6686, 4576


Don't worry about how these values are internally represented. That doesn't matter. What matters is that you know that $cc_populations now contains three distinct values, one for each city in the county. Listing 4 uses the sum() function within the <county> tag to generate one number that is the total population for each county. sum() is an aggregate function much like SQL's SUM . The difference is that XQuery's sum() aggregates the values in a sequence, whereas SQL's SUM aggregates values in a column.

After summing the population values for each city within a county, Listing 4 invokes the xs:unsignedInt constructor function to cast the resulting value as an integer. Otherwise you'd see the population represented as 3.0923E+004 rather than 30923.

Returning Rows from an XMLQuery

Earlier I said that there are two ways to get at data in an XMLType column. The approach in Listing 4 pulls all the XMLType column values into just one XML document. If you want to generate one document per county, you can use the approach shown in Listing 5. That approach uses the PASSING clause to pass two XML nodes into the XMLQuery function. The first is known as a context item : 

PASSING BY VALUE cc_city_populations,


Code Listing 5: Passing a context item 

SELECT rownum, XMLQuery(
   '<counties>
    {for $c in ora:view("CHAMBER_OF_COMMERCE")
     let $coc_county := $c/ROW/COC_COUNTY/text(),
         $coc_name := $c/ROW/COC_NAME,
         $coc_phone := $c/ROW/COC_PHONE/text()
     where $coc_county = $cc_county/county/text()
     order by $coc_county
     return
        <county population="{xs:unsignedInt(sum(/cities/city/population))}">
           <name>{$coc_county}</name>
           <chamber phone="{$coc_phone}">{$coc_name/text()}</chamber>
           <attractions>
              {for $a in collection("/public")
               where $coc_county = $a/attraction/county/text()
               return $a
              }
           </attractions>
        </county>}
    </counties>'
    PASSING BY VALUE cc_city_populations,
       XMLTYPE('<county>' || cc_county || '</county>') AS "cc_county"
    RETURNING CONTENT)
FROM county_census;


A context item is an XML node to which you can implicitly refer. For example, Listing 5 executes the following call to sum()

sum(/cities/city/population)


Notice the lack of a $ variable at the head of this XPath expression. None is needed. In the absence of a qualifier, the expression implicitly refers to the current instance of the context item, which in this case is an instance of the XML document stored in the CC_CITY_POPULATIONS column.

The second item listed in the PASSING clause is given an identifier: 

XMLTYPE('<county>' || cc_county || '</county>') AS "cc_county"

 

Next Steps


READ more about XQuery
www.xml.com/lpt/a/2002/10/16/xquery.html
www.w3.org/XML/Query

DOWNLOAD
sample data for this article
Oracle Database 10g Release 2

The simple expression here generates an XML fragment with the county name from the CC_COUNTY column. This is necessary, because the XQuery statement needs access to the county name in order to properly join to the census data, and you cannot simply pass in the value as text. Items you pass in via the PASSING clause must be XMLType values.

The approach in Listing 5 works because COUNTY_CENSUS contains exactly one row per county. Thus, to return one row for each CC_CITY_POPULATIONS value is to return one row per county. However, your XML might not always be so conveniently arranged. When confronted with a case in which you don't have a one-to-one correspondence between a table that you're querying and the XMLType values you wish to output, you can use the XMLTable function to generate one row of XMLType output from each iteration through a FLWOR expression's for loop. Listing 6 demonstrates.

XMLTable is designed to be used in the FROM clause of a SQL query. The invocation of XMLTable in Listing 6 closely mimics the invocation of XMLQuery in Listing 4. One difference is that you don't use RETURNING CONTENT with XMLTable . Another difference is the lack of <counties>...</counties> tags. Indeed, to generate one row for each iteration of the for loop, you must ensure that the for loop is the outermost structure in the query. In Listing 6 that's the case, so the query returns one <county>...</county> document for each combination of matching CHAMBER_OF_COMMERCE and COUNTY_CENSUS documents.

XQuery Is a Budding Opportunity

XQuery is an evolving standard and exists in "Last Call" draft form as this article goes to press. Oracle Database 10g Release 2 implements the XQuery 1.0 W3C Working Draft dated April 4, 2005. When the standard is finalized, Oracle will update its XQuery support as needed.

Oracle's new support for XQuery gives a huge boost to XML developers, who are now free to take an XML-centric approach to querying the database. Relational as well as XML data can be queried through the lens of XML, and results can be returned as either XML or a table. Queries can take full advantage of Oracle's high-performance database engine and optimizer.


Jonathan Gennick (Jonathan@Gennick.com) is an experienced Oracle professional and a member of the Oak Table Network who enjoys the challenge of writing a good query. He's written extensively on SQL and is very happy to have a new query language to explore. He wrote the SQL Pocket Guide and the Oracle SQL*Plus Pocket Reference, both from O'Reilly Media.


Send us your comments