As Published In

Oracle Magazine
May/June 2003
Developer XML

X Is for XQuery

By Jason Hunter

Query any XML—including relational databases—with XQuery.

XQuery may cause one of the biggest changes to server-side programming since Java servlets. XQuery is a query language specification under development by the World Wide Web Consortium (W3C) that's designed to query collections of XML data: not just XML files, but anything that can appear as XML, including relational databases.

XQuery provides a mechanism to extract information efficiently and easily from native XML databases (NXDs) and from relational data as well. With XQuery, you can view RDBMS tables as just another XML data source. This opens up the exciting possibility of a single query that combines an incoming purchase order in native XML format, an archive of catalog data also in native XML format, and an inventory system held in a relational database.

XQuery isn't limited to only the server, either. It's already showing up in products such as Apple's Sherlock as a mechanism for querying XML data feeds. Sherlock actually combines XQuery with JavaScript to produce an easy way to script and query XML content.

Note that the XQuery specification is under active development. The most recent draft was published in November 2002, on the heels of the August and April 2002 drafts. This article discusses the November 2002 draft and provides examples supported by it. Different vendors support different drafts of the specification, so check your vendor's XQuery support before using these examples.

XPath

XQuery makes heavy use of XPath, an expression language used to select portions of an XML document. In fact, XQuery 1.0 and XPath 2.0 are under development by the same W3C working group, and their specifications are intertwined. XPath expressions behave in some ways like regular expressions, except they operate on XML nodes instead of characters. Both XPath and regular expressions can look somewhat cryptic, with lots of slashes and brackets, but both are incredibly powerful and charmingly elegant. XPath is easiest to understand through syntax examples, as in the following:

  • /html/body/h1. Selects all <h1> elements that are children of a <body> element that is the child of an <html> element that is the root element in a document. The result may be multiple <h1> elements.
  • //h1. Selects all <h1> elements that appear anywhere within a document. The double-slash indicates arbitrary depth.
  • count(//book). Returns the number of <book> elements that appear within a document.
  • //book[author = "Hunter"]. Returns all <book> elements that have an <author> child element whose string value is "Hunter". The square brackets surround a "predicate" that acts as a filter on the match results.
  • //book[@year > 1999]. Returns all <book> elements that have a year attribute with a value greater than 1999. The @ sign marks year as an attribute. Notice how an attribute value can be treated as a number for comparison within XPath.
  • //book[@pages]. Returns all <book> elements that have a pages attribute with any value.
  • //book/@pages. Returns all the pages attributes that are attached to <book> elements.
  • (i | b). Returns all <i> or <b> child elements from the current context node. XPath expressions work like file system lookups; if there's no leading slash, the path is relative to the "current context node" (specified outside the expression).
  • (//servlet | //servlet-mapping) [servlet-name = $servlet]. Returns all <servlet> or <servlet-mapping> elements that have a <servlet-name> child element whose value equals the $servlet variable.
  • //key[. = "Total Time"]. Returns all <key> elements that have a value of "Total Time". The "." in the expression represents the context node, which is similar to a "this" pointer in object-oriented languages.
  • (//key)[1]/text(). Returns the text nodes of the first <key> element within the document.
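
To try a few of these expressions without a file on disk, the following sketch binds a small, made-up document to a variable and counts the matches for several of the paths above. The <library> data, the author "Smith", and the $doc variable are invented for illustration. The let and return keywords belong to FLWOR expressions, covered below, and the {-- --} comment markers follow the November 2002 draft this article uses.

```xquery
{-- Hypothetical sample data; only the path expressions come from the list above --}
let $doc :=
  <library>
    <book year="2001" pages="500"><author>Hunter</author></book>
    <book year="1998"><author>Smith</author></book>
  </library>
return (
  count($doc//book),
  count($doc//book[author = "Hunter"]),
  count($doc//book[@year > 1999]),
  count($doc//book[@pages])
)
```

Under those assumptions the query should return the sequence 2, 1, 1, 1: two books in all, one by Hunter, one published after 1999, and one with a pages attribute.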

Historically, XPath has been used as part of Extensible Style Language Transformations (XSLT). Document Object Model (DOM) Level 3 and JDOM also support XPath as a way to select portions of a document without manual searching. In XQuery, XPath expressions are a type of simple query, although they are more commonly part of larger queries. Suppose you have the following XML file:

<greeting>
 <hello xml:lang="en">Hello</hello>
 <hello xml:lang="es">Hola</hello>
 <hello xml:lang="de">Hallo</hello>
</greeting>

With the information in this file, you can retrieve a particular language's textual greeting using the following XPath expression (assume that $lang is prebound to a two-character language code):

/greeting/hello[@xml:lang = $lang]/text()

While XPath alone is useful for simple data extraction, XQuery builds on XPath with FLWOR expressions.

FLWOR Expressions

FLWOR (pronounced "flower") expressions are the building blocks of XQuery. The name comes from the For, Let, Where, Order by, and Return keywords that make up the expression. The for clause provides a mechanism for iteration, and the let clause allows variable assignments. The for and let clauses specify a sequence of tuples (a tuple is an ordered set of values).

These tuples can then be filtered with a where clause and ordered using an order by clause. The return clause at the end of an expression indicates what should be returned. The return clause is evaluated once for every tuple surviving the where clause and ordered according to the order by clause. The return value of the FLWOR expression is the ordered sequence of content generated by the return clause as it evaluates each tuple.
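
As a small sketch of how the clauses fit together, the following query runs over the greeting file shown earlier. It assumes the file is reachable as "greeting.xml", a name chosen here for illustration; the $code variable and the <word> element are likewise made up for the example.

```xquery
{-- Assumes the greeting document is saved as "greeting.xml" (hypothetical name) --}
for $h in document("greeting.xml")/greeting/hello
let $code := string($h/@xml:lang)
where $code != "en"
order by $code
return <word lang="{$code}">{ $h/text() }</word>
```

The for clause produces one tuple per <hello> element, let adds $code to each tuple, where drops the English greeting, and order by sorts the remaining tuples by language code. The return clause then builds one <word> per tuple, so the result should be <word lang="de">Hallo</word> followed by <word lang="es">Hola</word>.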

FLWOR expressions allow the building of arbitrarily complex XML results and provide a mechanism to utilize multiple documents in a single query. For example, the expression in Listing 1 (borrowed from the "XML Query Use Cases" document at www.w3.org/XML/Query) returns the prices of books found at both bn.com and amazon.com. Notice how the expression contains a mix of XML content with a FLWOR expression. In XQuery, curly braces separate XML content from enclosed expressions.

The query in Listing 1 behaves quite a lot like a SQL inner join; the difference is that it queries XML documents and produces a direct XML result. When analyzing the query, note that the for clause creates a set of tuples containing all the Barnes and Noble <book> elements and all the Amazon <entry> elements. Notice how the XPath expressions //book and //entry come into play, as well as $b/title and $a/price/text() later. The for clause has two independent variables, and so, as with SQL, it generates a Cartesian product containing all possible pairings of books. The where clause restricts the results to those pairs where the <title> elements match. The return clause creates a <book-with-prices> element for each tuple and prints the title and prices within. We can enhance the query to sort the results by price with an order by clause. Listing 2 puts the less expensive books first.

The order by clause uses an XPath expression to select the price elements and uses a built-in function min() to calculate the minimum value. Note the XQuery comment surrounded by {-- and --} tags.
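
The pairing, filtering, and ordering steps can also be seen in isolation with plain numbers instead of documents. This is only an illustrative sketch, not one of the article's listings; the values and the <match> element are made up.

```xquery
{-- Illustrative values only; <match> is a hypothetical element name --}
for $x in (3, 2, 1), $y in (1, 2, 4)
where $x = $y
order by $x
return <match>{ $x }</match>
```

The two for variables produce nine (x, y) tuples. The where clause keeps only the pairs (2, 2) and (1, 1), and order by rearranges them by $x, so the query should return <match>1</match> followed by <match>2</match>.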

Conditionals

Like most languages, XQuery provides the conditional expression if/then/else. Unlike most languages, the else clause is mandatory. That's because every expression in XQuery has to return a value; there's no support for simple statements. When there is nothing to return, you return () from the else clause to indicate the empty sequence. In a procedural language, you would simply omit the else clause.
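
A minimal sketch of the mandatory else clause, using made-up numbers: the query keeps the even values and contributes nothing for the odd ones by returning the empty sequence. The <even> element is a hypothetical name.

```xquery
{-- Illustrative values only; <even> is a hypothetical element name --}
for $n in (1, 2, 3, 4)
return
  if ($n mod 2 = 0) then <even>{ $n }</even>
  else ()
```

Because () adds nothing to the result, the output should contain just <even>2</even> and <even>4</even>.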

To experiment with an if/then/else clause, I have a query to retrieve all books and their authors, but after the first two authors for a book, I want to return any additional authors as <et-al/>:

for $b in document("books.xml")/bib/book
return
 if (count($b/author) <= 2) then $b
 else <book> { $b/@*, $b/title, $b/author[position() <= 2], <et-al/>,
           $b/publisher, $b/price } </book>

This query reads book data from a "books.xml" URI, probably a local file. For each <book>, if the author count is <= 2, then I return the <book> directly. Otherwise I construct a new <book> element containing all the original data, except that I include only the first two authors, and after that, I append an <et-al/> element. I use a special function, position(), in the predicate here to return only the first two authors. I also use the cryptic $b/@* XPath expression, which refers to all the attributes on $b. Placing the attributes here at the head of the content sequence for <book> attaches all attributes to the new <book> element.

Functions and Operators

XQuery includes a vast set of functions and operators. There are functions for math, string and regular expression manipulation, date and time comparisons, XML node and QName manipulation, sequence manipulation, type conversion, Boolean logic, and input functions. You can also define your own, as I'll show later, and many engines provide custom extensions as well.

Table 1 shows some of the more commonly used built-in functions with descriptions for those that aren't intuitively obvious. Functions are often written in the "fn" namespace, such as fn:min(). All XQuery functions are in a namespace as a means of avoiding naming collisions. The "fn" namespace is a special namespace mapped to "http://www.w3.org/2002/11/xquery-functions". Unless you change it, the default function namespace is also mapped to the same URI, so using "fn" isn't generally necessary.
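
As a quick illustration, and assuming your engine binds the "fn" prefix as described and the default function namespace has not been changed, the prefixed and unprefixed forms below should call the same built-in function. The numbers are made up.

```xquery
{-- Both calls should resolve to the same built-in function if the default function namespace is unchanged --}
(
  fn:min((3, 1, 2)),
  min((3, 1, 2))
)
```

Each call should return 1. The inner parentheses build a single sequence that is passed as the function's one argument.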

One word of warning: The function and operator names have been known to change dramatically between specification versions.

User-Defined Functions

If you can't find the built-in XQuery function you need, you can always write your own. The following example assumes an online movie review site that publishes its movie and review data as XML so that others can utilize the information in separate applications. The data files look like this:

<!-- movies.xml -->
<movies>
  <movie id="1">
    <title>The Matrix</title>
  </movie>
  <movie id="2">
    <title>The Matrix: Reloaded</title>
  </movie>
</movies>

<!-- movie-reviews.xml -->
<reviews>
  <review id="100">
    <movie-id>1</movie-id>
    <stars>5</stars>
    <comment>You can't call yourself a geek
    unless you've seen this
    <b>amazing</b> movie.</comment>
  </review>
  <review id="101">
    <movie-id>1</movie-id>
    <stars>3</stars>
    <comment><a href=
    "http://www.keanunet.com">Keanu</a>
    can act!</comment>
  </review>
</reviews>

I can define a function to determine the average star count for a movie id:

define function star-count($movie-id) {
  let $review-doc := document("movie-reviews.xml")
  return avg($review-doc/reviews/review[movie-id = $movie-id]/stars)
}

I can then use this function to iterate over each <movie> and return a new <movie> element that includes an average review. If there are no reviews, I return "N/A".

let $movie-doc := document("movies.xml")
for $movie in $movie-doc/movies/movie
let $stars := star-count($movie/@id)
return
 <movie id="{$movie/@id}">
  {$movie/title}
  <stars>{
    if ($stars) then $stars else "N/A"
  }</stars>
 </movie>

Functions can be recursive and mutually recursive (where one function refers to another that refers to the first). However, there's no support yet for overloading based on parameter count or type, nor is there support for variable-length argument lists. It's true that some built-in functions use both of these features, but user-defined functions don't yet support them.
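
A minimal sketch of a recursive user-defined function, written in the same define function syntax as star-count() above. The factorial() function is a hypothetical example, not part of the movie site code, and engines that follow a different draft may spell the declaration differently.

```xquery
{-- Hypothetical example: factorial() calls itself until $n reaches 1 --}
define function factorial($n) {
  if ($n <= 1) then 1
  else $n * factorial($n - 1)
}

factorial(5)
```

With this definition, factorial(5) should evaluate to 120.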
