
Five XSLT 2.0 Features that Simplify XML Document Transformations
by Jinyu Wang

Learn how to overcome the limitations of XSLT 1.0 with these 2.0 features

XSLT is a powerful language that has been widely used to transform XML documents. However, XSLT in its current 1.0 version has limitations that can make writing stylesheets difficult and complex. In this Technical Article, I will describe five new features in XSLT 2.0 that will help overcome these limitations:

  • Grouping: allows simplified and efficient content grouping
  • Multiple Outputs: creates multiple output documents in one XSLT transformation
  • Temporary Tree: eliminates node-set conversions
  • Character Mapping: replaces the error-prone character escaping
  • Datatype Binding: allows processing data according to their datatypes.

These new XSLT 2.0 features not only enrich the functionality but also optimize the performance of XML transformations.

Setting up the XSLT Command-line Utility

You can run XSLT transformations using the oracle.xml.parser.v2.oraxsl command-line utility provided in the Oracle XDK 10g, which can be downloaded from the Oracle OTN XML Center.

You can download the sample code of this paper here.

To run the command-line utility, add the xmlparserv2.jar library from the $XDK_HOME/lib directory to the Java CLASSPATH, where $XDK_HOME is the directory into which you extracted the XDK download. After setting up the CLASSPATH, you can run the command-line utility as follows:

>java oracle.xml.parser.v2.oraxsl <XML document> <XSL document>

In the following sections, this utility will be used to transform XML documents.

Exploring New XSLT 2.0 Features

All the XSLT examples in this paper will transform an XML document for a CD catalog, as shown here:

<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Still got the blues</TITLE>
    <ARTIST>Gary Moore</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Virgin records</COMPANY>
    <PRICE>10.20</PRICE>
    <YEAR>1990</YEAR>
  </CD>
  <CD>
    <TITLE>This is US</TITLE>
    <ARTIST>Gary Lee</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Virgin records</COMPANY>
    <PRICE>12.20</PRICE>
    <YEAR>1990</YEAR>
  </CD>
</CATALOG>

This is a very common format when working with database data. You can produce it from the following database table using Oracle features such as the XML SQL Utility or SQL/XML:

CREATE TABLE CD_TBL(
TITLE VARCHAR2(100),
ARTIST VARCHAR2(500),
COUNTRY VARCHAR2(5),
COMPANY VARCHAR2(50),
PRICE FLOAT,
YEAR NUMBER(4));

The following XSLT examples will focus on the XSLT 2.0 features that can simplify the transformations of these kinds of XML documents. However, it should be noted that the techniques discussed could be applied to other types of XML documents.

Feature 1: Grouping

Grouping data is a common operation when transforming XML documents in XSLT, especially when you work with tabular data from database tables and want to publish it on the Web.

For example, suppose that when publishing the CD information online, you want to transform the CD catalog document into a set of CDs grouped by country, and then further group the CDs within each country by year:

<COUNTRY name="UK">
   <YEAR year="1988">
      <TITLE>Hide your heart</TITLE>
   </YEAR>
   <YEAR year="1990">
      <TITLE>Still got the blues</TITLE>
      <TITLE>This is US</TITLE>
   </YEAR>
</COUNTRY>
<COUNTRY name="USA">
   <YEAR year="1985">
      <TITLE>Empire Burlesque</TITLE>
   </YEAR>
</COUNTRY> 

Because XSLT 1.0 does not include built-in grouping support, a clever but complex and memory-intensive technique called the Muenchian Method (after Steve Muench of Oracle) is widely used instead, as shown below:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<!-- Index the CDs by country, and by country + year for the nested grouping -->
<xsl:key name="cd-by-country" match="CD" use="COUNTRY" />
<xsl:key name="cd-by-country-year" match="CD" use="concat(COUNTRY, '|', YEAR)" />
<xsl:template match="/">
  <!-- Keep only the first CD of each country, then sort those groups by name -->
  <xsl:for-each 
    select="CATALOG/CD[generate-id()=generate-id(key('cd-by-country', COUNTRY)[1])]">
    <xsl:sort select="COUNTRY" />
    <COUNTRY name="{COUNTRY}">
      <!-- Within this country, keep only the first CD of each year -->
      <xsl:for-each 
        select="key('cd-by-country', COUNTRY)[generate-id()=generate-id(key('cd-by-country-year', concat(COUNTRY, '|', YEAR))[1])]">
        <xsl:sort select="YEAR" />
        <YEAR year="{YEAR}">
          <xsl:copy-of select="key('cd-by-country-year', concat(COUNTRY, '|', YEAR))/TITLE"/>
        </YEAR>
      </xsl:for-each>
    </COUNTRY>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The method is complex and expensive because of the nested key lookups and generate-id() comparisons it relies on. With XSLT 2.0, you needn't worry about this problem: you can use the new <xsl:for-each-group> element to group XML elements easily:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
  <xsl:template match="/">
    <xsl:for-each-group select="/CATALOG//CD" group-by="COUNTRY">
      <xsl:sort select="COUNTRY"/>
      <COUNTRY name="{COUNTRY}">
        <xsl:for-each-group select="current-group()" group-by="YEAR">
          <YEAR year="{YEAR}">
            <xsl:copy-of select="current-group()/TITLE"/>
          </YEAR>
        </xsl:for-each-group>
      </COUNTRY>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>

In this example, you first specify an XPath expression in the select attribute of <xsl:for-each-group> to select the items to be grouped. Then you provide one of four attributes on the <xsl:for-each-group> element to determine how the selected items are grouped:

  • group-by: groups the selected items by the value of its expression; items with the same value go into the same group regardless of where they appear in the selected sequence
  • group-adjacent: groups together only adjacent items with the same value
  • group-starting-with: holds a pattern that matches the first node in each group
  • group-ending-with: holds a pattern that matches the last node in each group.

In this example, all the selected <CD> elements are grouped by the content of their child <COUNTRY> element, regardless of the order in which they are selected.

Within the <xsl:for-each-group> element, you can include an <xsl:sort> element to sort the groups. In this example, <xsl:sort select="COUNTRY"/> orders the CD groups alphabetically by country name. That's why the CD group from the UK is listed before the one from the USA.

In XSLT 2.0, you can refer to the items in the current group, as a sequence, using the current-group() function. In the example, it is used to group the CDs in each country by their year of publication as follows:

<xsl:for-each-group select="current-group()" group-by="YEAR">

The new XSL stylesheet is much easier to write and understand.
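The grouping example above uses group-by. As a contrast, the following sketch uses group-adjacent on the same catalog, so a new group starts whenever a CD's COUNTRY differs from that of the previous CD. It also uses current-grouping-key(), which returns the key shared by the items in the current group. The RUNS and RUN element names are made up for illustration:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
  <xsl:template match="/">
    <RUNS>
      <!-- group-adjacent: a new group starts whenever COUNTRY changes
           from one CD to the next in document order -->
      <xsl:for-each-group select="CATALOG/CD" group-adjacent="COUNTRY">
        <!-- current-grouping-key() is the shared COUNTRY value;
             current-group() is the sequence of adjacent CD elements -->
        <RUN country="{current-grouping-key()}" size="{count(current-group())}">
          <xsl:copy-of select="current-group()/TITLE"/>
        </RUN>
      </xsl:for-each-group>
    </RUNS>
  </xsl:template>
</xsl:stylesheet>
```

With the sample catalog this yields one run for the USA followed by one run of three UK CDs. If another USA CD appeared after the UK CDs, it would start a third run instead of joining the first one, as it would with group-by.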

Feature 2: Multiple Outputs

The second XSLT 2.0 feature that greatly simplifies the writing of XSL stylesheets is its support for creating multiple XSLT outputs. This feature allows you to define multiple XSLT transformation outputs in one XSL stylesheet.

In many real-world applications, you need to create multiple outputs from one XML document. For example, when generating Javadoc-style documentation, you may want to create multiple HTML pages to use frames. You may also need to create supplementary files that are referenced by the main output document. For example, when creating an HTML report, you might need to create several Scalable Vector Graphics (SVG) images, Cascading Style Sheets (CSS) files, or metadata files for the report.

Since XSLT 1.0 does not have the capability to define multiple outputs in one XSL stylesheet, you have to apply separate XSLT transformations for each one. Although you can write one XSL stylesheet that relies on XSLT parameters to indicate which page to create, the XSL stylesheet needs to execute multiple times and can be quite complicated because of the need to manage these parameters.

To work around this, XSLT 1.0 processors provide various extensions. For example, Oracle XDK provides an <ora:output> extension element. However, using such extensions prevents XSL stylesheets from being portable across processors.

In XSLT 2.0, the new <xsl:result-document> element can be used to define multiple outputs. For example, when transforming the CD catalog, the CDs published in each country can be written to a separate result document, as shown here:

<xsl:stylesheet version="2.0"      
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" name="cd-format"/>
<xsl:template match="/">
 <xsl:for-each-group select="/CATALOG//CD" group-by="COUNTRY">
  <xsl:result-document 
        href="../output/CD{position()}_{current-grouping-key()}.xml"
        format="cd-format">
   <CD_LIST country="{current-grouping-key()}">
    <xsl:copy-of select="current-group()"/>
   </CD_LIST>
  </xsl:result-document>
 </xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>

Generally, two steps are required when defining multiple outputs:

  • Setting up the output formats
  • Specifying the output file names.

Because each output might have a different format, you define named output formats using top-level <xsl:output> elements in the XSL stylesheet, and then refer to the formats by name. Each <xsl:result-document> element serializes its output using the format named in its format attribute. In the example, cd-format is defined as an XML output format with indentation:

<xsl:output method="xml" indent="yes" name="cd-format"/>

It is used later by <xsl:result-document> to create one XML document for the CDs from each country.

In addition to specifying the output format, you can use expressions to compute the output file names. In the example, each file name combines the position of the current group, returned by the position() function, with the country name of the current group, returned by the current-grouping-key() function:

  <xsl:result-document 
        href="../output/CD{position()}_{current-grouping-key()}.xml"
        format="cd-format">

Note that an expression such as current-group()/COUNTRY would not work here: it returns the COUNTRY element of every CD in the group, so for the three UK CDs the attribute value template would produce "UK UK UK".

Because the source document is parsed and the stylesheet is executed only once, this is more efficient than running a separate transformation for each output file.
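Because each <xsl:result-document> can name its own output format, a single transformation can write files in different formats. The following sketch writes one XML file per country plus an HTML index page that links to them. The index.html file name, the CD_<country>.xml naming scheme (without the position prefix used above), and the ../output/ directory are illustrative assumptions:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Two named output formats: XML for the data files, HTML for the index page -->
  <xsl:output name="cd-format" method="xml" indent="yes"/>
  <xsl:output name="index-format" method="html" indent="yes"/>

  <xsl:template match="/">
    <!-- One XML file per country, named after the grouping key -->
    <xsl:for-each-group select="CATALOG/CD" group-by="COUNTRY">
      <xsl:result-document href="../output/CD_{current-grouping-key()}.xml"
                           format="cd-format">
        <CD_LIST country="{current-grouping-key()}">
          <xsl:copy-of select="current-group()"/>
        </CD_LIST>
      </xsl:result-document>
    </xsl:for-each-group>

    <!-- An HTML index page that links to the per-country files -->
    <xsl:result-document href="../output/index.html" format="index-format">
      <html>
        <body>
          <ul>
            <xsl:for-each-group select="CATALOG/CD" group-by="COUNTRY">
              <xsl:sort select="COUNTRY"/>
              <li>
                <a href="CD_{current-grouping-key()}.xml">
                  <xsl:value-of select="current-grouping-key()"/>
                </a>
              </li>
            </xsl:for-each-group>
          </ul>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```

The two result documents are written to different URIs, and each is serialized with its own method (xml or html) without a second pass over the source.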

Feature 3: Temporary Trees

Temporary trees are another new construct introduced in XSLT 2.0. In XSLT 1.0, a variable or parameter whose value is built from its content, rather than from a select attribute, holds a result tree fragment, which can only be copied to the output or converted to a string. In XSLT 2.0, such intermediate results, constructed by <xsl:variable>, <xsl:param>, or <xsl:with-param> elements, are stored as document nodes, called temporary trees, which can be navigated like any other tree.

With temporary trees, you can query the content of a variable or parameter using XPath expressions, and modularize the XSL processing. This offers a lot of flexibility when applying templates to, or extracting data from, XSL variables and template parameters. For example, in the following stylesheet, the catalog variable is built as a temporary tree containing copies of all the CDs published in or after 1988:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes"/> 
 <!-- The variable's content is a temporary tree, not a result tree fragment -->
 <xsl:variable name="catalog">
   <xsl:copy-of select="//CD[number(YEAR) >= 1988]"/>
 </xsl:variable>
  
 <xsl:template match="/">
   <Expensive>
     <xsl:apply-templates select="$catalog/CD[number(PRICE) >= 10]"/>
   </Expensive>
   <Cheap>
     <xsl:apply-templates select="$catalog/CD[number(PRICE) &lt; 10]"/>
   </Cheap>
 </xsl:template>
  
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The CDs in the temporary tree can then be further categorized as either expensive CDs, which have a price of $10 or higher, or cheap CDs, which have a price lower than $10. Without the temporary tree feature, $catalog would be a result tree fragment, so a path expression such as $catalog/CD[number(PRICE) < 10] would be invalid, and there would be no way to access the data copied into the $catalog variable without a processor-specific node-set() extension function.

With this new feature, you can easily break up complex transformations into several modules and apply multi-pass processing to XML documents.
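Temporary trees work for template parameters too. In the following sketch, a named template (price-summary, a hypothetical name, as are the REPORT and SUMMARY elements) receives a temporary tree of the UK CDs through <xsl:with-param> and queries it with ordinary path expressions, something that would need a node-set() extension function in XSLT 1.0:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes"/>

 <xsl:template match="/">
   <REPORT>
     <xsl:call-template name="price-summary">
       <!-- The parameter value is a temporary tree holding copies of the UK CDs -->
       <xsl:with-param name="cds">
         <xsl:copy-of select="CATALOG/CD[COUNTRY = 'UK']"/>
       </xsl:with-param>
     </xsl:call-template>
   </REPORT>
 </xsl:template>

 <!-- A reusable module: it only assumes its parameter contains CD elements -->
 <xsl:template name="price-summary">
   <xsl:param name="cds"/>
   <!-- XPath 2.0 allows a function call as the last step of a path -->
   <xsl:variable name="max-price" select="max($cds/CD/number(PRICE))"/>
   <SUMMARY count="{count($cds/CD)}" max-price="{$max-price}">
     <xsl:copy-of select="$cds/CD[number(PRICE) = $max-price]/TITLE"/>
   </SUMMARY>
 </xsl:template>
</xsl:stylesheet>
```

Because price-summary only depends on its parameter, the same template could be called with any other selection of CDs built earlier in the transformation.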

Feature 4: Character Mapping

The character mapping feature in XSLT 2.0 is useful when you want characters that XML normally escapes, such as <, > and &, to appear unescaped in the XSLT output. This is very useful when you generate documents that mix markup with other syntax, such as JSP files.

In XSLT 1.0, you had to use the disable-output-escaping attribute of the <xsl:text> and <xsl:value-of> elements to suppress character escaping. But this approach can be complex and error prone.

In XSLT 2.0, this problem is solved by letting you declare character mappings with an <xsl:character-map> element at the top level of the stylesheet. In the <xsl:character-map> element, you specify the mapping between characters in the result tree and the strings that replace them in the serialized output. The characters being mapped are usually taken from the Unicode Private Use Area (#xE000 to #xF8FF) so that they cannot clash with real content. You then associate the character map with an <xsl:output> declaration using its use-character-maps attribute. For example, here is an XSL stylesheet that defines a character map for generating a JSP file:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output use-character-maps="jsp"/>

  <!-- Private-use characters #xE001 and #xE002 stand in for the JSP delimiters -->
  <xsl:character-map name="jsp">
    <xsl:output-character character="&#xE001;" string="&lt;%"/>
    <xsl:output-character character="&#xE002;" string="%>"/>
  </xsl:character-map>
  <xsl:template match="/">
  &#xE001;@ page language="java" &#xE002;
    <HTML>
      <BODY>&#xE001; String myvariable; &#xE002;</BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

In this example, the character map maps #xE001 to <% and #xE002 to %>. When the XSLT output is serialized, every #xE001 character is replaced by <%, and every #xE002 character is replaced by %>:

<?xml version = '1.0' encoding = 'UTF-8'?>
  <%@ page language="java" %>
    <HTML><BODY><% String myvariable; %></BODY></HTML>

Character mapping in XSLT 2.0 provides more robust serialization than disable-output-escaping in XSLT 1.0. The mapped characters are ordinary characters in the result tree and are replaced only when the output is serialized, so they survive even when a text node or attribute is copied into a temporary tree. Furthermore, XSL processors are more likely to produce consistent results with character maps.
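Character maps also apply to attribute values, where XSLT 1.0's disable-output-escaping cannot be used at all. The following sketch uses the same jsp character map to emit a JSP expression inside an attribute. The FORM and INPUT elements and the qty1, qty2, ... JSP variables are hypothetical:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" use-character-maps="jsp"/>

  <xsl:character-map name="jsp">
    <xsl:output-character character="&#xE001;" string="&lt;%"/>
    <xsl:output-character character="&#xE002;" string="%>"/>
  </xsl:character-map>

  <xsl:template match="/">
    <FORM>
      <xsl:for-each select="CATALOG/CD">
        <!-- The private-use characters stay in the attribute value in the
             result tree and are replaced only during serialization -->
        <INPUT name="qty{position()}" value="&#xE001;= qty{position()} &#xE002;"/>
      </xsl:for-each>
    </FORM>
  </xsl:template>
</xsl:stylesheet>
```

For the first CD, this should serialize as an attribute such as value="<%= qty1 %>", with the delimiters written unescaped.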

Feature 5: Datatype Binding

Another simplifying feature is datatype binding, which is especially useful when you need to work with different types of data, such as dates, durations, numbers and other XML schema datatypes. (This feature is not currently supported in XDK 10g Production, but will be supported in a future release.)

In XSLT 1.0, data operations are limited to string, number, and Boolean processing. In XSLT 2.0, your stylesheet can use the 44 built-in XML Schema datatypes and the constructor functions associated with those datatypes. You can also declare the datatype of any variable or parameter defined by <xsl:variable>, <xsl:param>, or <xsl:with-param> using the as attribute. The as attribute tells the XSLT processor to check the value of the variable or parameter and convert it to the declared datatype, raising an error if the conversion fails. This early runtime check prevents bad data from being processed further and producing unexpected results that may be difficult to debug.

To use these datatypes, you must declare the XML Schema namespace, xmlns:xs="http://www.w3.org/2001/XMLSchema", along with the other namespace declarations in the start tag of <xsl:stylesheet>. For example, here's an XML document, CDOrder.xml, listing the order quantities for the CDs:
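As a small sketch of typed parameters and constructor functions applied to the CD catalog, the following stylesheet declares an xs:integer stylesheet parameter and converts PRICE with the xs:decimal() constructor function, so the arithmetic is done in exact decimal rather than floating point. The min-year parameter, the 8% tax rate, and the RECENT and price-with-tax names are hypothetical, and running it needs a processor that supports XSLT 2.0 datatypes:

```xml
<xsl:stylesheet version="2.0"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Typed stylesheet parameter (hypothetical), declared as xs:integer -->
  <xsl:param name="min-year" as="xs:integer" select="1988"/>

  <xsl:template match="/">
    <RECENT>
      <!-- xs:integer() constructor: compare years as numbers, not strings -->
      <xsl:for-each select="CATALOG/CD[xs:integer(YEAR) ge $min-year]">
        <!-- xs:decimal() constructor: exact decimal arithmetic -->
        <xsl:variable name="price" as="xs:decimal" select="xs:decimal(PRICE)"/>
        <!-- Hypothetical 8% tax rate; 1.08 is an xs:decimal literal -->
        <CD title="{TITLE}" price-with-tax="{$price * 1.08}"/>
      </xsl:for-each>
    </RECENT>
  </xsl:template>
</xsl:stylesheet>
```

If a PRICE or YEAR value cannot be converted, the constructor function raises an error at that point instead of silently producing NaN as number() would.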

<?xml version='1.0' encoding='windows-1252'?>
<Order>
<Item name="Empire Burlesque" number="18"/>
<Item name="Hide your heart" number="7"/>
<Item name="Still got the blues" number="10"/>
<Item name="This is US" number="5"/>
</Order>

Here the XSL stylesheet reads in the CD Catalog data and calculates the total order price by multiplying each CD price by the corresponding order quantity:

<?xml version='1.0' encoding='windows-1252'?>
<xsl:stylesheet version="2.0" 
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Phase 1: build a temporary tree with one <item> (price * quantity) per CD -->
   <xsl:variable name="CDs">
    <xsl:apply-templates select="/" mode="phase1"/>
   </xsl:variable>

   <!-- Root template (phase 2): add up the item totals -->
   <xsl:template match="/">
    <xsl:value-of select="sum($CDs//item)"/>
   </xsl:template>

   <xsl:template match="/" mode="phase1">
    <xsl:apply-templates mode="phase1"/>
   </xsl:template>

   <xsl:template match="CD" mode="phase1">
     <xsl:variable name="title" select="TITLE"/>
     <!-- as="xs:integer" converts the untyped @number value; a non-integer raises an error -->
     <xsl:variable name="num" select="document('CDOrder.xml')/Order/Item[@name=$title]/@number" as="xs:integer"/>
     <xsl:variable name="price" select="PRICE/text()"/>
     <xsl:element name="item">
      <xsl:value-of select="$price*$num"/>
     </xsl:element>
   </xsl:template>
</xsl:stylesheet>

When reading the order quantity from CDOrder.xml into the $num variable, as="xs:integer" converts the value to the xs:integer datatype. This ensures that the quantity is a valid integer. For example, if CDOrder.xml is updated as shown below to order 10.5 copies of "Still got the blues", a datatype conversion error is raised as soon as the variable is evaluated.

<?xml version='1.0' encoding='windows-1252'?>
<Order>
<Item name="Empire Burlesque" number="18"/>
<Item name="Hide your heart" number="7"/>
<Item name="Still got the blues" number="10.5"/>
<Item name="This is US" number="5"/>
</Order>

This is just a small example of how datatype binding greatly enriches data operations in XSL stylesheets.
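If you would rather report bad quantities than stop the transformation, you can test the conversion first with the XPath 2.0 castable as operator. The following sketch assumes, like the example above, that CDOrder.xml sits next to the stylesheet; the ORDER_CHECK, OK and REJECTED element names are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output indent="yes"/>

   <xsl:template match="/">
    <ORDER_CHECK>
     <xsl:for-each select="CATALOG/CD">
      <xsl:variable name="title" select="TITLE"/>
      <!-- Untyped here: the check happens explicitly below -->
      <xsl:variable name="raw"
           select="document('CDOrder.xml')/Order/Item[@name=$title]/@number"/>
      <xsl:choose>
       <!-- castable as tests the conversion without raising an error;
            it is false for "10.5" and for a missing Item -->
       <xsl:when test="$raw castable as xs:integer">
        <OK title="{$title}" total="{xs:decimal(PRICE) * xs:integer($raw)}"/>
       </xsl:when>
       <xsl:otherwise>
        <xsl:message>Invalid quantity for <xsl:value-of select="$title"/>: <xsl:value-of select="$raw"/></xsl:message>
        <REJECTED title="{$title}" number="{$raw}"/>
       </xsl:otherwise>
      </xsl:choose>
     </xsl:for-each>
    </ORDER_CHECK>
   </xsl:template>
</xsl:stylesheet>
```

With the updated CDOrder.xml shown above, "Still got the blues" is reported through <xsl:message> and written as a REJECTED element, while the other three CDs are processed normally.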

Summary

If you write XSL stylesheets, the features discussed in this article will not only save you time but can also improve the performance of your XSLT transformations.


Jinyu Wang ( Jinyu.Wang@oracle.com) is a senior product manager for Oracle XML Product management and an Oracle Certified Professional. She is a co-author of Oracle Database 10g XML & SQL (Oracle Press).
