XML Validation and XPath Evaluation in J2SE 5.0

   
By Robert Eckstein, September 8, 2005  

Some of the exciting new features of the Java 2 Platform, Standard Edition (J2SE) 5.0 release, code-named Tiger, are the added XML validation package at javax.xml.validation and the XPath libraries at javax.xml.xpath. Before the Tiger release, the Java API for XML Processing (JAXP) SAXParser or DocumentBuilder classes were the primary instruments of Java technology XML validation. The new Validation API, however, decouples the validation of an XML document from the parsing of the document. Among other things, this allows Java technology to support multiple schema languages. Let's take a closer look at XML validation first.

XML Validation

The simplest way to validate an XML document is to use a Validator object. This object performs a validation against the Schema object from which the Validator was created. Schema objects are typically created from SchemaFactory objects. The static newInstance() method creates a SchemaFactory for a given schema language, which you identify with a constant. The following code demonstrates this:

        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("mySchema.xsd"));
        Validator validator = schema.newValidator();
 

Calling the validate() method on the Validator object performs the actual validation. This method takes a javax.xml.transform.Source object as its first argument. You can pass either a SAXSource or a DOMSource, depending on your preference.

       
        DocumentBuilder parser =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = parser.parse(new File("myXMLDocument.xml"));
        validator.validate(new DOMSource(document));
 

Here is a simple source example that shows how to validate an XML document using a World Wide Web Consortium (W3C) XML Schema, sometimes referred to as WXS.

                   
        try {

            // Parse an XML document into a DOM tree.
            DocumentBuilder parser =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = parser.parse(new File("myXMLDocument.xml"));

            // Create a SchemaFactory capable of understanding WXS schemas.
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

            // Load a WXS schema, represented by a Schema instance.
            Source schemaFile = new StreamSource(new File("mySchema.xsd"));
            Schema schema = factory.newSchema(schemaFile);

            // Create a Validator object, which can be used to validate
            // an instance document.
            Validator validator = schema.newValidator();

            // Validate the DOM tree.
            validator.validate(new DOMSource(document));

        } catch (ParserConfigurationException e) {
            // exception handling
        } catch (SAXException e) {
            // exception handling - document not valid!
        } catch (IOException e) {
            // exception handling
        }       
 

Note that the newInstance() method takes a constant that identifies the schema language the factory must understand. Currently, the only schema language that every implementation is required to support is W3C XML Schema. This is an object-oriented schema language that provides a type system for constraining the character data of an XML document. WXS is maintained by the W3C and is a W3C Recommendation (that is, a ratified W3C standard specification).

Let's run this source code on the following XML file:

                   
<?xml version="1.0"?>
   
<birthdate>
    <month>January</month>
    <day>21</day>
    <year>1983</year>
</birthdate>
 

In addition, let's include the following W3C XML Schema document as our XML validation schema:

                   
<?xml version="1.0" encoding="UTF-8"?>
   
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
        schemaLocation="http://www.w3.org/2001/xml.xsd" />

  <xs:element name="birthdate">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="month" type="xs:string" />
        <xs:element name="day" type="xs:int" />
        <xs:element name="year" type="xs:int" />
      </xs:sequence>  
    </xs:complexType>
  </xs:element>
   
</xs:schema>
 

If the validation is successful, the program will run without incident. However, let's insert a spelling error on the month element:

                   
<amonth>January</amonth>
 

At this point, the Validator will throw a SAXException, the first few lines of which are shown here:

                   
ERROR:  'cvc-complex-type.2.4.a: Invalid content was
found starting with element 'amonth'. One of '{"":month}'
is expected.'
org.xml.sax.SAXParseException: cvc-complex-type.2.4.a:
Invalid content was found starting with element 'amonth'.
One of '{"":month}' is expected.
        at ...(Util.java:109)
        at ...(ErrorHandlerAdaptor.java:104)
        ...
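
One way to see both outcomes side by side is to wrap the validation in a helper that returns the validator's message instead of letting the exception propagate. The following is a minimal sketch, not part of the original example: the class name BirthdateCheck and the method firstError() are made up for illustration, the documents are held in strings rather than files, and the xs:import of xml.xsd is left out so that no network access is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is an assumption of this sketch.
class BirthdateCheck {

    // The birthdate schema from this article, without the xs:import.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='birthdate'><xs:complexType><xs:sequence>"
        + "<xs:element name='month' type='xs:string'/>"
        + "<xs:element name='day' type='xs:int'/>"
        + "<xs:element name='year' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns null when the document is valid, otherwise the error message.
    static String firstError(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first error is thrown as a SAXException.
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<birthdate><month>January</month>"
            + "<day>21</day><year>1983</year></birthdate>";
        // Rename month to amonth, as in the misspelled document above.
        String bad = good.replace("month>", "amonth>");
        System.out.println("good: " + firstError(good));
        System.out.println("bad:  " + firstError(bad));
    }
}
```

Running main() should print null for the first document and a cvc-complex-type message for the second. The exact wording of the message depends on the JAXP implementation in use.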
 
Understanding XML Schema

All implementations of SchemaFactory are required to support the W3C XML Schema. If you're not familiar with W3C XML Schema, here's a quick summary.

XML schemas contain definitions that are either simple or complex types. At the highest level, a complex type contains other elements, while a simple type does not. (These types differ in other ways as well, but this article will not attempt to explain all the differences.) As an example, let's create a schema that defines a fullname element that must consist of a firstname, a middlename, and a lastname element, in that order.

<?xml version="1.0"?>
   
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="fullname">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="middlename" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
</xs:element>
   
</xs:schema>
 

At the root of every XML schema is, appropriately, a schema element. The schema declaration above includes an xmlns attribute that indicates that the elements and data types used in the schema come from the "http://www.w3.org/2001/XMLSchema" namespace.

Elements and Attributes With Simple Types

Elements and attributes with simple types do not declare other elements or attributes inside them. Instead, they declare only "text" of several different types. This can be one of the types included in the XML schema definition, or it can be a custom type that you can define yourself. You can also add restrictions to a data type in order to limit its content, and you can require the data to match a defined pattern. Here are some examples of simple elements:

<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="birthdate" type="xs:date"/>
 

These are some of the more common data types used with XML schema:

  • xs:string
  • xs:decimal
  • xs:integer
  • xs:boolean
  • xs:date
  • xs:time
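
Restrictions and patterns are easier to judge once you see them reject a document. The sketch below is an illustration only: the person, age, and initials elements and the class name SimpleTypeRestrictions are hypothetical and do not appear elsewhere in this article. It limits an xs:integer to a range, requires an xs:string to match a pattern, and validates a few in-memory documents with the Validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; element names are assumptions of this sketch.
class SimpleTypeRestrictions {

    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        // age: an xs:integer restricted to the range 0 through 120
        + "<xs:element name='age'><xs:simpleType>"
        + "<xs:restriction base='xs:integer'>"
        + "<xs:minInclusive value='0'/><xs:maxInclusive value='120'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        // initials: an xs:string that must be exactly three uppercase letters
        + "<xs:element name='initials'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[A-Z]{3}'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document satisfies SCHEMA.
    static boolean isValid(String xml) {
        try {
            Validator validator =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)))
                    .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<person><age>30</age><initials>RFC</initials></person>"));
        System.out.println(isValid(
            "<person><age>130</age><initials>RFC</initials></person>"));
        System.out.println(isValid(
            "<person><age>30</age><initials>rfc</initials></person>"));
    }
}
```

Only the first document should pass: the second breaks the range restriction on age, and the third does not match the pattern for initials.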

Simple elements can also have a default value or a fixed value set. A default value is automatically assigned to the element when no other value is specified. A fixed value is also automatically assigned to the element, and the element cannot contain any other value. For example:

<xs:element name="firstname" type="xs:string" default="joe"/>
<xs:element name="firstname" type="xs:string" fixed="unknown"/>
 

Much as you would define an element, you can define attributes in XML schema using the name, type, default, and fixed attributes. Attributes are optional by default, but you can set the use attribute to required to make their presence mandatory.

<xs:attribute name="lang" type="xs:string" use="optional"/>
<xs:attribute name="lang" type="xs:string" use="required"/>
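
To check how fixed values and required attributes behave during validation, you could try something like the following sketch. It is an assumption-laden illustration: the enclosing person element and the class name FixedAndRequired are hypothetical, and the declarations are placed inside a complex type because an attribute must belong to one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the person element is an assumption.
class FixedAndRequired {

    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType>"
        + "<xs:sequence>"
        // firstname may only ever hold the fixed value
        + "<xs:element name='firstname' type='xs:string' fixed='unknown'/>"
        + "</xs:sequence>"
        // lang must be present on every person element
        + "<xs:attribute name='lang' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document satisfies SCHEMA.
    static boolean isValid(String xml) {
        try {
            Validator validator =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)))
                    .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matches the fixed value and carries the required attribute.
        System.out.println(isValid(
            "<person lang='en'><firstname>unknown</firstname></person>"));
        // Tries to override the fixed value.
        System.out.println(isValid(
            "<person lang='en'><firstname>joe</firstname></person>"));
        // Omits the required lang attribute.
        System.out.println(isValid(
            "<person><firstname>unknown</firstname></person>"));
    }
}
```

Only the first document should be reported as valid.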
 

Elements With Complex Types

A complex element is an XML element that contains other elements, attributes, or both. Look at this complex XML element, fullname, which contains only other elements: firstname, middlename, and lastname.

<fullname>
  <firstname>Robert</firstname>
  <middlename>Franklin</middlename>
  <lastname>Collins</lastname>
</fullname>
 

You can define this using XML schema in a couple of ways. First, the fullname element can be declared directly by naming the element, as shown below. Notice that the child elements -- firstname, middlename, and lastname -- are surrounded by the sequence indicator. This means that the child elements must appear in the same order as they are declared: firstname first, middlename second, and lastname third.

<xs:element name="fullname">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="middlename" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
</xs:element>
 

Second, we can have the fullname element use an attribute called type, which refers to the name of another complex type to use. Here, we've essentially made the complex type stand on its own, and we're referencing it from within the fullname element:

<xs:element name="fullname" type="personinfo"/>
   
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="middlename" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
 

The benefit here is that several elements can refer to the same complex type. You can also base a complex type element on an existing complex type and add some elements using an extension, like this:

<xs:element name="contact" type="fullpersoninfo"/>
   
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="middlename" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
   
<xs:complexType name="fullpersoninfo">
  <xs:complexContent>
    <xs:extension base="personinfo">
      <xs:sequence>
        <xs:element name="address" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
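
The effect of the extension is that a contact element must contain the three inherited elements followed by the three added ones. The sketch below tries this out with the Validation API. The class name ExtensionCheck and the sample address data are made up for illustration; the type definitions are the ones shown above, wrapped in a schema element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class built around the personinfo types above.
class ExtensionCheck {

    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='contact' type='fullpersoninfo'/>"
        + "<xs:complexType name='personinfo'><xs:sequence>"
        + "<xs:element name='firstname' type='xs:string'/>"
        + "<xs:element name='middlename' type='xs:string'/>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:complexType name='fullpersoninfo'><xs:complexContent>"
        + "<xs:extension base='personinfo'><xs:sequence>"
        + "<xs:element name='address' type='xs:string'/>"
        + "<xs:element name='city' type='xs:string'/>"
        + "<xs:element name='country' type='xs:string'/>"
        + "</xs:sequence></xs:extension>"
        + "</xs:complexContent></xs:complexType>"
        + "</xs:schema>";

    // The three elements inherited from personinfo.
    static final String NAME_PART =
        "<firstname>Robert</firstname><middlename>Franklin</middlename>"
        + "<lastname>Collins</lastname>";

    // Returns true if the document satisfies SCHEMA.
    static boolean isValid(String xml) {
        try {
            Validator validator =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)))
                    .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inherited elements plus the added ones (sample data is invented).
        System.out.println(isValid("<contact>" + NAME_PART
            + "<address>1 Main Street</address><city>Springfield</city>"
            + "<country>USA</country></contact>"));
        // Inherited elements only: the extension's elements are missing.
        System.out.println(isValid("<contact>" + NAME_PART + "</contact>"));
    }
}
```

The first document should validate. The second should be rejected because address, city, and country are required by fullpersoninfo.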
 

W3C XML Schema has other useful features, as Table 1 indicates.

Table 1. Indicators

Indicator        Function
all              Specifies that each of the child elements must appear once but can appear in any order
choice           Specifies that any one of the alternatives can occur
sequence         Specifies that the child elements must appear in a specific order
@maxOccurs       Specifies the maximum number of times an element can occur
@minOccurs       Specifies the minimum number of times an element can occur
group            Used to define related sets of elements
attributeGroup   Used to define related sets of attributes
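
As an illustration of how some of these indicators combine, the following sketch declares a hypothetical order element that holds one to three item elements followed by either an email or a phone element. None of these names come from the article, and the class name IndicatorCheck is likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the order schema is an assumption.
class IndicatorCheck {

    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        // between one and three item elements
        + "<xs:element name='item' type='xs:string'"
        + " minOccurs='1' maxOccurs='3'/>"
        // then exactly one of email or phone
        + "<xs:choice>"
        + "<xs:element name='email' type='xs:string'/>"
        + "<xs:element name='phone' type='xs:string'/>"
        + "</xs:choice>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document satisfies SCHEMA.
    static boolean isValid(String xml) {
        try {
            Validator validator =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)))
                    .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two items and one contact method: allowed.
        System.out.println(isValid("<order><item>a</item><item>b</item>"
            + "<email>x@example.com</email></order>"));
        // Four items: exceeds maxOccurs.
        System.out.println(isValid("<order><item>a</item><item>b</item>"
            + "<item>c</item><item>d</item><phone>555</phone></order>"));
        // Both alternatives of the choice: not allowed.
        System.out.println(isValid("<order><item>a</item>"
            + "<email>x@example.com</email><phone>555</phone></order>"));
    }
}
```

Only the first document should be accepted.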
 

This is just a brief introduction to the features of XML schema, and it only touches on advanced features such as restrictions and extensions. For more information, see the home page for the W3C XML Schema.

Evaluating XPath Expressions in JDK 1.5

Another new package added to the XML arsenal of the JDK 1.5 release is javax.xml.xpath. This package provides an API for evaluating expressions based on the XML Path Language (XPath) version 1.0. XPath allows you to select nodes from an XML document object model (DOM) tree. XPath also provides rules for converting a node to a boolean, double, or string value. The Javadocs offer more information: "XPath started in life in 1999 as a supplement to the XSLT and XPointer languages, but has more recently become popular as a stand-alone language, as a single XPath expression can be used to replace many lines of DOM API code."

Getting to Know XPath

Let's quickly look at XPath expressions and at how they are useful. The following is an example of a simple XPath expression:

book/author
 

This is known as a location path. This would select all author elements that are the children of a book element, where book is a child of the current context node. For example, if the current context node is the library element, then using the XPath expression book/author would select both author elements below:

<library>
  <book>
    <author name="Author A"/>
    <author name="Author B"/>
  </book>
</library>
 

The context node can be any node inside an XML DOM tree, including the root node.

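Note that author must be a direct child of book. A special location path operator, //, selects nodes at any depth. When an expression begins with //, the search starts at the root of the document, regardless of the context node. To search only below the context node, write .// instead. For example, the following selects all author elements in the document: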

//author
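
The role of the context node is easiest to see by evaluating the same expression from different starting points. The sketch below is an illustration under stated assumptions: the class name ContextNodeSketch, the helper countFrom(), and the extra shelf element in the sample document are all hypothetical. The helper first selects a context node with an absolute path and then counts the nodes that a second expression selects relative to it.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the shelf element is an assumption.
class ContextNodeSketch {

    static final String XML =
        "<library>"
        + "<book><author name='Author A'/><author name='Author B'/></book>"
        + "<shelf><book><author name='Author C'/></book></shelf>"
        + "</library>";

    // Selects a context node with contextPath, then counts the nodes
    // that expression selects when evaluated against that context node.
    static int countFrom(String contextPath, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath,
                new InputSource(new StringReader(XML)), XPathConstants.NODE);
            NodeList selected = (NodeList) xpath.evaluate(
                expression, context, XPathConstants.NODESET);
            return selected.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Relative path from the library element: finds Author A and B.
        System.out.println(countFrom("/library", "book/author"));
        // Same path from the document root: book is not a child of the root.
        System.out.println(countFrom("/", "book/author"));
        // A leading // searches the whole document, whatever the context.
        System.out.println(countFrom("/library/shelf", "//author"));
        // .// searches only below the context node.
        System.out.println(countFrom("/library/shelf", ".//author"));
    }
}
```

With the sample document above, the four calls should print 2, 0, 3, and 1.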
 

Table 2 lists some other useful XPath operators.

Table 2. Some XPath Operators

Location Path    Description
../author        Selects all author elements that are the children of the context node's parent
*                Selects all child elements of the context node
*/author         Selects all author element grandchildren of the current context node
/book/author     Selects all author elements that are children of book elements, which are in turn children of the root node of the document
./book/author    Selects all author elements that are children of book elements, which are in turn children of the current context node
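
To experiment with these operators, you could evaluate each one against a small document. The following sketch does so; the class name OperatorSketch, the helper countFrom(), and the extra author element placed directly under library are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; Author C is added only for this sketch.
class OperatorSketch {

    static final String XML =
        "<library>"
        + "<book><author name='Author A'/><author name='Author B'/></book>"
        + "<author name='Author C'/>"
        + "</library>";

    // Selects a context node with contextPath, then counts the nodes
    // that expression selects when evaluated against that context node.
    static int countFrom(String contextPath, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath,
                new InputSource(new StringReader(XML)), XPathConstants.NODE);
            NodeList selected = (NodeList) xpath.evaluate(
                expression, context, XPathConstants.NODESET);
            return selected.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // From book, ../author looks at the children of library: Author C.
        System.out.println(countFrom("/library/book", "../author"));
        // From library, * selects the book and author child elements.
        System.out.println(countFrom("/library", "*"));
        // From library, */author selects the author grandchildren: A and B.
        System.out.println(countFrom("/library", "*/author"));
        // /book/author needs book to be the root element, which it is not.
        System.out.println(countFrom("/library/book", "/book/author"));
        // ./book/author is relative to the context node: A and B.
        System.out.println(countFrom("/library", "./book/author"));
    }
}
```

With this sample document, the calls should print 1, 2, 2, 0, and 2.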
 

In addition to elements, XPath location paths may also target attributes, text, comments, and processing-instruction nodes inside a DOM tree. Table 3 gives some usage examples.

Table 3. Usage Examples for XPath Location Paths

Location Path                      Description
author/@name                       Selects the attribute name of the author element.
author/node()                      Selects all child nodes of the author element, of any type (element, text, comment, or processing instruction).
author/text()                      Selects the text nodes of the author element. No distinction is made between escaped and nonescaped character data.
author/comment()                   Selects all comment nodes contained in the author element.
author/processing-instruction()    Selects all processing-instruction nodes contained in the author element.
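
The sketch below tries these location paths on a small document. It is only an illustration: the class name NodeTypeSketch and the sample author element, with its text, comment, and processing instruction, are hypothetical. Because the expressions are evaluated against the document itself, the document root is the context node and author is its child.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the sample content is an assumption.
class NodeTypeSketch {

    static final String XML =
        "<author name='Author A'>Alice<!-- pen name --><?sort key?></author>";

    // Evaluates the expression and returns the result as a String.
    static String string(String expression) {
        try {
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)),
                XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates the expression and returns the number of selected nodes.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList)
                XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("author/@name"));   // attribute value
        System.out.println(string("author/text()"));  // character data
        System.out.println(count("author/comment()"));
        System.out.println(count("author/processing-instruction()"));
        // node() matches the text, the comment, and the instruction.
        System.out.println(count("author/node()"));
    }
}
```

For this document the output should be Author A, Alice, 1, 1, and 3, each on its own line.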
 

XPath predicates also allow for refining the nodes selected by an XPath location path. Predicates take the form [expression]. The following example selects all foo elements that contain an include attribute with the value of true:

//foo[@include='true']
 

Predicates may be appended to each other to further refine an expression, for example:

//foo[@include='true'][@class='bar']
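
A short sketch shows how each added predicate narrows the result. The class name PredicateSketch and the sample document with its three foo elements are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the sample document is an assumption.
class PredicateSketch {

    static final String XML =
        "<root>"
        + "<foo include='true' class='bar'/>"
        + "<foo include='true' class='baz'/>"
        + "<foo include='false' class='bar'/>"
        + "</root>";

    // Evaluates the expression and returns the number of selected nodes.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList)
                XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No predicate: all three foo elements.
        System.out.println(count("//foo"));
        // One predicate: the two elements with include='true'.
        System.out.println(count("//foo[@include='true']"));
        // Two predicates: only the element that satisfies both.
        System.out.println(count("//foo[@include='true'][@class='bar']"));
    }
}
```

With this document the three expressions should select 3, 2, and 1 elements, respectively.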
 

Using the XPath API

The following example demonstrates using the XPath API to select a set of nodes from an XML document:

     XPath xpath = XPathFactory.newInstance().newXPath();
     String expression = "/birthdate/year";
     InputSource inputSource = new InputSource("myXMLDocument.xml");
     // A NODESET result is returned as an org.w3c.dom.NodeList.
     NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource,
         XPathConstants.NODESET);
 

Note that the XPath API allows the selected nodes to be converted to other data types, including Boolean, Number, and String objects. The return type is specified by a QName parameter in the method call used to evaluate the expression, which is either a call to XPathExpression.evaluate() or to one of the XPath.evaluate() convenience methods, as shown above (the third parameter). The allowed QName values are specified as constants in the XPathConstants class:

  • XPathConstants.NODESET
  • XPathConstants.NODE
  • XPathConstants.STRING
  • XPathConstants.BOOLEAN
  • XPathConstants.NUMBER

When a Boolean return type is requested, Boolean.TRUE is returned if one or more nodes were selected. Otherwise, Boolean.FALSE is returned. The String return type is a convenience for retrieving the character data from a text node, attribute node, comment node, or processing-instruction node. When used on an element node, the value of the descendant text nodes is returned. Finally, the Number return type attempts to convert the text of a node into a double data type.
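
These conversion rules can be explored with a precompiled XPathExpression. The following is a minimal sketch, assuming the birthdate document from earlier in this article held in a string; the class name ReturnTypeSketch and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; helper names are assumptions.
class ReturnTypeSketch {

    static final String XML = "<birthdate><month>January</month>"
        + "<day>21</day><year>1983</year></birthdate>";

    // Compiles the expression, then evaluates it with the requested type.
    static Object eval(String expression, QName returnType) {
        try {
            XPathExpression compiled =
                XPathFactory.newInstance().newXPath().compile(expression);
            return compiled.evaluate(
                new InputSource(new StringReader(XML)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean exists(String expression) {
        return (Boolean) eval(expression, XPathConstants.BOOLEAN);
    }

    static double asNumber(String expression) {
        return (Double) eval(expression, XPathConstants.NUMBER);
    }

    static String asString(String expression) {
        return (String) eval(expression, XPathConstants.STRING);
    }

    // Joins the names of the selected nodes with commas.
    static String names(String expression) {
        NodeList nodes = (NodeList) eval(expression, XPathConstants.NODESET);
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(nodes.item(i).getNodeName());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(exists("/birthdate/year"));     // a node was selected
        System.out.println(exists("/birthdate/century"));  // nothing selected
        System.out.println(asNumber("/birthdate/year"));   // text parsed as a double
        System.out.println(asNumber("/birthdate/month"));  // non-numeric text gives NaN
        System.out.println(asString("/birthdate"));        // descendant text, concatenated
        System.out.println(names("/birthdate/*"));         // each selected element's name
    }
}
```

Note that requesting a Number for non-numeric text does not raise an exception; the result is NaN, so it is worth checking with Double.isNaN() before using the value.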

For the XML document presented at the beginning of this article, you can use the following XPath API code to select the year element as a node, a string, and a number:

        try {
   
            // Parse the XML as a W3C document.
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new File("myXMLDocument.xml"));
   
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expression = "/birthdate/year";

            // First, obtain the element as a node.

            Node birthdateNode = (Node)
                xpath.evaluate(expression, document, XPathConstants.NODE);
   
            System.out.println("Node is: " + birthdateNode);

            // Next, obtain the element as a String.

            String birthdateString = (String)
                xpath.evaluate(expression, document, XPathConstants.STRING);
   
            System.out.println("String is: " + birthdateString);

            // Finally, obtain the element as a Number (Double).

            Double birthdateDouble = (Double)
                xpath.evaluate(expression, document, XPathConstants.NUMBER);
   
            System.out.println("Double is: " + birthdateDouble);
   
   
        } catch (ParserConfigurationException e) {
            System.err.println("ParserConfigurationException caught...");
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            System.err.println("XPathExpressionException caught...");
            e.printStackTrace();            
        } catch (SAXException e) {
            System.err.println("SAXException caught...");
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("IOException caught...");
            e.printStackTrace();
        }
 

When you run the example, you should see the following output:

Node is: [year: null]
String is: 1983
Double is: 1983.0
 
Source Code

You can download the source code for these examples here. A NetBeans IDE project file is included as part of the source code.
