XMLBeans 2.0 - A Java Developer's Perspective

Generics with XMLBeans

At a glance, JDK 5.0 generics enable the creation of parameterized classes and methods. The Collections API is one of the first places you will see generics used within XMLBeans. By default, when an element declaration in an XML Schema has a maxOccurs value greater than 1 (including "unbounded"), XMLBeans generates Java arrays for that element. To get generic collections instead, you pass an additional option to scomp and run on a JDK 5.0-compatible VM.

By default, the API for getting the item elements from the channel contains the following methods:

`RssDocument.Rss.Channel.Item`	`getItemArray(int i)` Gets the ith "item" element
`RssDocument.Rss.Channel.Item[]`	`getItemArray()` Gets array of all "item" elements
`void`	`setItemArray(int i, RssDocument.Rss.Channel.Item item)` Sets the ith "item" element
`void`	`setItemArray(RssDocument.Rss.Channel.Item[] itemArray)` Sets array of all "item" elements

However, the API changes after performing the compilation step with generics enabled:

/home/user>scomp
Compiles a schema into XML Bean classes and metadata.
Usage: scomp [opts] [dirs]* [schema.xsd]* [service.wsdl]* 
             [config.xsdconfig]*
Options include:
...
                        
-javasource [version] - generate java source compatible for a 
                               Java version (1.4 or 1.5)
...
        
#This is all it takes to enable Generics in your use of XMLBeans
/home/user>scomp -javasource 1.5 schema0.xsd

With generics enabled, the generated interface adds a List-based accessor alongside the array methods:

`java.util.List<RssDocument.Rss.Channel.Item>`	`getItemList()` Gets a List of "item" elements
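Both accessors expose the same underlying data. One plausible way to layer a `List` view over indexed accessors is `java.util.AbstractList`. The sketch below assumes a hypothetical `ChannelSketch` class with `String` items standing in for `RssDocument.Rss.Channel.Item`; it illustrates the idea and is not the code scomp generates.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated type; String replaces Item.
class ChannelSketch {
    // Backing storage; in XMLBeans this role is played by the XML Store.
    private final List<String> store = new ArrayList<String>();

    // Array-style accessors, mirroring the shape of the default API.
    String getItemArray(int i) { return store.get(i); }
    void setItemArray(int i, String item) { store.set(i, item); }
    int sizeOfItemArray() { return store.size(); }
    void insertItem(int i, String item) { store.add(i, item); }
    void removeItem(int i) { store.remove(i); }

    // A List view that delegates every operation to the accessors above,
    // so changes made through the view are visible through the array API.
    List<String> getItemList() {
        return new AbstractList<String>() {
            @Override public String get(int i) { return getItemArray(i); }
            @Override public int size() { return sizeOfItemArray(); }
            @Override public String set(int i, String item) {
                String old = getItemArray(i);
                setItemArray(i, item);
                return old;
            }
            @Override public void add(int i, String item) { insertItem(i, item); }
            @Override public String remove(int i) {
                String old = getItemArray(i);
                removeItem(i);
                return old;
            }
        };
    }
}

class ChannelSketchDemo {
    public static void main(String[] args) {
        ChannelSketch channel = new ChannelSketch();
        List<String> items = channel.getItemList();
        items.add("first");   // AbstractList.add(E) calls add(size(), e)
        items.add("second");
        // Enhanced for loop: no index bookkeeping, no casts
        for (String item : items) {
            System.out.println(item);
        }
        System.out.println(channel.sizeOfItemArray()); // 2
    }
}
```

Because the view only delegates, edits made through the `List` show up in the array-style accessors and vice versa.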

Let's look at how generics simplify a method that gets all of the items reported by a single user:

 public List<RssDocument.Rss.Channel.Item> getItemsFromReporter(String reporter) {

   // We already loaded the data as above
   // ...
   RssDocument.Rss.Channel channel = rss.getChannel();

   // Collect matches into a new list. Removing non-matches from
   // channel.getItemList() by index would skip the element that
   // shifts into the removed slot, and that list is backed by the
   // XML instance, so removing from it would also edit the feed.
   List<RssDocument.Rss.Channel.Item> matches =
       new ArrayList<RssDocument.Rss.Channel.Item>();

   for (RssDocument.Rss.Channel.Item item : channel.getItemList()) {
     if (item.getReporter().getUsername().equals(reporter)) {
       matches.add(item);
     }
   }

   return matches;
 }
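Removing elements from a list while walking it by index is easy to get wrong: after `remove(i)`, the next element slides into slot `i`, and `i++` steps past it without checking it. This JDK-only sketch, using hypothetical usernames in a plain `List<String>`, reproduces the skip and contrasts it with collecting matches into a new list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterByReporter {
    // Buggy pattern: remove by index while incrementing i.
    static List<String> removeInPlace(List<String> reporters, String keep) {
        List<String> items = new ArrayList<String>(reporters);
        for (int i = 0; i < items.size(); i++) {
            if (!items.get(i).equals(keep)) {
                items.remove(i); // next element shifts into slot i and is skipped
            }
        }
        return items;
    }

    // Safe pattern: copy matches into a new list, leaving the source untouched.
    static List<String> collectMatches(List<String> reporters, String keep) {
        List<String> matches = new ArrayList<String>();
        for (String r : reporters) {
            if (r.equals(keep)) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> reporters = Arrays.asList("ann", "bob", "bob", "ann");
        System.out.println(removeInPlace(reporters, "ann"));  // [ann, bob, ann]
        System.out.println(collectMatches(reporters, "ann")); // [ann, ann]
    }
}
```

The second "bob" survives the in-place loop because it moved into index 1 right after the first "bob" was removed, and the loop had already advanced past index 1.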

This is great, but there's an even easier way to get the item information for each user, at least once you understand XPath and/or XQuery.

XQuery and XPath

XQuery and XPath integration with XMLBeans was reworked in version 2.0. In version 1, Jaxen (an XPath implementation) was used, but the integration with XMLBeans didn't provide support for namespaces and prefixes. The latest release builds on the XQuery implementation provided by Saxon version 8.1.1. Since XQuery builds on top of XPath, Saxon also provides the XPath implementation for XMLBeans. To use these features, the XmlObject interface, from which all XMLBeans types are derived, provides two methods for running queries and path expressions against an instance: execQuery() and selectPath(). Both return an array of the matching XmlObjects. The same methods exist on XmlCursor but behave differently there: XmlCursor.execQuery() returns a new XmlCursor over the results, while XmlCursor.selectPath() returns nothing and instead records the matches as the cursor's selections, which you step through with toNextSelection():

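 // An XQuery FLWOR expression; a Java string literal can't span
 // lines, so the query is concatenated
 String xq = "for $e in //employee " +
             "where $e/name='Bob' return $e";

 // Input is a valid xml instance
 XmlObject o = XmlObject.Factory.parse(input);

 // XmlObject.execQuery returns the matches as an array
 XmlObject[] xObjres = o.execQuery(xq);

 // XmlCursor.execQuery returns a new cursor over the matches
 XmlCursor xCurres = o.newCursor().execQuery(xq);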

In the above code snippet, you see that the APIs are relatively simple to use, and you can handle the resulting data in whichever form you find easiest. We first build the query string and then run it through each of the two APIs. XQuery is a powerful tool, and in the following code, note how simple getting the item data becomes:

 public XmlObject[] getItemsFromReporter(String reporter)
         throws XmlException, IOException {

   //Load Jira RSS feed data
   URL jiraFeedUrl = new URL("<JiraFeedURL>");

   //This is the only object we need
   RssDocument rssDoc = RssDocument.Factory.parse(jiraFeedUrl);

   //Build the statement for the xpath engine
   //(a reporter name containing an apostrophe would break this literal)
   String xpathStatement =
                      "//item[reporter/@username='"+reporter+"']";

   //Execute the statement on the instance;
   //each result can be cast to RssDocument.Rss.Channel.Item
   XmlObject[] queryResult = rssDoc.selectPath(xpathStatement);

   return queryResult;
 }
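Building the path by concatenating `reporter` into a quoted literal breaks, or silently changes the query, when a username contains an apostrophe. One common remedy is to bind the value as an XPath variable. Variable binding in XMLBeans itself is not covered here, so this sketch uses the JDK's `javax.xml.xpath` API instead, with a made-up three-item RSS fragment and a hypothetical `ReporterQuery` class.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReporterQuery {
    // Hypothetical RSS fragment shaped like the Jira feed items
    static final String RSS =
        "<rss><channel>"
        + "<item><title>A</title><reporter username=\"o'neil\"/></item>"
        + "<item><title>B</title><reporter username=\"jdoe\"/></item>"
        + "<item><title>C</title><reporter username=\"o'neil\"/></item>"
        + "</channel></rss>";

    // Counts items reported by 'reporter', passing it in as $reporter
    // instead of splicing it into the expression text.
    static int countItemsFrom(final String reporter) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(RSS)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(new XPathVariableResolver() {
            public Object resolveVariable(QName name) {
                // Only $reporter is referenced by the expression below
                return "reporter".equals(name.getLocalPart()) ? reporter : null;
            }
        });
        NodeList items = (NodeList) xpath.evaluate(
            "//item[reporter/@username=$reporter]", doc, XPathConstants.NODESET);
        return items.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countItemsFrom("o'neil")); // 2
        System.out.println(countItemsFrom("jdoe"));   // 1
    }
}
```

The expression text never changes, so a name like `o'neil` is matched as data rather than parsed as part of the query.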

Using XPath and XQuery alongside XMLBeans is powerful and makes working with XML much simpler. Numerous resources are available if you would like to learn more about XQuery; we recommend starting with the XMLBeans sample available on the Apache XMLBeans Web site.

At this point, implementing a solution to the problem of tracking quality metrics for XMLBeans has been made much easier thanks to the latest features of XMLBeans. We used the inst2xsd utility to create a schema for an instance, saving time by not having to write one from scratch. We saw how enabling generics can increase our productivity by making our business logic easier to code. Finally, we saw how using the newer XQuery integration provides us with a feature-rich way of manipulating and querying XML.

These were only the basics of what is new in the latest release of XMLBeans, and several other features make it a strong choice whenever you work with XML. The next feature increases your productivity by giving you much more detailed information about the errors you may receive when working with XML and XML Schema.

Error Codes

Error codes are the next great feature provided in the 2.0 release. They are integrated with tools like scomp and are also available programmatically, for use within, say, an IDE. Appendix C of the XML Schema specification tabulates the validation rules and constraints that these error codes identify. The codes are made available programmatically during parsing, validation, and compilation through the error listener. Previously, error messages offered little detail and little connection to the schema specification; now they also report where the error is located and how it correlates to the specification. The error codes take the form "cvc-complex-type.2.2", which can be read as clause 2.2 of http://www.w3.org/TR/xmlschema-1/#cvc-complex-type. Let's take a look at how this works. We'll start with an XML Schema and validate an instance against it. Then we'll compare the errors reported by the old release with the ones we receive now.

 <!-- errorcode.xsd -->
 <xs:schema
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   targetNamespace="http://xmlbeans.rocks.com/"
   xmlns:tns="http://xmlbeans.rocks.com/" >
   <xs:element name="address" type="tns:address"/>
   <xs:complexType name="address">
     <xs:sequence>
       <xs:element name="number" type="xs:unsignedInt"/>
       <xs:element name="street" type="xs:string"/>
       <xs:choice>
         <xs:sequence>
           <xs:element name="city" type="xs:string"/>
           <xs:element name="state" type="xs:string"/>
         </xs:sequence>
         <xs:element name="zipcode" type="xs:int"/>
       </xs:choice>
       <xs:element name="country" type="xs:string"/>
     </xs:sequence>
   </xs:complexType>
 </xs:schema>

This is a pretty simple schema. Note the xs:choice model group: the following instance satisfies neither of its branches (city followed by state, or zipcode). You'll see the issue as soon as we start looking at some of the error codes:

 <!-- errorcode.xml -->
 <t:address xmlns:t="http://xmlbeans.rocks.com/">
   <number>72</number>
   <street>156th NE</street>
   <country>USA</country>
 </t:address>

Along with the scomp utility available from the command line, a utility exists to validate an instance against a schema:

/home/user>validate
Validates the specified instance against the specified schema.
Contrast with the svalidate tool, which validates using a stream.
Usage: validate [-dl] [-nopvr] [-noupa] [-license] 
                schema.xsd instance.xml
Options:
 -dl - permit network downloads for imports and
            includes (default is off)
 -noupa - do not enforce the unique particle attribution rule
 -nopvr - do not enforce the particle valid (restriction) rule
 -partial - allow partial schema type system
 -license - print license information

If we ran the validate utility using the 1.0 branch of XMLBeans, the result would look like this:

/home/user>validate errorcode.xsd errorcode.xml
        errorcode.xml:0: error: Expected elements 
        city zipcode at the end of the content in element
        address@http://xmlbeans.rocks.com/

The error text above names the instance and tells us that expected elements are missing at the end of address. In this small example that is somewhat helpful, but with the line number reported as 0, it is hard to find a starting point.  Let's compare this error text to the new error code feature in the latest release:

 /home/user>validate errorcode.xsd errorcode.xml
 
 errorcode.xml:4: error: cvc-complex-type.2.4a: Expected elements
 'city state' instead of 'country' here in element
 address@http://xmlbeans.rocks.com/ 
 
 errorcode.xml:4: error: cvc-complex-type.2.4c: Expected elements 
 'zipcode' before the end of the content in element
 address@http://xmlbeans.rocks.com/

Compared with the error text we received from the 1.0 release of XMLBeans, the latest error text is a major improvement. Both note the instance, but the latest also gives a usable line number, the Appendix C constraint identifier, and a more specific message. The new output also reports the failure as two distinct errors, cvc-complex-type.2.4a and cvc-complex-type.2.4c, covering the two branches of the choice. Once again, these error codes correspond to a URL-accessible location in the schema specification.
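Reading a code as a section anchor plus a clause is mechanical enough to automate, for example to turn codes into links in a tool. A minimal sketch, assuming, as this section describes, that the part of the code before the first dot is the anchor within http://www.w3.org/TR/xmlschema-1/ and the rest is the clause. `ErrorCodeLinks` is a hypothetical helper, not part of XMLBeans, and some codes may not follow this pattern.

```java
class ErrorCodeLinks {
    static final String SPEC = "http://www.w3.org/TR/xmlschema-1/";

    // Splits "cvc-complex-type.2.4a" into {"cvc-complex-type", "2.4a"}.
    // A code without a dot gets an empty clause.
    static String[] split(String code) {
        int dot = code.indexOf('.');
        if (dot < 0) {
            return new String[] { code, "" };
        }
        return new String[] { code.substring(0, dot), code.substring(dot + 1) };
    }

    // Builds the URL of the spec section that defines the constraint.
    static String specUrl(String code) {
        return SPEC + "#" + split(code)[0];
    }

    public static void main(String[] args) {
        String code = "cvc-complex-type.2.2";
        // prints http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.2
        System.out.println(specUrl(code) + " clause " + split(code)[1]);
    }
}
```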

We just looked at how to get detailed error text from the command line; now let's peek at how to get the error information programmatically:

 import java.io.File;
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.xmlbeans.XmlError;
 import org.apache.xmlbeans.XmlOptions;

 // Create the error listener and XmlOptions
 List<XmlError> errors = new ArrayList<XmlError>();
 XmlOptions opts = new XmlOptions().setErrorListener(errors);

 // Load the instance (forward slashes also work on Windows
 // and avoid invalid escape sequences such as "\e")
 File instance = new File("<SOME_PATH>/errorcode.xml");
 AddressDocument ad = AddressDocument.Factory.parse(instance);

 // If there are errors, making a method call like this will
 // populate the error listener
 ad.validate(opts);

 // Since we know there are errors, let's
 // look at how to get at the data
 for (XmlError e : errors) {

   // This will be the location of the error in the instance
   System.out.println("[" + e.getLine() + "," + e.getColumn() + "]-" +
                      e.getSeverity());
   // Information about the error
   System.out.println(e.getErrorCode() + ": " + e.getMessage());
 }

Looking at the code snippet, you can see that programmatically accessing the error information is not much more difficult than getting similar information from the command line.
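For comparison, the JDK's built-in `javax.xml.validation` API reports the same kind of detail through a SAX `ErrorHandler`. The sketch below validates the address schema and instance from this section, inlined as strings with `http://` namespace URIs. It uses the JDK's validator rather than XMLBeans, so the wording differs and its codes are spelled like `cvc-complex-type.2.4.a` rather than `2.4a`; `CollectValidationErrors` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectValidationErrors {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://xmlbeans.rocks.com/'"
        + " xmlns:tns='http://xmlbeans.rocks.com/'>"
        + "<xs:element name='address' type='tns:address'/>"
        + "<xs:complexType name='address'><xs:sequence>"
        + "<xs:element name='number' type='xs:unsignedInt'/>"
        + "<xs:element name='street' type='xs:string'/>"
        + "<xs:choice>"
        + "<xs:sequence><xs:element name='city' type='xs:string'/>"
        + "<xs:element name='state' type='xs:string'/></xs:sequence>"
        + "<xs:element name='zipcode' type='xs:int'/>"
        + "</xs:choice>"
        + "<xs:element name='country' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:schema>";

    static final String XML =
        "<t:address xmlns:t='http://xmlbeans.rocks.com/'>\n"
        + "  <number>72</number>\n"
        + "  <street>156th NE</street>\n"
        + "  <country>USA</country>\n"
        + "</t:address>\n";

    // Validates XML against XSD and returns one "line:column message" per problem.
    static List<String> validate() throws Exception {
        final List<String> errors = new ArrayList<String>();
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { record(e); }
            public void error(SAXParseException e) { record(e); }
            // Input that is not well-formed: stop immediately
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            private void record(SAXParseException e) {
                errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
            }
        });
        validator.validate(new StreamSource(new StringReader(XML)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        for (String err : validate()) {
            System.out.println(err);
        }
    }
}
```

Because the handler records errors instead of throwing, validation continues past the first problem, much like the XMLBeans error listener collecting every XmlError.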

Performance Improvements

There's a good chance you will never notice the next improvement directly, but you will surely notice its impact on your development effort. What good is a new release if there aren't performance improvements along the way?

As in XMLBeans 1.0, performance was paramount in the 2.0 release. In most cases, the 2.0 release performs 10 to 60 percent better than the 1.0 release. There were several reasons for this, most notably a complete rewrite of the store architecture. In 1.0, a data structure called a "splay tree" was used to keep everything in the XML Store in sync with operations that affected the XML data. For those unfamiliar with it, a splay tree is a self-adjusting binary search tree that supports Find, Insert, and Delete in O(log N) amortized time. Unlike other balanced search trees, it doesn't maintain an explicit balance condition; instead, every access moves the accessed node to the root. That's probably more detail than you need for most purposes. In the 2.0 release, a simpler architecture was used that involves less copying and fewer objects.
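To make the splay-tree description concrete, here is a minimal recursive splay tree over `int` keys, following the standard textbook formulation. It only illustrates the data structure and is not the XMLBeans 1.0 store code. Every operation first splays the touched key (or its nearest neighbor) to the root, which is what yields the amortized O(log N) bound without storing any balance information.

```java
// Minimal splay tree over int keys (illustration only).
class SplayTree {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Find: splay the key (or its nearest neighbor) to the root.
    boolean contains(int key) {
        root = splay(root, key);
        return root != null && root.key == key;
    }

    // Insert: splay, then split the tree around the new root.
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        root = splay(root, key);
        if (key == root.key) return;             // already present
        Node n = new Node(key);
        if (key < root.key) {
            n.left = root.left; n.right = root; root.left = null;
        } else {
            n.right = root.right; n.left = root; root.right = null;
        }
        root = n;
    }

    // Delete: splay the key to the root, then join its two subtrees.
    void delete(int key) {
        root = splay(root, key);
        if (root == null || root.key != key) return;
        if (root.left == null) {
            root = root.right;
        } else {
            Node right = root.right;
            root = splay(root.left, key);        // max of left subtree becomes root
            root.right = right;
        }
    }

    Integer rootKey() { return root == null ? null : root.key; }

    private static Node rotateRight(Node h) { Node x = h.left; h.left = x.right; x.right = h; return x; }
    private static Node rotateLeft(Node h) { Node x = h.right; h.right = x.left; x.left = h; return x; }

    // Recursive splay: zig-zig or zig-zag steps, then a final zig.
    private static Node splay(Node h, int key) {
        if (h == null) return null;
        if (key < h.key) {
            if (h.left == null) return h;
            if (key < h.left.key) {              // zig-zig
                h.left.left = splay(h.left.left, key);
                h = rotateRight(h);
            } else if (key > h.left.key) {       // zig-zag
                h.left.right = splay(h.left.right, key);
                if (h.left.right != null) h.left = rotateLeft(h.left);
            }
            return h.left == null ? h : rotateRight(h);
        } else if (key > h.key) {
            if (h.right == null) return h;
            if (key > h.right.key) {             // zig-zig
                h.right.right = splay(h.right.right, key);
                h = rotateLeft(h);
            } else if (key < h.right.key) {      // zig-zag
                h.right.left = splay(h.right.left, key);
                if (h.right.left != null) h.right = rotateRight(h.right);
            }
            return h.right == null ? h : rotateLeft(h);
        }
        return h;                                // key found
    }

    public static void main(String[] args) {
        SplayTree t = new SplayTree();
        for (int k : new int[] {5, 3, 8, 1, 4}) t.insert(k);
        System.out.println(t.contains(4) + " root=" + t.rootKey()); // true root=4
        System.out.println(t.contains(7));                        // false
    }
}
```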

When performing any operation on XML data, XMLBeans first loads the data into an XML Store and then provides a binding view onto the Store. Compared with other Java/XML binding frameworks that unmarshal directly to Java objects, this binding view and full XML Infoset fidelity are typically where the additional overhead comes from. That raises the performance bar for XMLBeans: the Store has to be fast enough that the cost of the additional functionality and information stays minimal. The primary focus for runtime performance is therefore the XML Store, and every effort is made to make the Store as fast as possible.

While the XML Store was being rewritten, a new feature was added that makes programming with XMLBeans faster and easier: DOM Level II support. DOM, short for Document Object Model, is an interface for working with XML data, and Level II identifies the version of the specification and therefore which interfaces are available (Level II Core added namespace support, for example). Unlike SAX, which streams parse events, DOM keeps the XML information in memory.
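To make the in-memory tree and Level II's namespace support concrete, here is a sketch that uses the JDK's own DOM implementation, not XMLBeans', to load the `address` instance from the error-code section, with the namespace URI written as `http://xmlbeans.rocks.com/`. `getElementsByTagNameNS`, `getLocalName`, and `getNamespaceURI` come from DOM Level 2 Core; the class name `DomLevel2Demo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomLevel2Demo {
    static final String NS = "http://xmlbeans.rocks.com/";
    static final String XML =
        "<t:address xmlns:t='" + NS + "'>"
        + "<number>72</number><street>156th NE</street><country>USA</country>"
        + "</t:address>";

    static Document parse() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for the Level 2 *NS methods to see namespaces
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        // Look up the root by namespace URI and local name, ignoring the "t:" prefix
        Element address = (Element) doc.getElementsByTagNameNS(NS, "address").item(0);
        System.out.println(address.getLocalName() + " in " + address.getNamespaceURI());
        // The whole tree is in memory, so any part can be read in any order
        System.out.println(doc.getElementsByTagName("street").item(0).getTextContent());
    }
}
```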

Native DOM II Support

In the 1.0 release, access to the DOM was handled by Xerces, so such calls returned a Xerces DOM Node. In 2.0, similar calls return an XMLBeans DOM representation as DOM II is now implemented natively. This means that inside XMLBeans you can access a DOM representation and an XMLBeans representation without having to coordinate two different data stores.

This also means you can work with XML in any one of three ways. The first is to use the JavaBean-like methods of the XmlObject API. The second is to use the token-based model of the XmlCursor API. The third is to use the tree model familiar to anyone who knows the DOM APIs. What is particularly nice is that you can move back and forth between these views without worrying about the instance getting out of sync. From a developer's perspective, this means you and your teammates can each work with XML in the way that suits you best. Let's take a brief look at the APIs for moving between these underlying views of the XML:

//To get the live DOM Node:
Node  XmlObject.getDomNode()
Node  XmlCursor.getDomNode()

//To get back:
XmlObject XmlBeans.nodeToXmlObject(Node n)
XmlCursor XmlBeans.nodeToCursor(Node n)

//XMLBeans 1.0 API returns a copy:
Node XmlObject.newDomNode()

As you can see from the code above, moving between these views is tremendously easy.
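The live-versus-copy distinction in the listing above is the usual DOM distinction between a reference into the tree and a detached deep clone. This JDK-only sketch shows the behavior with `Node.cloneNode(true)` standing in for a copy such as `newDomNode()` returns; it does not call XMLBeans, and `LiveVersusCopy` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LiveVersusCopy {
    // Returns the document's text after editing a live node,
    // then the document's text after editing a deep copy.
    static String[] run() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<street>156th NE</street>")));
        Element root = doc.getDocumentElement();

        // Live reference: the edit shows up in the document
        Node live = root;
        live.setTextContent("Main St");
        String afterLive = doc.getDocumentElement().getTextContent();

        // Deep copy: the edit stays in the detached clone
        Node copy = root.cloneNode(true);
        copy.setTextContent("Elm St");
        String afterCopy = doc.getDocumentElement().getTextContent();

        return new String[] { afterLive, afterCopy };
    }

    public static void main(String[] args) throws Exception {
        String[] r = run();
        System.out.println(r[0]); // Main St
        System.out.println(r[1]); // Main St (the clone's edit never reached the document)
    }
}
```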

Conclusion

This article provided a look at some of the new features available in XMLBeans 2.0. Along the way we learned that XMLBeans offers a robust, full-fidelity Java-to-XML binding framework. We also saw how using some of the new features of XMLBeans 2.0 made one of our projects much faster and easier. These new features increase our productivity as developers. The performance improvements also help our productivity, but mostly these improvements mean you will spend less time debugging and profiling bottlenecks in your applications.

The features we described are only some of the improvements in the latest release of XMLBeans. Please take a look at XMLBeans and see how it can help your own development efforts.

Additional Reading

An XQuery and XPath example
Some useful Schema Design Guidelines

Jacob Danner is a Senior Software Engineer at BEA. He has been with BEA on the WebLogic Workshop team since its inception in 2001.

Raj Alagumalai is a Senior Developer Relations Engineer at BEA Systems in the WebLogic Workshop support division.