Working with the XML Parser API — JSR 172

By Vikram Goyal

This article covers the basics of the XML Parsing API and uses SAX (Simple API for XML), an event-driven parsing method, as an example.

Published November 2011

Downloads:

Download: Java ME

Download: Sample Code (Zip)

Introduction

JSR 172 has been around for a while. In fact, it is now included as a backwards-compatible subset in JSR 280, which is an all-encompassing XML processing API for Java Platform, Micro Edition (Java ME) devices. When initially created, this API was not called the XML Parser API, but rather, the J2ME Web Services API. It had two primary goals: access to remote XML/SOAP-based Web services and parsing XML data.

In this article, I cover the basics of the XML Parsing API with a concrete example. I specifically cover the SAX (Simple API for XML) parsing method because the other method, DOM (Document Object Model), is out of favour due to its heavy memory footprint (besides being prohibited from use by the original specification).

Note: The NetBeans source code for the example provided in this article can be downloaded here.

Simple API for XML

In 2011, you would be hard pressed to find anyone in the development community who hasn’t worked with XML. It is used everywhere and is the backbone of most digital communication. Not too long ago, though, it was an emerging technology, and the need for mobile devices to use it as the standard for data communication was felt to be of high importance. Needing the data was one thing; being able to convert that data into useful models within the development environment was another.

Two methods for parsing this data so that it could be converted into programming models were in vogue at that time (and still are): SAX (Simple API for XML) and DOM (Document Object Model).

These methods differ in the way they process XML data. While DOM reads the whole document and then creates a model in memory, SAX processes XML data in a sequence, generating events along the way for a handler to work on.

In addition, SAX is event-based. It doesn’t allow insertion or deletion of nodes within the XML document, and it does not take too many resources to work. DOM, on the other hand, is tree-based, allows modifications to the nodes, and is a memory hog. This might not seem to be too much of an issue in modern devices, but not very long ago, this was a huge problem for memory-constrained devices. Therefore, the XML Parser API disallowed the use of DOM in XML processing and mandated the use of only SAX.

Working with SAX

Because SAX is an event-driven parser, it defines a set of events, each of which triggers a callback to a particular method. Some of the methods associated with these events are the following:

  • startDocument and endDocument: Triggered when document traversing is started and finished
  • startElement and endElement: Triggered when the parser encounters the start and end of an XML element
  • characters: Triggered when the parser encounters character data within an element (possibly more than once for a single element)
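
To make the order of these callbacks concrete, here is a minimal sketch that records each event as it arrives. The class name EventTraceHandler and its traceOf helper are hypothetical and not part of the article's sample code. The sketch is written against the javax.xml.parsers and org.xml.sax packages as found in Java SE, on the assumption that the JSR 172 subset exposes the same calls.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the order in which SAX callbacks arrive.
class EventTraceHandler extends DefaultHandler {
    private final StringBuffer trace = new StringBuffer();

    public void startDocument() { trace.append("startDocument;"); }

    public void endDocument() { trace.append("endDocument;"); }

    public void startElement(
        String uri, String localName, String qName, Attributes attributes) {
        trace.append("start:" + qName + ";");
    }

    public void endElement(String uri, String localName, String qName) {
        trace.append("end:" + qName + ";");
    }

    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks; record non-blank ones.
        String text = new String(ch, start, length).trim();
        if (text.length() > 0) {
            trace.append("chars;");
        }
    }

    public String getTrace() { return trace.toString(); }

    // Parses the given XML text and returns the recorded event trace.
    static String traceOf(String xml) throws Exception {
        InputStream is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
        EventTraceHandler handler = new EventTraceHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
        return handler.getTrace();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(traceOf("<menu><name>Oysters</name></menu>"));
    }
}
```

The trace begins with startDocument and ends with endDocument, and each element's start event precedes its character data and its end event.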

The XML Parser API provides a DefaultHandler class. If you, as a developer, want to create a parser for a specific XML document, you need to extend this class and provide actual code for the listed methods (only the methods that you think might be required). These methods must then create a model based on the data that is supplied while parsing the document. These methods must also validate the data and raise any errors accordingly.
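
As a rough illustration of a handler that validates while it parses, the sketch below rejects any document whose root element is not menu. The class RootCheckHandler and its validate helper are hypothetical. Signalling the problem with SAXException is an assumption made for this sketch (the callbacks are declared to throw it); other exception types are equally possible.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that only checks the name of the root element.
class RootCheckHandler extends DefaultHandler {
    private boolean rootSeen = false;

    public void startElement(
        String uri, String localName, String qName, Attributes attributes)
        throws SAXException {
        if (!rootSeen) {
            // The first startElement event always belongs to the root element.
            if (!qName.equals("menu")) {
                throw new SAXException(
                    "Expected menu as the root element but found " + qName);
            }
            rootSeen = true;
        }
    }

    // Returns null when the document passes the check, else an error message.
    static String validate(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes("UTF-8")),
                new RootCheckHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<menu/>"));
        System.out.println(validate("<order/>"));
    }
}
```

Only the overridden method carries logic; every other callback keeps the empty behavior inherited from DefaultHandler.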

The XML Parser API provides a standard factory for the creation of a SAX parser. The handler that you create for the processing of your own XML document must be provided to this parser instance along with the XML document that is being parsed, for example:

 SAXParser parser =
   SAXParserFactory.newInstance().newSAXParser();
 parser.parse(is, saxMenuHandler);

In a nutshell, for the parsing of a custom XML document, you need to create a handler that extends the DefaultHandler class provided by the API. This custom handler is responsible for listening to the events from the parser and creating the model based on those events (and the supplied data). Your handler is responsible for validating the document and its data. Let’s see how we can put this theory into practice.

A Working Example

For our working example, I create a handler to parse the following example XML document:

 <?xml version="1.0" encoding="UTF-8"?>
<menu restaurantName="My Modern Restaurant" phone="555-555-5555">
     <entree veg="N">
       <name>Freshly Shucked Oysters</name>
       <price>5.00</price>    
     </entree>
     <entree veg="N">
       <name>Duck Consomme with mushrooms</name>
       <price>7.00</price>    
     </entree>
     
     <main veg="Y">
       <name>Eggplant Mozzarella</name>
       <price>25.00</price>
       <description>Served with our delicious sauce</description>    
     </main>
     <main veg="N">
       <name>King Salmon</name>  
       <price>32.50</price>
       <description>Served with potato salad and quail egg</description>
     </main>      
   </menu>

Nothing fancy; just a menu for a fancy restaurant.

Creating the Model Objects

Before we get into the parser code, let’s create the model objects that will represent this data. We need three models: Menu, Entree, and Main. Here’s model/Menu.java:

 package model;

 import java.util.Vector;

 public class Menu {
     private Vector entrees;
     private Vector mains;
     private String restaurantName;
     private String phone;
     
     public Menu() {
       this.entrees = new Vector();
       this.mains = new Vector();
     }
     
     public Vector getEntrees() { return this.entrees; }
     public void setEntrees(Vector entrees) { this.entrees = entrees; }
     
     public Vector getMains() { return this.mains; }
     public void setMains(Vector mains) { this.mains = mains; }
     
     public String getRestaurantName() { return this.restaurantName; }
     public void setRestaurantName(String restaurantName) {
       this.restaurantName = restaurantName;
     }
     
     public String getPhone() { return this.phone; }
     public void setPhone(String phone) { this.phone = phone; }
     
     public void addEntree(Entree entree) {
       this.entrees.addElement(entree);
     }
     
     public void addMain(Main main) {
       this.mains.addElement(main);
     }
     
 }

Here’s model/Entree.java:

 package model;

 public class Entree {
     
     private boolean vegetarian;
     private String name;
     private double price;
     
     public boolean isVegetarian() { return this.vegetarian; }
     public void setVegetarian(boolean vegetarian) {
       this.vegetarian = vegetarian; 
     }
     
     public String getName() { return this.name; }
     public void setName(String name) { this.name = name; }
     
     public double getPrice() { return this.price; }
     public void setPrice(double price) { this.price = price; }
     
 }

And, finally, here’s model/Main.java:

 package model;

 public class Main extends Entree {
     private String description;

     public String getDescription() { return this.description; }
     public void setDescription(String description) {
       this.description = description;
     }
 }

As you can see, because Entree and Main share so many characteristics, the Main class simply extends the Entree class.

Creating the Handler

With these three model classes in place, we can now reliably create a valid model from the XML document, as long as we have a handler that can convert the XML to our Java model objects!

To start creating the handler, we have to extend the DefaultHandler class and provide implementation for at least three of its methods:

  • startElement(String uri, String localName, String qName, Attributes attributes)
  • endElement(String uri, String localName, String qName)
  • characters(char[] ch, int start, int length)

These methods have empty implementations in the DefaultHandler class. By overriding them, you supply the callbacks that SAXParser invokes as it sends event notifications while parsing the XML document. The shell of this class is shown in the following code listing:

 public class MenuHandler extends DefaultHandler {
     public void startElement(
       String uri, String localName, String qName, Attributes attributes) 
       throws SAXException { 
     }
     public void endElement(
       String uri, String localName, String qName) throws SAXException { 
     }
     public void characters(
       char[] ch, int start, int length) throws SAXException {
     }
 }

The startElement method is called at the start of an element. Thus, the handler needs to be aware of which element is being “started” and what attributes it has, if any. For example, look at the following code listing, which is a snippet from the finished code, or download the full source code:

       // start with the menu root element
       if(qName.equals("menu")) {
         if (menu == null) { 
           String restaurantName = attributes.getValue("restaurantName");
           String phone = attributes.getValue("phone");       
           if(restaurantName == null || phone == null) {
             throw new IllegalArgumentException(
               "A menu must have both restaurantName and phone");
           }
           // if we are here, the menu element is well formed - let's create it
           this.menu = new Menu(); 
           this.menu.setRestaurantName(restaurantName);
           this.menu.setPhone(phone);        
           
         } else {
           throw new IllegalStateException("Cannot have duplicate menu items");
         } 
 

The qName parameter provides the name of the element that is being traversed. You use it to figure out the mapping between the element and the relevant model. In the code above, I mapped the start of the menu root element to the Menu model class. The attributes list provides the names and values of the XML attributes attached to the corresponding element, in this case, the menu root element. So, I have looked up the values of the restaurantName and phone attributes and added them to the Menu model.
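
The two ways of reading the attributes list, by index and by name, can be sketched as follows. The class AttributeListHandler and the website attribute are hypothetical and used only for illustration. The sketch assumes the getLength, getQName, and getValue calls of Java SE's org.xml.sax.Attributes are available in the same form on the device.

```java
import java.io.ByteArrayInputStream;
import java.util.Vector;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that lists the attributes of the menu root element.
class AttributeListHandler extends DefaultHandler {
    private final Vector pairs = new Vector();
    private String website = "not looked up";

    public void startElement(
        String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("menu")) {
            // Walk the attribute list by index.
            for (int i = 0; i < attributes.getLength(); i++) {
                pairs.addElement(
                    attributes.getQName(i) + "=" + attributes.getValue(i));
            }
            // Look up by name; "website" is a hypothetical attribute that the
            // sample document does not contain, so getValue returns null.
            website = attributes.getValue("website");
        }
    }

    public Vector getPairs() { return pairs; }

    public String getWebsite() { return website; }

    // Parses the given XML text and returns the populated handler.
    static AttributeListHandler parse(String xml) throws Exception {
        AttributeListHandler handler = new AttributeListHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        AttributeListHandler h = parse(
            "<menu restaurantName=\"My Modern Restaurant\" phone=\"555-555-5555\"/>");
        System.out.println(h.getPairs());
        System.out.println(h.getWebsite());
    }
}
```

Because a lookup by name returns null for a missing attribute, a null check is enough to detect a missing required attribute.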

The code above also does some basic error checking. So, for example, if the startElement method encounters another menu root element, it throws the IllegalStateException. Similarly, if it can’t find valid values for both the restaurantName and phone attributes, it throws the IllegalArgumentException. We can create the Entree model objects similarly. The code snippet below shows the corresponding code:

     } else if(qName.equals("entree")) {      
         // process the entree element - this must be inside an existing
         // menu element
         if(menu == null) { 
           throw new IllegalStateException("Missing root Menu Element"); 
         } else if(currentEntree != null) {
           throw new IllegalStateException("Already processing an Entree!");        
         } else {
           
           // create the entree and set the vegetarian attribute
           // the rest of the values of the entree will be filled by the 
           // processing of the other elements
           currentEntree = new Entree();
           String vegetarian = attributes.getValue("veg");
           if(vegetarian == null) {
             // assume default value of N
             vegetarian = "N";
           }
           currentEntree.setVegetarian(vegetarian.equals("Y"));
         }

As noted in the code comments, only the attributes of the Entree model are set. The rest of the model object is created by the processing of the other elements and the other methods, namely the endElement and the characters methods. The corresponding code snippets are shown below.

 // from endElement method
 if(qName.equals("entree")) {
   // the entree object is complete
   // add to the menu
   menu.addEntree(currentEntree);
   // we don't need this object now,
   // set to null
   currentEntree = null;
 }

 // from characters method
 public void characters(char[] ch, int start, int length)
   throws SAXException {
   characterValue =
     new String(ch, start, length).trim();
 }

The characters method holds the value it finds between the start and end of an element in a class variable (characterValue), and this value is used by the other methods to set the corresponding values of the model objects. Not shown in the above code is the fact that the characters method also takes care of handling any CDATA sections by appending the text delivered by the multiple calls that each CDATA section can make to this method.
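
One way such appending could look is sketched below. It is an assumption about the general technique, not a copy of the downloadable MenuHandler: text is accumulated in a StringBuffer across calls to characters and consumed in endElement. The class name TextBufferHandler is hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that buffers element text across characters calls.
class TextBufferHandler extends DefaultHandler {
    private final StringBuffer buffer = new StringBuffer();
    private String name;
    private double price;

    public void startElement(
        String uri, String localName, String qName, Attributes attributes) {
        // A new element begins, so discard any text collected so far.
        buffer.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        // Append rather than overwrite: text and CDATA sections may arrive
        // in several separate calls for the same element.
        buffer.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        String value = buffer.toString().trim();
        if (qName.equals("name")) {
            name = value;
        } else if (qName.equals("price")) {
            // A malformed number would raise NumberFormatException here.
            price = Double.parseDouble(value);
        }
        buffer.setLength(0);
    }

    public String getName() { return name; }

    public double getPrice() { return price; }

    // Parses the given XML text and returns the populated handler.
    static TextBufferHandler parse(String xml) throws Exception {
        TextBufferHandler handler = new TextBufferHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        TextBufferHandler h = parse(
            "<entree><name>Fish <![CDATA[& Chips]]></name>"
            + "<price>7.00</price></entree>");
        System.out.println(h.getName() + " - " + h.getPrice());
    }
}
```

Trimming once in endElement, rather than on every chunk, keeps the spaces that fall between chunks, such as the one before the CDATA section in this input.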

To get a complete picture of this code, see the complete listing for the MenuHandler.java class in the source code.

Wrapping Up—Writing the Parsing MIDlet

Since parsing of an XML document might be a time-consuming process, we should do it in a thread of its own. The XMLParserMidlet does this when you tell it to load the document:

     // on user request load the xml file in a separate thread
     if(c == loadCommand) {
       if(runner == null) { runner = new Thread(this); runner.start(); }
       return;
     }

There are three steps in the actual parsing:

  • Create an InputStream to the XML document that needs to be parsed.
  • Load the handler that will do the magic transformation of the XML document to corresponding model objects.
  • Instantiate a parser and parse away!
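
Stripped of the MIDlet user interface, the three steps and the background thread can be sketched as follows. ParseTask and CountingHandler are hypothetical names; the counting handler merely stands in for a real model-building handler, and the in-memory stream stands in for the resource file. The call to join is there only so that the demonstration can read the result; a MIDlet would instead update its display from within run.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical task that performs the three parsing steps off the caller's thread.
class ParseTask implements Runnable {
    private final String xml;
    private int elementCount = -1;
    private Exception error;

    ParseTask(String xml) { this.xml = xml; }

    public void run() {
        InputStream is = null;
        try {
            // Step 1: open a stream on the XML document.
            is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
            // Step 2: create the handler.
            CountingHandler handler = new CountingHandler();
            // Step 3: instantiate a parser and parse.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(is, handler);
            elementCount = handler.getCount();
        } catch (Exception e) {
            error = e;
        } finally {
            try { if (is != null) is.close(); } catch (Exception ignored) { }
        }
    }

    int getElementCount() { return elementCount; }

    Exception getError() { return error; }

    // Runs the task on its own thread and waits for it (demonstration only).
    static int parseInBackground(String xml) throws InterruptedException {
        ParseTask task = new ParseTask(xml);
        Thread runner = new Thread(task);
        runner.start();
        runner.join();
        return task.getElementCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseInBackground("<menu><entree/><main/></menu>"));
    }
}

// Hypothetical stand-in for a real handler: counts start tags.
class CountingHandler extends DefaultHandler {
    private int count;

    public void startElement(
        String uri, String localName, String qName, Attributes attributes) {
        count++;
    }

    public int getCount() { return count; }
}
```

Storing any exception in a field lets the code that started the thread decide how to report the failure.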

The actual run method within the MIDlet looks like this:

 public void run() {
       
       // stream to read the file in
       InputStream is = null;
       
       try {
         
         // let's read the XML file and parse it using the SAXParser
         is = getClass().getResourceAsStream("/menu.xml");
         // load the handler
         MenuHandler saxMenuHandler = new MenuHandler();
         
         // instantiate a SAXParser object
         SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
         
         // and parse away
         parser.parse(is, saxMenuHandler);  
         
         // if we are here, there have been no errors, so let us display the 
         // results on the screen
         Menu menu = saxMenuHandler.getMenu();
         
         // first details about the restaurant
         form.append(menu.getRestaurantName());
         form.append(menu.getPhone());
         // then the entrees
         form.append("Entrees\r\n");
         int entreeSize = menu.getEntrees().size();
         for(int i = 0; i < entreeSize; i++) {
           Entree entree = (Entree)menu.getEntrees().elementAt(i);
           form.append(
             entree.getName() + 
             (entree.isVegetarian() ? " (V)" : "") + " - " + entree.getPrice());
           form.append("\r\n");
         }
         // finally, the mains
         form.append("Mains\r\n");
         int mainSize = menu.getMains().size();
         for(int i = 0; i < mainSize; i++) {
           Main main = (Main)menu.getMains().elementAt(i);
           form.append(
             main.getName() + 
             (main.isVegetarian() ? " (V)" : "") + " - " + main.getPrice());
           form.append("\r\n");
           form.append(main.getDescription());
           form.append("\r\n");
         }      
       } catch(Exception e) {
         handleError(e);
       } finally {
         try { if(is != null) is.close(); } catch(Exception ex) {}
       }
 }

Notice the critical lines in the code where we do the parsing:

 SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
 // and parse away
 parser.parse(is, saxMenuHandler);
 Menu menu = saxMenuHandler.getMenu();

As noted earlier, the XML Parser API provides the factory class for instantiating the SAXParser. I obtained a parser instance from it and passed my XML file (via the InputStream) and my custom handler to the parser. I implemented a getMenu() method in the handler class to return the Menu model, which holds the result of the parsing in the MIDlet.

The rest of the code is standard MIDlet code for displaying the result. If there are no errors, you will get a screen that looks like the following:

Figure 1: Final Screen

Summary

This article covered the basics of using JSR 172, the XML Parser API. The XML Parser API defines the use of the SAX parser for parsing XML documents in resource-constrained devices. In this article, with the help of a working example, I covered what the SAX parser is, how it is defined by this API, and how best to use it.

See Also

  • JSR 172, the XML Parser API (Web Services API)
  • JSR 280, the XML API for Java ME

About the Author

Vikram Goyal is the author of Pro Java ME MMAPI: Mobile Media API for Java Micro Edition, published by Apress. This book explains how to add multimedia capabilities to Java technology-enabled phones. Vikram is also the author of the Jakarta Commons Online Bookshelf, and he helps manage a free craft projects Web site.