Load, Save and Filter XML Documents Using the DOM Level 3 API

by Deepak Vohra
07/19/2006

Abstract

We all use XML for data exchange in enterprise applications. The DOM Level 3 Load and Save specification provides a standard mechanism for loading and saving (serializing) an XML document. As specified in the DOM Level 3 Load and Save specification, "This specification defines the Document Object Model Load and Save Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically load the content of an XML document into a DOM document and serialize a DOM document into an XML document." The Java API for XML Processing (JAXP) DocumentBuilder class also provides a standard way to load an XML document, and the JAXP Transformer API can serialize one, but both are specific to the Java language, whereas the DOM 3 Load and Save API may be implemented in any language. In addition to loading and saving, DOM 3 Load and Save provides event handling and filtering of XML documents as the document is parsed or serialized. This article illustrates these new features, which should result in increased portability of DOM applications.

Overview

The DOM Level 3 Load and Save API provides a standard way to both fetch a parser and save an XML document. JAXP also provides a standard mechanism, but DOM Level 3 Load and Save includes additional features such as event handling and filtering.

The DOM 3 Load and Save specification has several advantages over the JAXP DocumentBuilder and Transformer:

  • DOM Level 3 Load supports registering an event listener with the parser. When the parser finishes loading an XML document, it generates a load event to signal completion.
  • Nodes may be filtered as they are loaded by a DOM 3 parser, or as they are serialized.
  • A selected node of the DOM document, rather than the complete document, may be saved.
  • A Document node or an Element node may be saved as a java.lang.String instead of as a file. The exchange of XML documents in a Web service may require an XML document as a String type.
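The last two points in the list can be sketched as follows. This is a minimal illustration rather than the article's downloadable code: it assumes the DOM Level 3 implementation bundled with the JDK (no Xerces system property is set), and the class name NodeToStringSketch, the helper method, and the sample markup are invented for the example.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

public class NodeToStringSketch {

    // Hypothetical sample data; the article's catalog.xml is not reproduced here.
    static final String SAMPLE =
        "<catalog><journal title=\"A\"/><journal title=\"B\"/></catalog>";

    /** Parses xml and serializes only the root's first child node to a String. */
    static String firstChildAsString(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS implLS =
                (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");

            // Load the document from an in-memory string instead of a URI.
            LSParser parser =
                implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            LSInput input = implLS.createLSInput();
            input.setStringData(xml);
            Document document = parser.parse(input);

            // Pick a single node instead of the whole document.
            Node selected = document.getDocumentElement().getFirstChild();

            // Serialize just that node, without an XML declaration.
            LSSerializer serializer = implLS.createLSSerializer();
            serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
            return serializer.writeToString(selected);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Level 3 LS implementation found", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstChildAsString(SAMPLE));
    }
}
```

The returned String holds only the selected element, which is the form a Web service payload would typically need.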

This article explains the procedure to load and save an XML document with the DOM Level 3 specification. The DOM Level 3 Load and Save specification also allows filtering of content at load time and at serialization time, and this feature will also be demonstrated. This article uses the implementation provided by the Xerces2 Java Parser 2.7.1.

Preliminary Setup

The DOM Level 3 specification is implemented in several API distributions. In this tutorial, I'll use the Xerces2 distribution. The DOM Level 3 API implementation in JDK 5.0 may also be used with some modifications to the code samples in this tutorial. To run the example code provided in this article, the Xerces library is required in the classpath. Download the Xerces2-j Parser 2.7.1. Add the xercesImpl.jar and xml-apis.jar JARs to the classpath. Now I'm set to run the code.

Loading an XML Document

I'll start by looking at how to load an XML document. The interfaces and classes in the org.w3c.dom.ls package are used to load, save, and filter an XML document. The LSParser interface in this package loads and parses an XML document and returns a Document object. The document loaded by an LSParser may also be validated with an XML Schema. The following code is the standard way in which to retrieve a DOM implementation, which can then be used to parse an XML document.

As you'll see, most of the code is simply used to initialize registries and properties so as to extract the final parser.

To parse an XML document, first import the org.w3c.dom.ls package:

import org.w3c.dom.ls.*;

Next, set the DOMImplementationRegistry.PROPERTY system property:

System.setProperty(DOMImplementationRegistry.PROPERTY,
            "org.apache.xerces.dom.DOMImplementationSourceImpl");

A DOMImplementationRegistry is a factory that enables applications to obtain instances of a DOMImplementation. Create a DOMImplementationRegistry object:

DOMImplementationRegistry registry =
       DOMImplementationRegistry.newInstance();

Obtain a DOMImplementation instance from the DOMImplementationRegistry object:

DOMImplementation domImpl =
       registry.getDOMImplementation("LS 3.0");

Specifying "LS 3.0" in the features list ensures that the DOMImplementation object implements the load and save features of the DOM 3.0 specification. Next, cast the DOMImplementation object to DOMImplementationLS:

DOMImplementationLS implLS = (DOMImplementationLS)domImpl;

The DOMImplementationLS interface provides methods to create load and save objects. Create an LSParser instance from the DOMImplementationLS object:

LSParser parser =
    implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                             "http://www.w3.org/2001/XMLSchema");

The mode of parsing may be set to MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS. If the mode is MODE_SYNCHRONOUS, the parse and parseURI methods of the LSParser object return the org.w3c.dom.Document object. If the mode is MODE_ASYNCHRONOUS, the parse and parseURI methods return null. The schemaType argument, http://www.w3.org/2001/XMLSchema, specifies the type of schema used when loading an XML document. Obtain a DOMConfiguration object from the LSParser; a DOMConfiguration represents the configuration parameters of an LSParser:

DOMConfiguration config=parser.getDomConfig();
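The steps so far can be pulled together into one small sketch. It makes a few assumptions that are not in the article: it relies on the DOM implementation bundled with the JDK instead of setting the Xerces system property, it parses an in-memory string through an LSInput rather than a file, and the class name LoadSketch, its helper methods, and the use of the "comments" parameter are chosen purely for illustration.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class LoadSketch {

    static final String XML_SCHEMA_TYPE = "http://www.w3.org/2001/XMLSchema";

    /** Looks up a DOMImplementation that supports the Load and Save feature. */
    static DOMImplementationLS loadAndSaveImplementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
            if (domImpl == null) {
                throw new IllegalStateException("No implementation supports LS 3.0");
            }
            return (DOMImplementationLS) domImpl;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap the DOM registry", e);
        }
    }

    /** Creates a synchronous LSParser, configures it, and parses the given string. */
    static Document parseString(String xml) {
        DOMImplementationLS implLS = loadAndSaveImplementation();
        LSParser parser =
            implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, XML_SCHEMA_TYPE);

        // Ask the configuration whether a parameter value is supported before setting it.
        DOMConfiguration config = parser.getDomConfig();
        if (config.canSetParameter("comments", Boolean.FALSE)) {
            config.setParameter("comments", Boolean.FALSE); // discard comment nodes
        }

        LSInput input = implLS.createLSInput();
        input.setStringData(xml);
        return parser.parse(input); // synchronous mode returns the Document
    }

    public static void main(String[] args) {
        Document doc = parseString("<catalog><!-- note --><journal/></catalog>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Checking canSetParameter first is optional, but it keeps the sketch from failing on an implementation that does not support a given parameter value.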

To set the error-handler parameter of the DOMConfiguration, create a class that implements the DOMErrorHandler interface. Here is a simple example:

private class DOMErrorHandlerImpl implements DOMErrorHandler {
    public boolean handleError(DOMError error) {
        System.out.println("Error Message:" + error.getMessage());
        // Returning true lets processing continue; do so only for warnings.
        return error.getSeverity() == DOMError.SEVERITY_WARNING;
    }
}

Set the error-handler parameter:

DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
config.setParameter("error-handler", errorHandler);
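To see the handler in action without a schema, the sketch below feeds the parser a deliberately malformed document. It is an illustration under stated assumptions: it uses the JDK's bundled DOM implementation, the class ErrorHandlerSketch and its CollectingHandler are hypothetical names, and the handler records messages in a list instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class ErrorHandlerSketch {

    /** Records every reported message; continues only after warnings. */
    static class CollectingHandler implements DOMErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public boolean handleError(DOMError error) {
            messages.add(error.getMessage());
            return error.getSeverity() == DOMError.SEVERITY_WARNING;
        }
    }

    /** Parses xml and returns how many errors were reported to the handler. */
    static int countReportedErrors(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS implLS =
                (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
            LSParser parser =
                implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            CollectingHandler handler = new CollectingHandler();
            parser.getDomConfig().setParameter("error-handler", handler);

            LSInput input = implLS.createLSInput();
            input.setStringData(xml);
            try {
                parser.parse(input);
            } catch (LSException e) {
                // The parse was aborted; the handler holds the reported details.
            }
            return handler.messages.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap the DOM registry", e);
        }
    }

    public static void main(String[] args) {
        // The closing tag does not match, so the document is not well-formed.
        System.out.println(countReportedErrors("<catalog><journal></catalog>"));
    }
}
```

Because the handler returns false for anything more severe than a warning, the parser stops at the first such error and the parse call ends with an LSException.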

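Set the validate, schema-type, validate-if-schema, and schema-location parameters to validate the XML document loaded with the LSParser against an XML Schema. Note that DOM Level 3 Core describes validate and validate-if-schema as mutually exclusive (setting one to true is specified to set the other to false), so an implementation that follows the specification strictly will honor only the later of the two calls: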

config.setParameter("validate", Boolean.TRUE);
config.setParameter("schema-type",
                              "http://www.w3.org/2001/XMLSchema");
config.setParameter("validate-if-schema", Boolean.TRUE);
config.setParameter("schema-location", "catalog.xsd");

Finally, parse the XML document with the LSParser:

Document document = parser.parseURI("catalog.xml");

If schema validation finds any errors, the error handler registered with the error-handler parameter receives them. The parsed XML document (catalog.xml), the XML Schema used to validate it (catalog.xsd), and the DOM3Builder.java class used to load the document are available in the Download section of this article.
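As a rough end-to-end illustration of validation, the sketch below writes a small schema and document to temporary files and collects the validation messages. Everything in it is an assumption made for the example: the schema and documents are invented stand-ins for the article's catalog.xsd and catalog.xml, the class and method names are hypothetical, only the validate parameter is set (not validate-if-schema), and the JDK's bundled DOM implementation is used instead of Xerces2-j.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

public class SchemaValidationSketch {

    static final String XML_SCHEMA_TYPE = "http://www.w3.org/2001/XMLSchema";

    // Invented schema: a catalog holding one or more journal elements.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"catalog\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"journal\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Validates xml against XSD and returns the messages passed to the error handler. */
    static List<String> validate(String xml) {
        Path dir = null, xsdFile = null, xmlFile = null;
        try {
            dir = Files.createTempDirectory("dom3ls");
            xsdFile = Files.write(dir.resolve("catalog.xsd"),
                                  XSD.getBytes(StandardCharsets.UTF_8));
            xmlFile = Files.write(dir.resolve("catalog.xml"),
                                  xml.getBytes(StandardCharsets.UTF_8));

            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS implLS =
                (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
            LSParser parser =
                implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, XML_SCHEMA_TYPE);

            List<String> messages = new ArrayList<>();
            // Keep going after warnings and validation errors; stop on fatal errors.
            DOMErrorHandler handler = error -> {
                messages.add(error.getMessage());
                return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
            };

            DOMConfiguration config = parser.getDomConfig();
            config.setParameter("error-handler", handler);
            config.setParameter("validate", Boolean.TRUE);
            config.setParameter("schema-type", XML_SCHEMA_TYPE);
            config.setParameter("schema-location", xsdFile.toUri().toString());

            parser.parseURI(xmlFile.toUri().toString());
            return messages;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Validation sketch could not run", e);
        } finally {
            // Best-effort cleanup of the temporary files.
            try {
                if (xmlFile != null) Files.deleteIfExists(xmlFile);
                if (xsdFile != null) Files.deleteIfExists(xsdFile);
                if (dir != null) Files.deleteIfExists(dir);
            } catch (IOException ignored) {
                // Leftover temp files are harmless for this sketch.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<catalog><journal>A</journal></catalog>"));
        System.out.println(validate("<catalog><book>A</book></catalog>"));
    }
}
```

Unlike the handler shown earlier, this one returns true for validation errors so that every violation in the document is reported rather than only the first.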

Once the XML document is loaded, it may be navigated and updated just as with the DOM Level 2 API. Before the DOM Level 3 Load and Save specification, the way a document was loaded varied with the parser used, which hurt the portability of XML parsing applications. With the DOM Level 3 specification, the loading and saving mechanism is standardized.

The Xerces2-j implementation of the DOM Level 3 specification has some limitations: its org.w3c.dom.ls implementation does not provide an LSParser implementation class that also implements the EventTarget interface, and it does not support the parseWithContext method of the LSParser interface.
