Load, Save and Filter XML Documents Using the DOM Level 3 API
by Deepak Vohra
07/19/2006
Abstract
XML is widely used for data exchange in enterprise applications. The DOM Level 3 Load and Save specification provides a standard mechanism for loading and saving (serializing) an XML document. As specified in the DOM Level 3 Load and Save specification, "This specification defines the Document Object Model Load and Save Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically load the content of an XML document into a DOM document and serialize a DOM document into an XML document." The Java API for XML Processing (JAXP) DocumentBuilder class also provides a standard method to create a parser and load an XML document, but this is specific to the Java language. The DOM 3 Load and Save API may be implemented in any language. JAXP also provides the Transformer API to serialize an XML document. In addition to facilitating the loading and saving of an XML document, DOM 3 Load and Save provides event handling and filtering of XML documents as the document is parsed or serialized. This article illustrates these new features, which should result in increased portability of DOM applications.
Overview
The DOM Level 3 Load and Save API provides a standard way to both fetch a parser and save an XML document. JAXP also provides a standard mechanism, but DOM Level 3 Load and Save includes additional features such as event handling and filtering.
The DOM 3 Load and Save specification has several advantages over the JAXP DocumentBuilder and Transformer:
- DOM Level 3 Load supports the registration of an event listener with the parser. When a DOM 3 parser finishes loading an XML document, it generates a load event to signal completion.
- Nodes may be filtered as they are loaded by a DOM 3 parser, or as they are serialized.
- A selected node of the DOM document, rather than the complete document, may be saved.
- A Document node or an Element node may be saved as a java.lang.String instead of as a file. The exchange of XML documents in a Web service may require an XML document as a String type.
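The last two points can be illustrated with a minimal sketch. It assumes the DOM Level 3 implementation bundled with the JDK rather than the Xerces JARs, and the class name, method names, and sample XML are hypothetical, not taken from this article's download. It parses a document held in a String and writes a single selected node back to a String with LSSerializer.writeToString.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, for illustration only.
class NodeToStringSketch {

    // Looks up whichever registered DOM implementation supports Load and Save.
    static DOMImplementationLS loadSaveImpl() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap a DOM implementation", e);
        }
    }

    // Parses the XML text, then serializes only the first element named tagName.
    static String serializeFirst(String xml, String tagName) {
        DOMImplementationLS ls = loadSaveImpl();

        // Feed the parser from memory instead of a file or URI.
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        Document doc = parser.parse(input);

        Node selected = doc.getElementsByTagName(tagName).item(0);

        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the XML declaration, since only a fragment is written.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(selected);
    }

    public static void main(String[] args) {
        String xml = "<catalog><journal title=\"A\"/><journal title=\"B\"/></catalog>";
        System.out.println(serializeFirst(xml, "journal"));
    }
}
```

Passing the Document itself to writeToString would serialize the whole document in the same way.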
This article explains the procedure to load and save an XML document with the DOM Level 3 specification. The DOM Level 3 Load and Save specification also allows filtering of content at load time and at serialization time, and this feature will also be demonstrated. This article uses the implementation provided by the Xerces2 Java Parser 2.7.0.
Preliminary Setup
The DOM Level 3 specification is implemented in several API distributions. In this tutorial, I'll use the Xerces2 distribution. The DOM Level 3 API implementation in JDK 5.0 may also be used with some modifications to the code samples in this tutorial. To run the example code provided in this article, the Xerces library is required in the classpath. Download the Xerces2-j Parser 2.7.1. Add the xercesImpl.jar and xml-apis.jar JARs to the classpath. Now I'm set to run the code.
Loading an XML Document
I'll start by looking at how to load an XML document. The interfaces and classes in the org.w3c.dom.ls package are used to load, save, and filter an XML document. The LSParser interface in this package is used to load and parse an XML document and obtain a Document object. The document loaded by an LSParser may also be validated with an XML Schema. The following code is the standard way to retrieve a DOM implementation, which can then be used to parse an XML document.
As you'll see, most of the code is simply used to initialize registries and properties so as to extract the final parser.
To parse an XML document, first import the org.w3c.dom.ls package:
import org.w3c.dom.ls.*;
Next, set the DOMImplementationRegistry.PROPERTY system property:
System.setProperty(DOMImplementationRegistry.PROPERTY,
    "org.apache.xerces.dom.DOMImplementationSourceImpl");
A DOMImplementationRegistry is a factory that enables applications to obtain instances of a DOMImplementation. Create a DOMImplementationRegistry object:
DOMImplementationRegistry registry =
    DOMImplementationRegistry.newInstance();
Obtain a DOMImplementation instance from the DOMImplementationRegistry object:
DOMImplementation domImpl =
    registry.getDOMImplementation("LS 3.0");
Specifying "LS 3.0" in the features list ensures that the DOMImplementation object implements the load and save features of the DOM 3.0 specification. Next, cast the DOMImplementation object to DOMImplementationLS:
DOMImplementationLS implLS = (DOMImplementationLS) domImpl;
The DOMImplementationLS interface provides methods to create load and save objects. Create an LSParser instance from the DOMImplementationLS object:
LSParser parser =
    implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
        "http://www.w3.org/2001/XMLSchema");
The mode of parsing may be set to MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS. If the mode is MODE_SYNCHRONOUS, the parse and parseURI methods of the LSParser object return the org.w3c.dom.Document object. If the mode is MODE_ASYNCHRONOUS, the parse and parseURI methods return null. The schemaType, http://www.w3.org/2001/XMLSchema, specifies the type of schema used to load an XML document. Obtain a DOMConfiguration object from the LSParser; a DOMConfiguration represents the configuration parameters of an LSParser:
DOMConfiguration config = parser.getDomConfig();
To set the error-handler parameter of the DOMConfiguration, create a class that implements the DOMErrorHandler interface. Here is a simple example:
private class DOMErrorHandlerImpl implements DOMErrorHandler {
    public boolean handleError(DOMError error) {
        System.out.println("Error Message:" + error.getMessage());
        // Continue processing after a warning; stop after an error or fatal error.
        return error.getSeverity() == DOMError.SEVERITY_WARNING;
    }
}
Set the error-handler parameter:
DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
config.setParameter("error-handler", errorHandler);
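To see the error handler in action without any files, here is a minimal sketch that parses XML from a String and collects the reported messages in a list instead of printing them. It assumes the DOM Level 3 implementation bundled with the JDK; the class, the CollectingHandler name, and the sample input are hypothetical and not part of this article's download.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical example class, for illustration only.
class CollectingErrorHandlerSketch {

    // Records every reported message; asks the parser to continue only after warnings.
    static class CollectingHandler implements DOMErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public boolean handleError(DOMError error) {
            messages.add("severity " + error.getSeverity() + ": " + error.getMessage());
            return error.getSeverity() == DOMError.SEVERITY_WARNING;
        }
    }

    // Parses the given XML text and returns the messages the handler received.
    static List<String> parseAndCollect(String xml) {
        DOMImplementationLS implLS;
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            implLS = (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap a DOM implementation", e);
        }

        LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        CollectingHandler handler = new CollectingHandler();
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = implLS.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // A failed parse surfaces as LSException; the handler has already been notified.
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // The journal element is never closed, so the document is not well formed.
        for (String message : parseAndCollect("<catalog><journal></catalog>")) {
            System.out.println(message);
        }
    }
}
```

Collecting messages rather than printing them makes it easier to decide afterwards whether a load should count as successful.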
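Set the validate, schema-type, validate-if-schema, and schema-location parameters to validate the XML document loaded with the LSParser with an XML Schema. Note that the DOM Level 3 Core specification describes validate and validate-if-schema as mutually exclusive: setting one of them to true sets the other to false, so the parameter set last takes effect.
config.setParameter("validate", Boolean.TRUE);
config.setParameter("schema-type",
    "http://www.w3.org/2001/XMLSchema");
config.setParameter("validate-if-schema", Boolean.TRUE);
config.setParameter("schema-location", "catalog.xsd");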
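Support for some of these parameters is optional in the DOM Level 3 specification, so it can be useful to ask the configuration whether a value is accepted before setting it. The following is a minimal sketch using DOMConfiguration.canSetParameter. It assumes the implementation bundled with the JDK, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

// Hypothetical example class, for illustration only.
class ParameterProbeSketch {

    static final String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Reports, per parameter name, whether the parser accepts the value used in this article.
    static Map<String, Boolean> probeValidationSupport() {
        DOMImplementationLS implLS;
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            implLS = (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap a DOM implementation", e);
        }

        LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, XML_SCHEMA);
        DOMConfiguration config = parser.getDomConfig();

        // canSetParameter only checks; it does not change the configuration.
        Map<String, Boolean> supported = new LinkedHashMap<>();
        supported.put("validate", config.canSetParameter("validate", Boolean.TRUE));
        supported.put("validate-if-schema", config.canSetParameter("validate-if-schema", Boolean.TRUE));
        supported.put("schema-type", config.canSetParameter("schema-type", XML_SCHEMA));
        supported.put("schema-location", config.canSetParameter("schema-location", "catalog.xsd"));
        return supported;
    }

    public static void main(String[] args) {
        probeValidationSupport().forEach(
                (name, ok) -> System.out.println(name + " supported: " + ok));
    }
}
```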
Finally, parse the XML document with the LSParser:
Document document = parser.parseURI("catalog.xml");
If schema validation of the XML document produces any errors, the error handler specified with the error-handler parameter will receive them. The parsed XML document, catalog.xml, the XML Schema used to validate it, catalog.xsd, and the DOM3Builder.java Java class used to load the XML document are available in the Download section of this article.
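The loading steps described in this section can be sketched end to end as follows. This is not the DOM3Builder.java class from the download; it is a minimal illustration that assumes the DOM Level 3 implementation bundled with the JDK (so it does not set the DOMImplementationRegistry.PROPERTY system property) and parses from a String rather than from catalog.xml. The class name, method name, and sample XML are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical example class, for illustration only.
class LoadSketch {

    // Bootstraps a Load and Save implementation and parses the given XML text.
    static Document load(String xml) {
        DOMImplementation domImpl;
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            domImpl = registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bootstrap a DOM implementation", e);
        }
        if (domImpl == null) {
            throw new IllegalStateException("No DOM implementation with LS 3.0 support found");
        }
        DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

        // Synchronous mode: parse returns the Document directly.
        LSParser parser = implLS.createLSParser(
                DOMImplementationLS.MODE_SYNCHRONOUS,
                "http://www.w3.org/2001/XMLSchema");

        // An LSInput can carry in-memory text instead of a URI.
        LSInput input = implLS.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) {
        Document document = load("<catalog><journal/></catalog>");
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

Replacing the LSInput with a call to parser.parseURI, as in the article's code, loads the document from a file or URL instead.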
Once loaded, the XML document may be traversed and updated as you would with the DOM Level 2 API. Before the DOM Level 3 Load and Save specification, the loading mechanism varied with the parser used to load and parse an XML document, which hurt the portability of XML parsing applications. With the DOM Level 3 specification, the loading and saving mechanism is standardized.
The Xerces2-j implementation of the DOM Level 3 specification has some limitations: its org.w3c.dom.ls support does not provide an implementation class for the LSParser interface that also implements the EventTarget interface, and it does not support the parseWithContext method of the LSParser interface.