Load, Save and Filter XML Documents Using the DOM Level 3 API
by Deepak Vohra
07/19/2006
The final method in the interface,
startElement(), specifies if an
Element node is to be accepted, rejected, or skipped. As the specification indicates, "The parser will call this method after each Element start tag has been scanned, but before the remainder of the Element is processed. The intent is to allow the element, including any children, to be efficiently skipped." This may be why three methods are available for filtering—to improve efficiency.
The return values of the
startElement() method are also listed in Table 1. Only an Element and the Element's attributes are input to the
startElement() method. The method may be used to modify the attributes of an element. The difference between the
acceptNode() method and the
startElement() method is:
- Only Element nodes are input to the startElement() method. The acceptNode() method, by comparison, may receive any node except the Document, DocumentType, Notation, Entity, DocumentFragment, and Attribute nodes. Attribute nodes may, however, be input to the acceptNode() method of the LSSerializerFilter interface.
- The Element node input to startElement() includes all of the Element's attributes but none of its child nodes. The nodes input to the acceptNode() method of the LSParserFilter include all of their child nodes but none of the attribute nodes. The nodes input to the acceptNode() method of the LSSerializerFilter include all of their child nodes, and may include the attribute nodes.
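The difference between the two callbacks can be observed directly. The following is a minimal sketch, not part of DOM3Filter.java: the class name FilterCallbackDemo, the RecordingFilter class, and the run() helper are hypothetical names introduced for illustration. It assumes the DOM implementation found through DOMImplementationRegistry supports the "LS" feature, as the parser bundled with the JDK does. The filter records whether each element has child nodes at the time each callback fires.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class FilterCallbackDemo {

    /** Hypothetical filter that logs what each callback can see. */
    static class RecordingFilter implements LSParserFilter {
        final List<String> log = new ArrayList<>();

        // Called after the start tag is scanned: attributes only.
        public short startElement(Element element) {
            log.add("start " + element.getTagName()
                    + " attrs=" + element.getAttributes().getLength()
                    + " hasChildren=" + element.hasChildNodes());
            return LSParserFilter.FILTER_ACCEPT;
        }

        // Called once the element has been completely parsed.
        public short acceptNode(Node node) {
            log.add("accept " + node.getNodeName()
                    + " hasChildren=" + node.hasChildNodes());
            return LSParserFilter.FILTER_ACCEPT;
        }

        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses the given XML string with the recording filter and returns its log. */
    static List<String> run(String xml) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser =
                impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        RecordingFilter filter = new RecordingFilter();
        parser.setFilter(filter);

        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        parser.parse(input);
        return filter.log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><journal date=\"May 2005\">"
                + "<title>T</title></journal></catalog>";
        run(xml).forEach(System.out::println);
    }
}
```

Each "start" entry should report the element's attributes but no child nodes, while the matching "accept" entry is logged only after the children have been added.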
In the following example InputFilter class, the getWhatToShow() method returns NodeFilter.SHOW_ELEMENT. In other words, I only want to show Element nodes to the filter. The acceptNode() and startElement() methods both return NodeFilter.FILTER_ACCEPT:
private class InputFilter implements LSParserFilter {
    public short acceptNode(Node node) {
        return NodeFilter.FILTER_ACCEPT;
    }

    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT;
    }

    public short startElement(Element element) {
        System.out.println("Element Parsed " + element.getTagName());
        return NodeFilter.FILTER_ACCEPT;
    }
}
Because getWhatToShow() returns SHOW_ELEMENT, only Element nodes are passed to the filter's acceptNode() method; the other nodes are included in the DOM document without filtering. In this example, the acceptNode() method accepts every node it is given. The startElement() method prints the tag name of each Element as it is parsed. To use the filter, create an instance of the InputFilter class and set it on the LSParser:
InputFilter inputFilter = new InputFilter();
parser.setFilter(inputFilter);
Parse and filter the XML document:
Document document = parser.parse(input);
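The same parse step can also drop a subtree at load time, which is the use the specification has in mind for startElement(). The following is a minimal sketch under stated assumptions: the class name ParseTimeReject, the RejectAprilFilter class, and the parse() helper are hypothetical, and the catalog is supplied as an in-memory string rather than the article's input file. The filter returns FILTER_REJECT from startElement() for the journal dated April 2005, so that element is not added to the Document.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class ParseTimeReject {

    /** Hypothetical filter: rejects the April 2005 journal from its start tag. */
    static class RejectAprilFilter implements LSParserFilter {
        public short startElement(Element element) {
            if ("journal".equals(element.getTagName())
                    && "April 2005".equals(element.getAttribute("date"))) {
                return LSParserFilter.FILTER_REJECT;
            }
            return LSParserFilter.FILTER_ACCEPT;
        }

        public short acceptNode(Node node) {
            return LSParserFilter.FILTER_ACCEPT;
        }

        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses an XML string with the rejecting filter installed. */
    static Document parse(String xml) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser =
                impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new RejectAprilFilter());

        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<journal date=\"April 2005\"><article><title>A</title></article></journal>"
                + "<journal date=\"May 2005\"><article><title>B</title></article></journal>"
                + "</catalog>";
        Document doc = parse(xml);
        System.out.println("journals kept: "
                + doc.getElementsByTagName("journal").getLength());
    }
}
```

Rejecting in startElement() rather than in acceptNode() means the decision is made from the start tag and its attributes alone, before the parser builds the element's children.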
Now I'll show how to create an output filter. As an example, I'll filter a node from the
Document in the output filter. Create an
OutputFilter class that implements the
LSSerializerFilter interface. In addition to the return values listed in Table 2, the
getWhatToShow() method of the
LSSerializerFilter interface may also return
NodeFilter.SHOW_ATTRIBUTE.
In the following example OutputFilter class, the getWhatToShow() method returns NodeFilter.SHOW_ELEMENT. The acceptNode() method returns FILTER_ACCEPT for every element except the journal element whose date attribute is April 2005, for which it returns FILTER_REJECT:
private class OutputFilter implements LSSerializerFilter {
    public short acceptNode(Node node) {
        Element element = (Element) node;
        if (element.getTagName().equals("journal")) {
            if (element.getAttribute("date").equals("April 2005")) {
                return NodeFilter.FILTER_REJECT;
            }
        }
        return NodeFilter.FILTER_ACCEPT;
    }

    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT;
    }
}
Create an LSSerializer:
LSSerializer domWriter = impl.createLSSerializer();
Create an instance of the
OutputFilter and set the filter on the
LSSerializer:
OutputFilter outputFilter = new OutputFilter();
domWriter.setFilter(outputFilter);
Create an LSOutput object and set its OutputStream:
LSOutput lsOutput = impl.createLSOutput();
OutputStream outputStream =
    new FileOutputStream(new File("c:/output/filter.xml"));
lsOutput.setByteStream(outputStream);
Output the filtered XML document:
domWriter.write(document, lsOutput);
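The serializer steps above can be combined into one self-contained sketch. The class name OutputFilterDemo, the RejectAprilJournal filter, and the serialize() helper are hypothetical names, and two details are changed so the example runs without any files: the catalog is parsed from a string, and the LSOutput writes to a ByteArrayOutputStream instead of c:/output/filter.xml.

```java
import java.io.ByteArrayOutputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.ls.LSSerializerFilter;
import org.w3c.dom.traversal.NodeFilter;

public class OutputFilterDemo {

    /** Hypothetical output filter mirroring the article's OutputFilter. */
    static class RejectAprilJournal implements LSSerializerFilter {
        public short acceptNode(Node node) {
            if (node instanceof Element) {
                Element element = (Element) node;
                if ("journal".equals(element.getTagName())
                        && "April 2005".equals(element.getAttribute("date"))) {
                    return NodeFilter.FILTER_REJECT;
                }
            }
            return NodeFilter.FILTER_ACCEPT;
        }

        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses the XML string unfiltered, then serializes it through the filter. */
    static String serialize(String xml) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");

        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        Document document = impl
                .createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null)
                .parse(input);

        LSSerializer writer = impl.createLSSerializer();
        writer.setFilter(new RejectAprilJournal());

        // In-memory stand-in for the article's FileOutputStream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8");
        output.setByteStream(bytes);
        writer.write(document, output);
        return bytes.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog title=\"dev2dev\">"
                + "<journal date=\"April 2005\"><article><title>A</title></article></journal>"
                + "<journal date=\"May 2005\"><article><title>B</title></article></journal>"
                + "</catalog>";
        System.out.println(serialize(xml));
    }
}
```

Printing the returned string is the easiest way to confirm how a given serializer implementation treats the rejected element and its children. The in-memory Document itself is not modified by the output filter.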
Run the filter application DOM3Filter.java. The input filter lists the elements as they are parsed:
Element Parsed journal
Element Parsed article
Element Parsed title
Element Parsed author
Element Parsed journal
Element Parsed article
Element Parsed title
Element Parsed author
The output filter produces the following document:
<?xml version="1.0" encoding="UTF-8"?>
<catalog title="dev2dev">
<journal date="May 2005">
<article section="WebLogic Server">
<title>Session Management for Clustered Applications</title>
<author> Jon Purdy</author>
</article>
</journal>
</catalog>
As illustrated in the code listing, the
journal node with date="April 2005" has been removed.
DOM3Filter.java, the Java class used to filter an XML document, is available with the source code in the Download section of this article.
Prior to the DOM Level 3 Load and Save specification, an XML document could not be filtered as the document was parsed or output. In the DOM Level 2 API, nodes are removed from an already-built document with the removeChild() method of the Node interface.
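For comparison, the DOM Level 2 style of removal can be sketched as follows. This is an illustration rather than code from the article: the class name RemoveAfterParse and the parseAndPrune() helper are hypothetical, and the document is loaded with the standard JAXP DocumentBuilder. The whole tree is built first, and the unwanted journal is detached afterwards.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveAfterParse {

    /** Builds the full DOM tree, then removes the April 2005 journal from it. */
    static Document parseAndPrune(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList journals = doc.getElementsByTagName("journal");
        // The NodeList is live, so iterate backwards while removing nodes.
        for (int i = journals.getLength() - 1; i >= 0; i--) {
            Element journal = (Element) journals.item(i);
            if ("April 2005".equals(journal.getAttribute("date"))) {
                journal.getParentNode().removeChild(journal);
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<journal date=\"April 2005\"><article><title>A</title></article></journal>"
                + "<journal date=\"May 2005\"><article><title>B</title></article></journal>"
                + "</catalog>";
        Document doc = parseAndPrune(xml);
        System.out.println("journals left: "
                + doc.getElementsByTagName("journal").getLength());
    }
}
```

Unlike a parser or serializer filter, this approach modifies the in-memory Document, and the removed subtree is fully parsed before it is discarded.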
Download
Download the source code of the examples found in this article: resources.zip
Conclusion
With the DOM Level 3 Load and Save API, an XML document may be loaded, saved, and filtered. This tutorial uses the DOM Level 3 implementation in Xerces2-J 2.7.0 to load, save, and filter an example XML document. JAXP 1.3, which is included in JDK 5.0, also includes a reference implementation of the DOM Level 3 Load and Save API. In this article I have shown you how to load an XML document (with schema validation), save an XML document or a node to a file or a String, and filter nodes from an XML document.
Additional Reading
- W3C DOM 3 - the W3C DOM Level 3 Specification
- Xerces2 Java Parser 2.7.1 - download site for the Xerces 2j Parser
- Xerces2 DOM - programming with DOM
- Xerces2 Java Parser 2.7.1 DOM Level 3 - Xerces 2j DOM Level 3 implementation
Deepak Vohra is a NuBean consultant and web developer. He is a Sun Certified Java 1.4 Programmer and Sun Certified Web Component Developer for J2EE.