Load, Save and Filter XML Documents Using the DOM Level 3 API

by Deepak Vohra
07/19/2006

Saving an XML Document

I'll now turn to saving an XML document. With the DOM Level 3 API, an XML document may be saved to an XML file or to a String, and the API can also serialize just a selected node instead of the whole document. Saving is done with the LSSerializer interface. The steps below are the standard way to retrieve a DOM implementation, which can then be used to create an LSSerializer. As in the previous section, import the org.w3c.dom.ls package:


import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;

Create the XML document to be output with an LSSerializer, and add elements and attributes to it. As in the previous section, set the system property DOMImplementationRegistry.PROPERTY:


System.setProperty(DOMImplementationRegistry.PROPERTY,
            "org.apache.xerces.dom.DOMImplementationSourceImpl");

Create a DOMImplementationRegistry object:


DOMImplementationRegistry registry =
                         DOMImplementationRegistry.newInstance();

Obtain a DOMImplementation object from the DOMImplementationRegistry:


DOMImplementation domImpl =
                         registry.getDOMImplementation("LS 3.0");

Cast the DOMImplementation instance to DOMImplementationLS:


DOMImplementationLS implLS = (DOMImplementationLS)domImpl;
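The DOMImplementationRegistry.PROPERTY value above names a class from the Apache Xerces jar, so DOMImplementationRegistry.newInstance() fails with a ClassNotFoundException if that jar is not on the classpath. A minimal sketch, assuming a stock JDK (which bundles its own copy of Xerces under internal package names), leaves the property unset and lets the registry fall back to the JDK's built-in implementation. The class and method names (LsLookup, lookupLS) are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;

// Hypothetical helper: finds an implementation of the DOM Level 3 LS feature.
public class LsLookup {

    static DOMImplementationLS lookupLS() throws Exception {
        // No DOMImplementationRegistry.PROPERTY is set here, so the registry
        // uses its default source (the JDK's built-in DOM on a stock JDK).
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
        // getDOMImplementation() returns null when no source supports the features.
        if (domImpl == null) {
            throw new IllegalStateException("No DOM implementation supports LS 3.0");
        }
        // Implementations supporting the LS feature must implement DOMImplementationLS.
        return (DOMImplementationLS) domImpl;
    }

    public static void main(String[] args) throws Exception {
        DOMImplementationLS implLS = lookupLS();
        System.out.println("LS implementation: " + implLS.getClass().getName());
    }
}
```

The null check matters because getDOMImplementation() reports an unsupported feature string by returning null rather than throwing, so a bare cast would otherwise fail later with a less helpful NullPointerException.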

Create an LSSerializer from the DOMImplementationLS:


LSSerializer dom3Writer = implLS.createLSSerializer();

Create an LSOutput object:


LSOutput output = implLS.createLSOutput();

Create the output directory C:/output first, because FileOutputStream does not create missing directories. Then specify the OutputStream and encoding for the LSOutput object:



OutputStream outputStream = 
        new FileOutputStream(new File("c:/output/output.xml"));
output.setByteStream(outputStream);
output.setEncoding("UTF-8");

Output the XML document:


dom3Writer.write(document, output);
outputStream.close();

The XML document is output as shown in this code listing:



<?xml version="1.0" encoding="UTF-8"?>
<catalog publisher="dev2dev">
 <journal edition="January-February 2005" section="XML">
  <article>
   <title></title>
  </article>
 </journal>
</catalog>
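The snippets above assume an existing document variable and a C:/output directory. The self-contained sketch below, an assumption rather than the article's DOM3Writer.java, builds the catalog document from the listing with standard DOM calls, serializes it with an LSSerializer, and closes the stream with try-with-resources. The class name SaveCatalog and the temporary output file are illustrative. It also requests the standard format-pretty-print parameter, guarded by canSetParameter(), because DOM Level 3 makes pretty-printing optional; without it, a document built in code may be written on a single line rather than indented as in the listing.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class SaveCatalog {

    // Builds <catalog><journal><article><title/></article></journal></catalog>.
    static Document buildCatalog() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element catalog = document.createElement("catalog");
        catalog.setAttribute("publisher", "dev2dev");
        document.appendChild(catalog);
        Element journal = document.createElement("journal");
        journal.setAttribute("edition", "January-February 2005");
        journal.setAttribute("section", "XML");
        catalog.appendChild(journal);
        Element article = document.createElement("article");
        journal.appendChild(article);
        article.appendChild(document.createElement("title"));
        return document;
    }

    // Serializes the document to the given file as UTF-8.
    static void save(Document document, File file) throws Exception {
        DOMImplementationLS implLS = (DOMImplementationLS) DOMImplementationRegistry
                .newInstance().getDOMImplementation("LS 3.0");
        LSSerializer dom3Writer = implLS.createLSSerializer();
        DOMConfiguration config = dom3Writer.getDomConfig();
        // Pretty-printing is optional in DOM Level 3, so check before setting it.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        LSOutput output = implLS.createLSOutput();
        output.setEncoding("UTF-8");
        // try-with-resources closes the stream even if write() throws.
        try (OutputStream outputStream = new FileOutputStream(file)) {
            output.setByteStream(outputStream);
            // write() returns false if the node could not be serialized.
            if (!dom3Writer.write(document, output)) {
                throw new IllegalStateException("Serialization failed");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("catalog", ".xml");
        save(buildCatalog(), file);
        System.out.println(new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
    }
}
```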

The DOM Level 3 API can also output a selected node of a DOM document instead of the complete document. For example, to save only the journal Element node:



outputStream =
        new FileOutputStream(new File("c:/output/node.xml"));
output.setByteStream(outputStream);
dom3Writer.write(journal, output);
outputStream.close();

Only the journal node and its descendants are output, as shown in this listing:



<?xml version="1.0" encoding="UTF-8"?>
<journal edition="January-February 2005" section="XML">
 <article>
  <title></title>
 </article>
</journal>
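The node-saving step can also be tried without a C:/output directory. In the minimal sketch below, which is an assumption rather than part of the article's code, the LSOutput is given a character stream (a StringWriter) instead of a byte stream, and only the journal element is passed to write(). The DOMImplementationLS is obtained by casting the owner document's implementation, which works with the JDK's built-in DOM; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class SaveNode {

    // Serializes one element (and its descendants) rather than the whole document.
    static String serializeNode(Element node) {
        DOMImplementationLS implLS =
                (DOMImplementationLS) node.getOwnerDocument().getImplementation();
        LSSerializer dom3Writer = implLS.createLSSerializer();
        LSOutput output = implLS.createLSOutput();
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer); // characters, not bytes
        dom3Writer.write(node, output);
        return writer.toString();
    }

    // Builds catalog > journal > article and returns only the journal subtree.
    static String journalOnly() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element catalog = document.createElement("catalog");
        document.appendChild(catalog);
        Element journal = document.createElement("journal");
        journal.setAttribute("section", "XML");
        catalog.appendChild(journal);
        journal.appendChild(document.createElement("article"));
        return serializeNode(journal);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(journalOnly());
    }
}
```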

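A document or a single node may also be output to a String with the writeToString() method. Per the DOM Level 3 Load and Save specification, writeToString() serializes using UTF-16, the encoding of a DOMString (a Java String):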


String nodeString = dom3Writer.writeToString(journal);
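Because writeToString() takes no LSOutput, output options are set through the serializer's DOMConfiguration. A small sketch, assuming only standard DOM Level 3 calls, toggles the xml-declaration parameter (default true), which controls whether the <?xml ...?> declaration is emitted; omitting it can be handy when the string will be embedded in a larger document. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class NodeToString {

    // Serializes a node to a String, with or without the XML declaration.
    static String toXmlString(Node node, boolean withDeclaration) {
        Document owner = node instanceof Document ? (Document) node : node.getOwnerDocument();
        DOMImplementationLS implLS = (DOMImplementationLS) owner.getImplementation();
        LSSerializer dom3Writer = implLS.createLSSerializer();
        // "xml-declaration" is a standard LSSerializer parameter; true is the default.
        dom3Writer.getDomConfig().setParameter("xml-declaration", Boolean.valueOf(withDeclaration));
        return dom3Writer.writeToString(node);
    }

    // Hypothetical sample: a journal element inside a catalog document.
    static Element sampleJournal() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element catalog = document.createElement("catalog");
        document.appendChild(catalog);
        Element journal = document.createElement("journal");
        journal.setAttribute("section", "XML");
        catalog.appendChild(journal);
        return journal;
    }

    public static void main(String[] args) throws Exception {
        Element journal = sampleJournal();
        System.out.println(toXmlString(journal, true));
        System.out.println(toXmlString(journal, false));
    }
}
```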

You can find DOM3Writer.java, the Java class used to output an XML document, in the download section.

Filtering an XML Document

An XML developer may want to filter an XML document as it is parsed or as it is saved; filtering can remove some of the nodes from the document. In this section, I filter an input document with an input filter and then save the parsed document with an output filter. The LSParserFilter interface filters the input, while the LSSerializerFilter interface filters the output.

As in the loading and saving sections, import the DOM 3 org.w3c.dom.ls package:


import org.w3c.dom.ls.*;
import org.w3c.dom.traversal.NodeFilter;

Create a DOMImplementationLS object, impl, and an LSParser as outlined in the loading section. Then create an LSInput object and set an InputStream as its byte stream:



LSInput input = impl.createLSInput();
InputStream inputStream =
        new FileInputStream(new File("C:/input/catalog.xml"));
input.setByteStream(inputStream);
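If C:/input/catalog.xml is not available, the same LSInput setup can be tried on an in-memory document. The sketch below is an assumption for illustration: it wraps a hypothetical catalog string in a ByteArrayInputStream, sets it as the LSInput byte stream, and parses it with a synchronous LSParser created by createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null), where null requests no particular schema language.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class ParseFromBytes {

    // Hypothetical sample standing in for C:/input/catalog.xml.
    static final String CATALOG =
            "<catalog publisher=\"dev2dev\"><journal section=\"XML\"/></catalog>";

    static Document parse(byte[] xmlBytes) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS) DOMImplementationRegistry
                .newInstance().getDOMImplementation("LS 3.0");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        try (InputStream inputStream = new ByteArrayInputStream(xmlBytes)) {
            input.setByteStream(inputStream);
            // In synchronous mode parse() returns only after the whole document is read.
            return parser.parse(input);
        }
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(CATALOG.getBytes(StandardCharsets.UTF_8));
        System.out.println(document.getDocumentElement().getAttribute("publisher"));
    }
}
```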

Now I'm ready to create an input filter. This input filter prints the Element nodes as they are parsed but does not remove any nodes. To do this, I define a filter class that implements the LSParserFilter interface and its three methods: acceptNode(), getWhatToShow(), and startElement(). I'll describe what these methods do.

The acceptNode() method returns a short that indicates whether a node is to be accepted, rejected, or skipped, or whether parsing is to be interrupted. The values that acceptNode() may return are listed in Table 1:

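Return value	Meaning
LSParserFilter.FILTER_ACCEPT	Accept the node
LSParserFilter.FILTER_INTERRUPT	Interrupt the normal processing of the document
LSParserFilter.FILTER_REJECT	Reject the node and its children
LSParserFilter.FILTER_SKIP	Skip the node but keep its children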

Table 1. Return values for the acceptNode() method

If a node is accepted with FILTER_ACCEPT, the node is included in the Document object returned by the parser. If a node is skipped with FILTER_SKIP, only that node is dropped; its children are still parsed and included in the DOM document in its place. If a node is rejected with FILTER_REJECT, the node and all of its children are dropped. Note that the acceptNode() method receives a fully parsed node (including its descendants), which you can then accept, skip, or reject as just described. If you like, you can also modify this node, for example by adding children.
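To make these verdicts concrete, here is a minimal sketch of an LSParserFilter. It is an assumption, not the article's own filter class: it shows only Element nodes to acceptNode(), records each completed element name in a list (instead of printing directly), and returns a configurable verdict for one hypothetical target element name. With FILTER_SKIP on article, the title child survives and moves up into journal; with FILTER_REJECT, title is dropped along with article. startElement() simply accepts, so every decision is made once the element is complete.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class InputFilterDemo {

    static final String XML =
            "<catalog><journal><article><title>DOM 3</title></article></journal></catalog>";

    // Sees Element nodes only; applies `verdict` to elements named `target`.
    static class ElementFilter implements LSParserFilter {
        final List<String> seen = new ArrayList<>();
        private final String target;
        private final short verdict;

        ElementFilter(String target, short verdict) {
            this.target = target;
            this.verdict = verdict;
        }

        @Override
        public short startElement(Element element) {
            return FILTER_ACCEPT; // decide later, in acceptNode()
        }

        @Override
        public short acceptNode(Node node) {
            seen.add(node.getNodeName()); // node is complete, descendants included
            return node.getNodeName().equals(target) ? verdict : FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    static Document parse(String xml, LSParserFilter filter) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS) DOMImplementationRegistry
                .newInstance().getDOMImplementation("LS 3.0");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(filter);
        LSInput input = impl.createLSInput();
        input.setByteStream(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        ElementFilter printer = new ElementFilter("none", LSParserFilter.FILTER_ACCEPT);
        parse(XML, printer);
        System.out.println("Parsed elements: " + printer.seen);
        Document skipped = parse(XML, new ElementFilter("article", LSParserFilter.FILTER_SKIP));
        Document rejected = parse(XML, new ElementFilter("article", LSParserFilter.FILTER_REJECT));
        System.out.println("SKIP titles: " + skipped.getElementsByTagName("title").getLength());
        System.out.println("REJECT titles: " + rejected.getElementsByTagName("title").getLength());
    }
}
```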

The getWhatToShow() method specifies which types of node are passed ("shown") to the acceptNode() method; in other words, the calls to acceptNode() are themselves filtered. Nodes whose type is not included by getWhatToShow() are automatically included in the DOM document being built, without filtering. Nodes whose type is included are passed to acceptNode() to be accepted, rejected, or skipped. getWhatToShow() returns an int bit mask, so the constants can be combined with a bitwise OR, for example NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT. Table 2 lists the values that getWhatToShow() can return.

Return value	Meaning
NodeFilter.SHOW_ALL	Show all nodes
NodeFilter.SHOW_ELEMENT	Show Element nodes
NodeFilter.SHOW_TEXT	Show Text nodes
NodeFilter.SHOW_COMMENT	Show Comment nodes
NodeFilter.SHOW_PROCESSING_INSTRUCTION	Show ProcessingInstruction nodes
NodeFilter.SHOW_CDATA_SECTION	Show CDATASection nodes
NodeFilter.SHOW_ENTITY_REFERENCE	Show EntityReference nodes

Table 2. The return values for the getWhatToShow() method
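The sketch below, an assumption for illustration, shows the bit-mask behavior of getWhatToShow(). The same acceptNode() rejects every Comment node, but it only sees comments when the mask includes NodeFilter.SHOW_COMMENT; with NodeFilter.SHOW_ELEMENT alone, comments are never shown to the filter and stay in the document. The class name, the mask parameter, and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class WhatToShowDemo {

    static final String XML =
            "<catalog><!-- draft --><journal><!-- todo --><article/></journal></catalog>";

    // Rejects comments, but only receives the node types named in the mask.
    static class CommentRejecter implements LSParserFilter {
        private final int mask;

        CommentRejecter(int mask) {
            this.mask = mask;
        }

        @Override
        public short startElement(Element element) {
            return FILTER_ACCEPT;
        }

        @Override
        public short acceptNode(Node node) {
            return node.getNodeType() == Node.COMMENT_NODE ? FILTER_REJECT : FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            return mask;
        }
    }

    static Document parse(int mask) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS) DOMImplementationRegistry
                .newInstance().getDOMImplementation("LS 3.0");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new CommentRejecter(mask));
        LSInput input = impl.createLSInput();
        input.setByteStream(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        return parser.parse(input);
    }

    // Counts Comment nodes anywhere below (and including) the given node.
    static int countComments(Node node) {
        int count = node.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countComments(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        int elementsOnly = NodeFilter.SHOW_ELEMENT;
        int elementsAndComments = NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT;
        System.out.println("SHOW_ELEMENT: " + countComments(parse(elementsOnly)));
        System.out.println("SHOW_ELEMENT | SHOW_COMMENT: " + countComments(parse(elementsAndComments)));
    }
}
```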