Load, Save and Filter XML Documents Using the DOM Level 3 API
by Deepak Vohra
07/19/2006
Saving an XML Document
I'll now turn to saving an XML document. With the DOM Level 3 API, an XML document may be saved to an XML file or to a String. The API can also serialize just a selected node of the document. The LSSerializer interface is used to save an XML document. The steps below follow the standard way of retrieving a DOM implementation, which is then used to create the serializer. As in the previous section, import the org.w3c.dom.ls package:
import org.w3c.dom.ls.*;
Create an XML document to output with an LSSerializer, and add elements and attributes to it. As in the previous section, set the system property DOMImplementationRegistry.PROPERTY:
System.setProperty(DOMImplementationRegistry.PROPERTY,
"org.apache.xerces.dom.DOMImplementationSourceImpl");
Create a DOMImplementationRegistry object:
DOMImplementationRegistry registry =
DOMImplementationRegistry.newInstance();
Obtain a DOMImplementation object from the DOMImplementationRegistry:
DOMImplementation domImpl =
registry.getDOMImplementation("LS 3.0");
Cast the DOMImplementation instance to DOMImplementationLS:
DOMImplementationLS implLS = (DOMImplementationLS)domImpl;
Create an LSSerializer from the DOMImplementationLS:
LSSerializer dom3Writer = implLS.createLSSerializer();
Create an LSOutput object:
LSOutput output = implLS.createLSOutput();
Create the output directory C:/output. Specify the OutputStream and encoding for the LSOutput object:
OutputStream outputStream =
new FileOutputStream(new File("c:/output/output.xml"));
output.setByteStream(outputStream);
output.setEncoding("UTF-8");
Output the XML document:
dom3Writer.write(document,output);
The XML document is output as shown in this code listing:
<?xml version="1.0" encoding="UTF-8"?>
<catalog publisher="dev2dev">
<journal edition="January-February 2005" section="XML">
<article>
<title></title>
</article>
</journal>
</catalog>
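The steps above can be pulled together into one small program. The following is a minimal sketch, not the article's DOM3Writer.java: the class name SaveDocumentSketch is hypothetical, the document is written to an in-memory stream instead of c:/output/output.xml so that it runs anywhere, and the system property is left unset on the assumption that the Java runtime's default DOM implementation supports Load and Save.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical class name; a sketch of the save steps, not the article's DOM3Writer.java.
class SaveDocumentSketch {

    /** Builds the sample catalog document and returns it serialized as UTF-8 XML. */
    static String catalogAsUtf8Xml() {
        DOMImplementation domImpl;
        try {
            // Assumption: no system property is set, so the registry's default lookup is used.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            domImpl = registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation found", e);
        }
        DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

        // Namespace-aware factory methods (with a null namespace) are used because a
        // serializer with namespace processing enabled may complain about DOM Level 1 nodes.
        Document document = domImpl.createDocument(null, "catalog", null);
        Element catalog = document.getDocumentElement();
        catalog.setAttributeNS(null, "publisher", "dev2dev");
        Element journal = document.createElementNS(null, "journal");
        journal.setAttributeNS(null, "edition", "January-February 2005");
        journal.setAttributeNS(null, "section", "XML");
        Element article = document.createElementNS(null, "article");
        article.appendChild(document.createElementNS(null, "title"));
        journal.appendChild(article);
        catalog.appendChild(journal);

        // Serialize the whole document to a byte stream with an explicit encoding.
        LSSerializer serializer = implLS.createLSSerializer();
        LSOutput output = implLS.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        output.setEncoding("UTF-8");
        serializer.write(document, output);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(catalogAsUtf8Xml());
    }
}
```

To write to a file as in the article, replace the ByteArrayOutputStream with a FileOutputStream and close the stream once write() returns.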
The DOM Level 3 API can also output a selected node of a DOM document instead of the complete document. For example, suppose only the journal Element node needs to be saved:
outputStream =
new FileOutputStream(new File("c:/output/node.xml"));
output.setByteStream(outputStream);
dom3Writer.write(journal,output);
Only the journal node of the XML document is output, as shown in this listing:
<?xml version="1.0" encoding="UTF-8"?>
<journal edition="January-February 2005" section="XML">
<article>
<title></title>
</article>
</journal>
With the DOM Level 3 API, an XML document, or a single node of it, may also be output to a String. Simply use the writeToString() method:
String nodeString = dom3Writer.writeToString(journal);
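As a hedged illustration of serializing a single node to a String, the sketch below parses a small catalog from an in-memory string and serializes only its journal element. The class name NodeToStringSketch and the sample markup are assumptions made for this example. Because writeToString() produces a Java String, the Load and Save specification has it report UTF-16 as the encoding, so the sketch switches off the standard "xml-declaration" parameter to get just the element markup.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical class name; illustrates writeToString() on a single node.
class NodeToStringSketch {

    static DOMImplementationLS implementation() {
        try {
            // Assumption: the runtime's default DOM implementation supports Load and Save.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation found", e);
        }
    }

    /** Parses a sample catalog and returns only its journal element as a String. */
    static String journalAsString() {
        DOMImplementationLS implLS = implementation();

        // Sample input (an assumption for this sketch), supplied as string data.
        LSInput input = implLS.createLSInput();
        input.setStringData("<catalog publisher=\"dev2dev\"><journal section=\"XML\">"
                + "<article><title/></article></journal></catalog>");
        LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        Document document = parser.parse(input);

        Node journal = document.getElementsByTagName("journal").item(0);

        LSSerializer serializer = implLS.createLSSerializer();
        // Omit the XML declaration so the result is the element markup only.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(journal);
    }

    public static void main(String[] args) {
        System.out.println(journalAsString());
    }
}
```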
You can find DOM3Writer.java, the Java class used to output an XML document, in the download section.
Filtering an XML Document
An XML developer may be interested in filtering an XML document as it is parsed or as it is saved. In this section, I filter an input document with an input filter and save the parsed document with an output filter. Filtering may remove some of the nodes from the document. The LSParserFilter interface allows filtering of the input, while the LSSerializerFilter interface allows filtering of the output.
As in the loading and saving sections, import the DOM 3 org.w3c.dom.ls package:
import org.w3c.dom.ls.*;
Create a DOMImplementationLS object (impl) and an LSParser parser as outlined in the load section. Then create an LSInput object and set an InputStream on it:
LSInput input = impl.createLSInput();
InputStream inputStream =
new FileInputStream(new File("C:/input/catalog.xml"));
input.setByteStream(inputStream);
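Since the load section is not repeated here, the following sketch recaps how the parser and the LSInput fit together. It is an illustration under stated assumptions: the class name LoadFromStreamSketch is hypothetical, the bytes come from an in-memory array instead of C:/input/catalog.xml, and the runtime's default DOM implementation is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical class name; recaps creating an LSParser and feeding it an LSInput.
class LoadFromStreamSketch {

    /** Parses a small in-memory document and returns the name of its root element. */
    static String rootElementName() {
        DOMImplementationLS impl;
        try {
            // Assumption: the runtime's default DOM implementation supports Load and Save.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            impl = (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation found", e);
        }

        // A synchronous parser with no schema type.
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        // In-memory stand-in for the article's FileInputStream.
        byte[] xml = "<catalog publisher=\"dev2dev\"><journal/></catalog>"
                .getBytes(StandardCharsets.UTF_8);
        LSInput input = impl.createLSInput();
        input.setByteStream(new ByteArrayInputStream(xml));
        input.setEncoding("UTF-8");

        Document document = parser.parse(input);
        return document.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(rootElementName());
    }
}
```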
Now I'm ready to create an input filter. This input filter prints out the Element nodes as they are parsed, without filtering out any nodes. To do this, I define a filter class that implements the LSParserFilter interface and its acceptNode(), getWhatToShow(), and startElement() methods. I'll describe what these methods do.
The acceptNode() method returns a short that indicates whether a node is to be accepted, rejected, or skipped. The values that acceptNode() may return are listed in Table 1:
| FILTER_ACCEPT | Accept the node |
| FILTER_INTERRUPT | Interrupt document filtering |
| FILTER_REJECT | Reject the node |
| FILTER_SKIP | Skip the node |
Table 1. Return values for the acceptNode() method
If a node is accepted with FILTER_ACCEPT, it is included in the Document object returned by the parser. If a node is skipped with FILTER_SKIP, only that node is dropped; its children are still parsed and included in the DOM document. If a node is rejected with FILTER_REJECT, the node and its children are rejected. Note that the acceptNode() method receives a fully parsed node (including its descendants), which you can then accept, reject, or skip as just described. If you like, you can also modify this node, for example by adding children.
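To make the difference between FILTER_SKIP and FILTER_REJECT concrete, here is a small sketch. The filter class, the element names it reacts to, and the sample input are all assumptions chosen for illustration; this is not the filter the article goes on to build. It skips journal elements, so their children move up to the parent, and rejects note elements together with their content.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

// Hypothetical class names; illustrates the acceptNode() return values.
class AcceptNodeSketch {

    /** Skips journal elements (keeping their children) and rejects note elements. */
    static final class UnwrapAndDropFilter implements LSParserFilter {
        @Override
        public short startElement(Element element) {
            return FILTER_ACCEPT; // defer the decision to acceptNode()
        }

        @Override
        public short acceptNode(Node node) {
            String name = node.getNodeName();
            if ("journal".equals(name)) {
                return FILTER_SKIP;   // drop the element, keep its children
            }
            if ("note".equals(name)) {
                return FILTER_REJECT; // drop the element and its children
            }
            return FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses the sample input with the filter; returns the root's child element names. */
    static String rootChildNames() {
        DOMImplementationLS impl;
        try {
            // Assumption: the runtime's default DOM implementation supports Load and Save.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            impl = (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation found", e);
        }
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new UnwrapAndDropFilter());

        LSInput input = impl.createLSInput();
        input.setStringData("<catalog><journal><article><title>DOM 3</title></article>"
                + "</journal><note>internal</note></catalog>");
        Document document = parser.parse(input);

        List<String> names = new ArrayList<>();
        for (Node child = document.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(rootChildNames());
    }
}
```

With this input, the article element ends up as a direct child of catalog, while the journal wrapper and the note element are absent from the resulting document.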
The getWhatToShow() method specifies which node types are passed ("shown") to the acceptNode() method. In other words, the input to acceptNode() is itself filtered! Node types not selected by getWhatToShow() are automatically included in the DOM document being built, without filtering. Node types selected by getWhatToShow() are passed to acceptNode() to be accepted, rejected, or skipped. Table 2 lists the return values for the getWhatToShow() method.
| NodeFilter.SHOW_ALL | Show all nodes |
| NodeFilter.SHOW_ELEMENT | Show Element nodes |
| NodeFilter.SHOW_TEXT | Show Text nodes |
| NodeFilter.SHOW_COMMENT | Show Comment nodes |
| NodeFilter.SHOW_PROCESSING_INSTRUCTION | Show ProcessingInstruction nodes |
| NodeFilter.SHOW_CDATA_SECTION | Show CDATASection nodes |
| NodeFilter.SHOW_ENTITY_REFERENCE | Show EntityReference nodes |
Table 2. Return values for the getWhatToShow() method
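The effect of getWhatToShow() can be seen in a filter that asks for Comment nodes only. In the sketch below (class names and sample input are assumptions for illustration), acceptNode() rejects everything it is given, yet the elements survive because they are never shown to it. The NodeFilter constants are bit masks, so several node types can be selected at once by combining them with the | operator.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

// Hypothetical class names; illustrates how getWhatToShow() limits acceptNode().
class WhatToShowSketch {

    /** Rejects every node it is shown, but asks to be shown Comment nodes only. */
    static final class CommentStripper implements LSParserFilter {
        @Override
        public short startElement(Element element) {
            return FILTER_ACCEPT; // elements are always allowed to start
        }

        @Override
        public short acceptNode(Node node) {
            return FILTER_REJECT; // only comments ever reach this method
        }

        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_COMMENT;
        }
    }

    /** Parses the sample input and counts the element and comment children of the root. */
    static String childCounts() {
        DOMImplementationLS impl;
        try {
            // Assumption: the runtime's default DOM implementation supports Load and Save.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            impl = (DOMImplementationLS) registry.getDOMImplementation("LS 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation found", e);
        }
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new CommentStripper());

        LSInput input = impl.createLSInput();
        input.setStringData(
                "<catalog><!-- draft --><journal/><!-- todo --><article/></catalog>");
        Document document = parser.parse(input);

        int elements = 0;
        int comments = 0;
        for (Node child = document.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            } else if (child.getNodeType() == Node.COMMENT_NODE) {
                comments++;
            }
        }
        return "elements=" + elements + ", comments=" + comments;
    }

    public static void main(String[] args) {
        System.out.println(childCounts());
    }
}
```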