Technical Note
Loading and Saving an XML Document with DOM 3.0
Author: Deepak Vohra (Web Developer and Sun Certified Java 1.4 programmer)
Publication Date: February 2005
The W3C Document Object Model (DOM) Level 1 and DOM Level 2 specifications (the org.w3c.dom API is a DOM Level 2 API) do not define a standard way to load and save an XML document. Instead, most developers use vendor-specific code for input to and output from DOM parsers, which is a disadvantage when portability is a requirement.
However, the Oracle 10g XML Developer's Kit (XDK) provides a DOM 3.0 Load and Save (LS) API that standardizes the loading and saving of XML documents, enabling developers to build portable DOM XML applications.
This API has several advantages over DOM 1.0 and DOM 2.0:
For loading:
- Asynchronous loading. Loading an XML document generates an LSLoadEvent, which indicates that the document has been parsed.
- DOM 3.0 LS can replace a node in a loaded XML document with a node from another XML document. An example is provided in the "Loading a Document" section.
- DOM 3.0 LS filter interfaces may be used to filter the parsed nodes.
For saving:
- A DOM document or a document node may be output to a java.lang.String.
- DOM 3.0 LS can save only a selected node of an XML document.
- A DOM document may be filtered to remove some of the nodes from the document by setting a filter on the output.
This Technical Note explains how to load and save an XML document with the DOM 3.0 LS API. We will load an example XML document, save an XML document, and filter a document with the DOM 3.0 LS interfaces.
Overview
The DOM 3.0 LS API is based on the DOM Level 3 Load and Save specification, which is implemented in the org.w3c.dom.ls package. In this package, the LSParser interface is used to load an XML document and the LSSerializer interface is used to save one; XML document filtering is supported by the LSParserFilter and LSSerializerFilter interfaces.
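The same org.w3c.dom.ls interfaces are also bundled with the standard JDK (Java 5 and later), so the basic load/save cycle can be tried without the XDK. The following is a minimal sketch under that assumption: it obtains a DOMImplementationLS through org.w3c.dom.bootstrap.DOMImplementationRegistry instead of Oracle's XMLDOMImplementation, and reads XML from a String with LSInput.setStringData rather than from a file. The class and method names (LsRoundTrip, lsImplementation, roundTrip) are hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper; uses only the standard org.w3c.dom.ls API bundled with the JDK.
final class LsRoundTrip {

    /** Looks up a DOM implementation that supports the "LS" (Load and Save) feature. */
    static DOMImplementationLS lsImplementation() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return (DOMImplementationLS) registry.getDOMImplementation("LS");
    }

    /** Loads XML text with an LSParser, then saves it again with an LSSerializer. */
    static String roundTrip(String xml) throws Exception {
        DOMImplementationLS impl = lsImplementation();
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);                 // read from a String instead of a URI
        Document document = parser.parse(input);  // synchronous mode returns the Document
        LSSerializer serializer = impl.createLSSerializer();
        return serializer.writeToString(document);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<catalog title=\"Oracle Magazine\"><journal/></catalog>"));
    }
}
```

The returned string starts with an XML declaration, because writeToString includes one by default.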
In the following example, we load an XML document, OracleCatalog.xml (shown below), with an LSParser object.
<?xml version="1.0" encoding="UTF-8"?>
<!--A Oracle Magazine Catalog-->
<catalog publisher="Oracle Publishing" title="Oracle Magazine">
<journal date="November-December 2003">
<article section="XML">
<title>Updating XQuery</title>
<author>Jason Hunter</author>
</article>
</journal>
<journal date="September-October 2003">
<article section="SQL">
<title>The Active Database</title>
<author> Cameron ORourke</author>
</article>
</journal>
</catalog>
Preliminary Setup
To load and save an XML document with DOM 3.0 LS, the org.w3c.dom.ls package classes must be in the Classpath. Extract the Oracle XDK file xdk_nt_10_1_0_2_0_production.zip to a directory, and add <XDK>/lib/xmlparserv2.jar to the Classpath, where <XDK> is the directory in which the XDK is installed.
Loading a Document
The LSParser interface has methods to parse an XML document and build a DOM document structure. Import the DOM 3.0 LS package:
import org.w3c.dom.ls.*;
The DOMImplementationLS interface is used to create an LSParser object. Create a DOMImplementationLS reference variable that refers to an XMLDOMImplementation object:
DOMImplementationLS impl = new XMLDOMImplementation();
Create an LSParser object:
LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
The LSParser may be created in MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS mode. In synchronous mode, the parse and parseURI methods of the LSParser object return an org.w3c.dom.Document. In asynchronous mode, these methods return null, because the document may not yet be built when parse or parseURI returns.
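Support for MODE_ASYNCHRONOUS is optional: the DOM Level 3 LS specification allows createLSParser to raise a DOMException with the code NOT_SUPPORTED_ERR for a mode the implementation does not support. The hedged sketch below, which assumes the JDK's bundled LS implementation rather than the XDK (ParserModes and createParser are hypothetical names), requests an asynchronous parser and falls back to synchronous mode when that request is refused.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class ParserModes {

    /** Requests an asynchronous LSParser; falls back to synchronous mode if it is not supported. */
    static LSParser createParser(DOMImplementationLS impl) {
        try {
            return impl.createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
        } catch (DOMException e) {
            if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                throw e; // some other failure: do not hide it
            }
            return impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        }
    }

    public static void main(String[] args) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = createParser(impl);
        LSInput input = impl.createLSInput();
        input.setStringData("<catalog/>");
        Document document = parser.parse(input);
        if (parser.getAsync()) {
            // parse may return null here; the Document is delivered with an LSLoadEvent
            System.out.println("asynchronous parser, parse returned " + document);
        } else {
            System.out.println("synchronous parser, root element "
                    + document.getDocumentElement().getTagName());
        }
    }
}
```

LSParser.getAsync reports which mode the parser actually ended up in, so the caller knows whether to use the return value of parse or wait for the load event.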
Now cast the LSParser object to XMLLSParser and register an EventListener with it. The XMLLSParser class implements the LSParser interface, and an XMLLSParser object supports the LSLoadEvent that is generated after it has parsed an XML document:
XMLLSParser lsParser=(XMLLSParser)parser;
lsParser.addEventListener(
"ls-load", (EventListener)(new DOM3Builder()), true);
Parse the XML document with the parse or parseURI method:
Document document=lsParser.parseURI("file:///c:/DOM3.0/OracleCatalog.xml");
An LSLoadEvent is generated and delivered to the handleEvent method of the registered EventListener. In the handleEvent method, the loaded XML document can be retrieved from the event:
public void handleEvent(Event event)
{
// Only an LSLoadEvent carries the newly loaded document
if(event instanceof LSLoadEvent){
LSLoadEvent loadEvent=(LSLoadEvent)event;
Document document=loadEvent.getNewDocument();
System.out.println("Document with root element "
+document.getDocumentElement().getTagName()+" has been loaded.");
}
}
The handleEvent method prints a message to notify you that the XML document has been loaded:
Document with root element catalog has been loaded.
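When only synchronous parsing is available, the return of parse is itself the "document loaded" signal. The sketch below mimics the handleEvent notification with a plain java.util.function.Consumer callback that runs once parse has returned. This is a hypothetical stand-in, not the XDK's event mechanism; it assumes the JDK's bundled LS implementation, and LoadNotifier, loadedMessage, and parseAndNotify are illustrative names.

```java
import java.util.function.Consumer;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LoadNotifier {

    /** Builds the same message that the handleEvent method prints. */
    static String loadedMessage(Document document) {
        return "Document with root element "
                + document.getDocumentElement().getTagName() + " has been loaded.";
    }

    /** Parses synchronously, then hands the finished Document to a callback (hypothetical stand-in for an "ls-load" listener). */
    static void parseAndNotify(String xml, Consumer<Document> onLoad) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        Document document = parser.parse(input);
        onLoad.accept(document); // parse has returned, so the document is fully built
    }

    public static void main(String[] args) throws Exception {
        parseAndNotify("<catalog><journal/></catalog>",
                doc -> System.out.println(loadedMessage(doc)));
    }
}
```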
Use the parseWithContext method to replace a node in a loaded XML document with a node from another XML document. As an example, let's replace the 'journal' node for 'November-December 2003' in OracleCatalog.xml with the 'journal' node in the catalog.xml document below:
<?xml version="1.0" encoding="UTF-8"?>
<!--A Oracle Magazine Catalog-->
<journal date="September-October 2003">
<article section="XML">
<title>Parsing XML Efficiently</title>
<author>Julie Basu</author>
</article>
</journal>
First, use an XPath expression to select the node in OracleCatalog.xml that is to be replaced:
Node node=((XMLDocument)(document)).selectSingleNode("/catalog/journal[@date='November-December 2003']");
Create an LSInput object for the XML document that supplies the replacement node:
LSInput lsInput=impl.createLSInput();
URL url=new URL("file:///c:/DOM3.0/catalog.xml");
lsInput.setSystemId(url.toString());
Now replace the selected journal node in OracleCatalog.xml with the journal node in catalog.xml.
lsParser.parseWithContext(lsInput, node, LSParser.ACTION_REPLACE);
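parseWithContext is an optional operation: the specification lets an LSParser raise NOT_SUPPORTED_ERR instead of performing it. A portable way to get the same result with core DOM calls is to parse both documents, select the target with javax.xml.xpath, import a copy of the replacement, and call replaceChild. This is a swapped-in technique, not parseWithContext itself; the sketch assumes the JDK's bundled LS and XPath implementations, and ReplaceJournal, parse, and replaceWith are hypothetical names.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class ReplaceJournal {

    /** Parses XML text synchronously with an LSParser. */
    static Document parse(DOMImplementationLS impl, String xml) {
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    /** Replaces the node selected by xpathExpr in target with a copy of source's document element. */
    static void replaceWith(Document target, String xpathExpr, Document source) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node old = (Node) xpath.evaluate(xpathExpr, target, XPathConstants.NODE);
        if (old == null) {
            throw new IllegalArgumentException("No node matches " + xpathExpr);
        }
        // A node cannot be inserted into another document directly, so import (deep-copy) it first.
        Node replacement = target.importNode(source.getDocumentElement(), true);
        old.getParentNode().replaceChild(replacement, old);
    }

    public static void main(String[] args) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        Document catalog = parse(impl, "<catalog>"
                + "<journal date=\"November-December 2003\"><article section=\"XML\">"
                + "<title>Updating XQuery</title></article></journal>"
                + "<journal date=\"September-October 2003\"><article section=\"SQL\">"
                + "<title>The Active Database</title></article></journal></catalog>");
        Document other = parse(impl, "<journal date=\"September-October 2003\">"
                + "<article section=\"XML\"><title>Parsing XML Efficiently</title></article></journal>");
        replaceWith(catalog, "/catalog/journal[@date='November-December 2003']", other);
        System.out.println(impl.createLSSerializer().writeToString(catalog));
    }
}
```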
DOM3Builder.java, the program we used to load and replace the XML document, is available in the support files.
Saving a Document
The LSSerializer interface is used to save an XML document. Import the org.w3c.dom.ls package:
import org.w3c.dom.ls.*;
The XMLDOMImplementation class implements the DOMImplementationLS interface and has methods to create an XML document and a document type. Create an XMLDOMImplementation object:
XMLDOMImplementation impl = new XMLDOMImplementation();
Create a document type (doctype) for the document:
DocumentType doctype = impl.createDocumentType(
"catalog", null,
"file://c:/Oracle Magazine/catalog.dtd");
Create an org.w3c.dom.Document object:
Document document = impl.createDocument( null, null, doctype);
Create elements in the XML document:
Element catalog = document.createElement("catalog");
catalog.setAttribute("title", "Oracle Magazine");
document.appendChild(catalog);
Element journal = document.createElement("journal");
journal.setAttribute("date", "January-February 2004");
journal.setAttribute("section", "SQL");
catalog.appendChild(journal);
Create an LSSerializer object:
DOMImplementationLS implls=(DOMImplementationLS)impl;
LSSerializer domWriter = implls.createLSSerializer();
The LSSerializer interface has a write method, which outputs a document to an LSOutput destination such as an OutputStream, and a writeToString method, which returns the document as a String.
Create a directory, C:/output, for the output XML document. Now save the XML document you built to an output file:
LSOutput output=implls.createLSOutput();
OutputStream outputStream=new FileOutputStream(new File("c:/output/output.xml"));
output.setByteStream(outputStream);
output.setEncoding("UTF-8");
domWriter.write(document, output);
outputStream.close(); // flush and release output.xml
The document is written to the output file, output.xml.
<!DOCTYPE catalog SYSTEM "file://c:/Oracle Magazine/catalog.dtd">
<catalog title="Oracle Magazine">
<journal date="January-February 2004" section="SQL"/>
</catalog>
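To try the LSOutput steps without writing to C:/output, the sketch below sends the same catalog document to an in-memory ByteArrayOutputStream. It assumes the JDK's javax.xml.parsers.DocumentBuilder and that the document's DOMImplementation (from Document.getImplementation()) also implements DOMImplementationLS, which holds for the JDK's built-in DOM; it uses neither the XDK's XMLDOMImplementation nor a doctype. SaveToStream, buildCatalog, and save are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

final class SaveToStream {

    /** Builds the catalog/journal document used in the steps above (without the doctype). */
    static Document buildCatalog() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element catalog = document.createElement("catalog");
        catalog.setAttribute("title", "Oracle Magazine");
        document.appendChild(catalog);
        Element journal = document.createElement("journal");
        journal.setAttribute("date", "January-February 2004");
        journal.setAttribute("section", "SQL");
        catalog.appendChild(journal);
        return document;
    }

    /** Saves a document to a byte stream as UTF-8 through an LSOutput. */
    static void save(Document document, OutputStream out) {
        // Assumption: this DOMImplementation also implements DOMImplementationLS.
        DOMImplementationLS impl = (DOMImplementationLS) document.getImplementation();
        LSSerializer writer = impl.createLSSerializer();
        LSOutput output = impl.createLSOutput();
        output.setByteStream(out);
        output.setEncoding("UTF-8");
        writer.write(document, output);
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        save(buildCatalog(), bytes);
        System.out.println(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Swapping the ByteArrayOutputStream for a FileOutputStream (and closing it afterwards) gives the file-based behavior described above.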
To save only a node rather than the complete document, pass that node to the LSSerializer write method. As an example, let's save only the journal node in the created document:
Node journalNode=journal; // the journal element, not its catalog parent
outputStream=new FileOutputStream(new File("c:/output/nodeOutput.xml"));
output.setByteStream(outputStream);
domWriter.write(journalNode,output);
outputStream.close();
The journal node is written to the output file nodeOutput.xml:
<journal date="January-February 2004" section="SQL"/>
The document or a node in the document may also be written to a string with the writeToString method:
String nodeString=domWriter.writeToString(journalNode);
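By default, writeToString begins its result with an XML declaration, even for a single element. For a node fragment you may prefer to turn that off with the standard "xml-declaration" parameter of the serializer's DOMConfiguration. A minimal sketch, assuming the JDK's bundled DOM and LS implementations; NodeToString, sampleJournal, and serialize are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class NodeToString {

    /** Builds catalog/journal and returns the journal element. */
    static Element sampleJournal() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element catalog = document.createElement("catalog");
        catalog.setAttribute("title", "Oracle Magazine");
        document.appendChild(catalog);
        Element journal = document.createElement("journal");
        journal.setAttribute("date", "January-February 2004");
        journal.setAttribute("section", "SQL");
        catalog.appendChild(journal);
        return journal;
    }

    /** Serializes a single node (not its whole document) to a String without an XML declaration. */
    static String serialize(Node node) {
        Document owner = (node instanceof Document) ? (Document) node : node.getOwnerDocument();
        DOMImplementationLS impl = (DOMImplementationLS) owner.getImplementation();
        LSSerializer writer = impl.createLSSerializer();
        // "xml-declaration" is a standard LSSerializer DOMConfiguration parameter (default true).
        writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return writer.writeToString(node);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleJournal()));
    }
}
```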
DOM3Writer.java, the program we used to save a document to an OutputStream, is available in the support files.
Filtering a Document
Now let's assume you want to load an XML document by selecting nodes from the input, and save an XML document by selecting nodes from the loaded document. DOM 3.0 LS provides filter interfaces for both cases: the LSParserFilter interface filters the input to an LSParser object, and the LSSerializerFilter interface filters the output from an LSSerializer object.
In this section, we'll set filters for the input as well as the output. First, import the DOM 3.0 LS package:
import org.w3c.dom.ls.*;
Create an LSParser object and set an input filter on it with the setFilter method.
DOMImplementationLS impl = new XMLDOMImplementation();
LSParser parser = impl.createLSParser(
DOMImplementationLS.MODE_SYNCHRONOUS,null);
LoadFilter loadFilter=new LoadFilter();
parser.setFilter(loadFilter);
LoadFilter is a class that implements the LSParserFilter interface. In the LoadFilter class, implement the LSParserFilter methods acceptNode, getWhatToShow, and startElement.
The getWhatToShow method returns a value indicating which types of nodes are shown to the filter. For example, if it returns NodeFilter.SHOW_ELEMENT, only element nodes are shown to the filter; under the DOM Level 3 LS specification, nodes that are not shown to the filter are included in the document automatically.
The acceptNode method is called after a node has been parsed. It returns a value indicating whether the node is accepted, rejected, or skipped, or whether parsing of the document is to be interrupted. For example, if acceptNode returns NodeFilter.FILTER_REJECT, the node and its children are filtered out. The startElement method is called at the start of each element.
As an example, show only the element nodes to the filter and accept all of the input nodes. The element nodes passed to the filter include their attribute nodes but not the text nodes they contain.
In the startElement method, print the name of the element being parsed:
private class LoadFilter implements LSParserFilter{
public short acceptNode(Node node){
return NodeFilter.FILTER_ACCEPT;
}
public int getWhatToShow(){
return NodeFilter.SHOW_ELEMENT;
}
public short startElement(Element element){
System.out.println("Element Parsed "+ element.getTagName());
return NodeFilter.FILTER_ACCEPT;
}
}
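The LoadFilter idea can be reproduced with the JDK's bundled LSParser, assuming it honors setFilter as the specification describes. In this sketch (RecordingLoadFilter and parseWithFilter are hypothetical names), startElement records each element name in a list instead of printing it, so the result can be inspected. Exactly which elements are reported (for example, whether the document element is passed to startElement) may differ between implementations.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

final class RecordingLoadFilter implements LSParserFilter {
    private final List<String> parsed = new ArrayList<>();

    @Override
    public short startElement(Element element) {
        parsed.add(element.getTagName()); // record each element as it starts
        return FILTER_ACCEPT;
    }

    @Override
    public short acceptNode(Node node) {
        return FILTER_ACCEPT; // keep every node shown to the filter
    }

    @Override
    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT; // show only element nodes to the filter
    }

    /** Parses XML text with this filter installed and returns the recorded element names. */
    static List<String> parseWithFilter(String xml) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        RecordingLoadFilter filter = new RecordingLoadFilter();
        parser.setFilter(filter);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        parser.parse(input);
        return filter.parsed;
    }

    public static void main(String[] args) throws Exception {
        for (String name : parseWithFilter("<catalog><journal><article><title>T</title>"
                + "<author>A</author></article></journal></catalog>")) {
            System.out.println("Element Parsed " + name);
        }
    }
}
```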
Load the XML document in OracleCatalog.xml with the LSParser object.
Document document=parser.parseURI(
"file:///c:/DOM3.0/OracleCatalog.xml");
Because the filter accepts every node it is shown, all the element nodes and their attributes are loaded. The startElement method displays the elements as they are parsed:
Element Parsed catalog
Element Parsed journal
Element Parsed article
Element Parsed title
Element Parsed author
Element Parsed journal
Element Parsed article
Element Parsed title
Element Parsed author
To filter the output to an XML file, create an LSSerializer object and set an output filter on it with the setFilter method:
LSSerializer domWriter = impl.createLSSerializer();
SaveFilter saveFilter = new SaveFilter();
domWriter.setFilter(saveFilter);
SaveFilter is a class that implements the LSSerializerFilter interface. In the SaveFilter class, implement the LSSerializerFilter methods getWhatToShow and acceptNode.
Show all the element nodes of the loaded document to the filter, and reject the journal node whose date attribute is 'November-December 2003'. In the getWhatToShow method of the SaveFilter class, the NodeFilter.SHOW_ELEMENT value shows only the element nodes (with their attributes) to the filter, not the text nodes in the elements:
private class SaveFilter implements LSSerializerFilter{
// Reject the journal element dated November-December 2003; accept every other element.
public short acceptNode(Node node){
Element element=(Element)node; // safe: getWhatToShow shows only elements
if(element.getTagName().equals("journal")
&& element.getAttribute("date").equals("November-December 2003")){
return NodeFilter.FILTER_REJECT;
}
return NodeFilter.FILTER_ACCEPT;
}
public int getWhatToShow(){
return NodeFilter.SHOW_ELEMENT;
}
}
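If you would rather not depend on how a particular LSSerializer applies FILTER_REJECT, a portable alternative is to remove the unwanted nodes with core DOM calls before serializing. This is a swapped-in technique, not the SaveFilter approach: the sketch below modifies the loaded document in place, assumes the JDK's bundled LS and XPath implementations, and uses hypothetical names (PruneBeforeSave, removeJournals, saveWithout).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class PruneBeforeSave {

    /** Removes every journal element with the given date (and its subtree); assumes date has no apostrophe. */
    static void removeJournals(Document document, String date) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xpath.evaluate(
                "//journal[@date='" + date + "']", document, XPathConstants.NODESET);
        // Copy the matches first so the DOM is not modified while the XPath result is in use.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            toRemove.add(matches.item(i));
        }
        for (Node journal : toRemove) {
            journal.getParentNode().removeChild(journal);
        }
    }

    /** Loads XML text, prunes the matching journals, and saves the result to a String. */
    static String saveWithout(String xml, String date) throws Exception {
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        Document document = parser.parse(input);
        removeJournals(document, date);
        return impl.createLSSerializer().writeToString(document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog title=\"Oracle Magazine\">"
                + "<journal date=\"November-December 2003\"><article section=\"XML\">"
                + "<title>Updating XQuery</title></article></journal>"
                + "<journal date=\"September-October 2003\"><article section=\"SQL\">"
                + "<title>The Active Database</title></article></journal></catalog>";
        System.out.println(saveWithout(xml, "November-December 2003"));
    }
}
```

Unlike the SaveFilter output above, this approach keeps the text of the remaining elements, because nothing but the rejected journal subtree is removed.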
Save the loaded XML document with the LSSerializer object:
LSOutput lsOutput=impl.createLSOutput();
OutputStream outputStream=new FileOutputStream(new File("c:/output/output-filter.xml"));
lsOutput.setByteStream(outputStream);
domWriter.write(document, lsOutput);
outputStream.close(); // flush and release output-filter.xml
The filtered XML document is saved in output-filter.xml:
<?xml version = '1.0' encoding = 'UTF-8'?>
<catalog publisher="Oracle Publishing" title="Oracle Magazine">
<journal date="September-October 2003">
<article section="SQL">
<title></title><author></author>
</article>
</journal>
</catalog>
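As a result, the selected journal node is filtered out of the document. The text in the element nodes is not output either, because NodeFilter.SHOW_ELEMENT does not include text nodes. Note that the DOM Level 3 LS specification states that nodes not shown to an LSSerializerFilter are serialized automatically, so other LS implementations may keep this text.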
DOM3Filter.java, the program used to filter input to an LSParser and filter output from an LSSerializer, is available in the support files.
Congratulations! You have learned how to load and save XML documents using the Oracle 10g XDK.
Next Steps:
Visit the XML Technology Center