The Java API for XML Processing (JAXP) lets applications written in the Java programming language process XML data. JAXP leverages the parser standards Simple API for XML (SAX) and Document Object Model (DOM) so that you can choose to parse your data as a stream of events or to build an object representation of it. JAXP also supports the Extensible Stylesheet Language Transformations (XSLT) standard, giving you control over the presentation of the data and enabling you to convert the data to other XML documents or to other formats, such as HTML. JAXP also provides namespace support, allowing you to work with DTDs that might otherwise have naming conflicts. Finally, as of version 1.4, JAXP implements the Streaming API for XML (StAX) standard.
Designed to be flexible, JAXP allows you to use any XML-compliant parser from within your application. It does this with what is called a pluggability layer, which lets you plug in an implementation of the SAX or DOM API. The pluggability layer also allows you to plug in an XSL processor, letting you control how your XML data is displayed.
The main JAXP APIs are defined in the javax.xml.parsers and javax.xml.transform packages. These packages contain the vendor-neutral factory classes SAXParserFactory, DocumentBuilderFactory, and TransformerFactory, which give you a SAXParser, a DocumentBuilder, and an XSLT transformer, respectively. DocumentBuilder, in turn, creates a DOM-compliant Document object.
The factory APIs let you plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory, javax.xml.parsers.DocumentBuilderFactory, and javax.xml.transform.TransformerFactory system properties, using System.setProperty() in the code, <sysproperty key="..." value="..."/> in an Ant build script, or -DpropertyName="..." on the command line. The default values (unless overridden at runtime on the command line or in the code) point to Sun's implementation.
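As a minimal sketch of the pluggability layer, the following program asks each factory which concrete implementation class was selected. The class name FactoryLookup and its helper methods are illustrative assumptions, not part of JAXP; the names printed depend on the runtime and on any system properties that have been set.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Illustrative helper (not part of JAXP): reports which factory
// implementations the pluggability layer selected at runtime.
class FactoryLookup {

    // Concrete class behind the vendor-neutral DOM factory.
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Concrete class behind the vendor-neutral SAX factory.
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Concrete class behind the vendor-neutral XSLT factory.
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // To select another vendor, set the matching system property before the
        // first newInstance() call, for example with
        // -Djavax.xml.parsers.DocumentBuilderFactory=<factory class name>
        System.out.println("DOM factory:  " + domFactoryName());
        System.out.println("SAX factory:  " + saxFactoryName());
        System.out.println("XSLT factory: " + transformerFactoryName());
    }
}
```

Because the application code refers only to the abstract factory classes, switching implementations is a deployment decision rather than a source change.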
The SAX and DOM APIs are defined by the XML-DEV group and by the W3C, respectively. The libraries that define those APIs are as follows:
javax.xml.parsers: The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
org.w3c.dom: Defines the Document interface (a DOM) as well as interfaces for all the components of a DOM.
org.xml.sax: Defines the basic SAX APIs.
javax.xml.transform: Defines the XSLT APIs that let you transform XML into other forms.
javax.xml.stream: Defines the StAX APIs for reading and writing XML as a stream.
The Simple API for XML (SAX) is the event-driven, serial-access mechanism that does element-by-element processing. The API for this level reads and writes XML to a data repository or the web. For server-side and high-performance applications, you will want to fully understand this level. But for many applications, a minimal understanding will suffice.
The DOM API is generally an easier API to use. It provides a familiar tree structure of objects. You can use the DOM API to manipulate the hierarchy of application objects it encapsulates. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user.
On the other hand, constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU- and memory-intensive. For that reason, the SAX API tends to be preferred for server-side applications and data filters that do not require an in-memory representation of the data.
The XSLT APIs defined in javax.xml.transform let you write XML data to a file or convert it into other forms. As shown in the XSLT section of this tutorial, you can even use them in conjunction with the SAX APIs to convert legacy data to XML.
Finally, the StAX APIs defined in javax.xml.stream provide a streaming Java technology-based, event-driven, pull-parsing API for reading and writing XML documents. StAX offers a simpler programming model than SAX and more efficient memory management than DOM.
The basic outline of the SAX parsing APIs is shown in Figure 1-1. To start the process, an instance of the SAXParserFactory class is used to generate an instance of the parser.
Figure 1-1 SAX APIs
The parser wraps an XMLReader object. When the parser's parse() method is invoked, the reader invokes one of several callback methods implemented in the application. Those methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.
Here is a summary of the key SAX APIs:
SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
SAXParser: The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
XMLReader: The SAXParser wraps an XMLReader. Typically, you do not care about that, but every once in a while you need to get hold of it using the parser's getXMLReader() method so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.
DefaultHandler: Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you are interested in.
ContentHandler: Methods such as startElement and endElement are invoked when an XML tag is recognized. This interface also defines the methods characters() and processingInstruction(), which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
ErrorHandler: The methods error(), fatalError(), and warning() are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). This is one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you will need to supply your own error handler to the parser.
DTDHandler: Defines methods you will generally never be called upon to use. They are used when processing a DTD to recognize and act on declarations for an unparsed entity.
EntityResolver: The resolveEntity method is invoked when the parser must identify data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN: a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example, to access a local copy of the document if one exists.
A typical application implements most of the ContentHandler methods, at a minimum. Because the default implementations of the interfaces ignore all inputs except for fatal errors, a robust implementation may also want to implement the ErrorHandler methods.
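The handler roles summarized above can be sketched as follows. This is a minimal example, assuming a small in-memory document; the class names SaxSketch and CollectingHandler are illustrative and not part of the SAX API. The handler extends DefaultHandler, overrides two ContentHandler callbacks, and overrides error() so that recoverable errors are no longer silently ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {

    // Illustrative handler: records element names and character data.
    static class CollectingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements.add(qName); // called once per start tag
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text run
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // the inherited implementation would ignore this error
        }
    }

    // Parses the given XML string and returns the populated handler.
    static CollectingHandler parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CollectingHandler h =
            parse("<palette><color>blue</color><color>red</color></palette>");
        System.out.println(h.elements);
        System.out.println(h.text);
    }
}
```

Note that the application never pulls data from the parser; the parser pushes events to the handler in document order.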
The SAX parser is defined in the packages listed in Table 1-1.
Table 1-1 SAX Packages
|org.xml.sax|Defines the SAX interfaces.|
|org.xml.sax.ext|Defines SAX extensions that are used for doing more sophisticated SAX processing, for example, to process a document type definition (DTD) or to see the detailed syntax for a file.|
|org.xml.sax.helpers|Contains helper classes that make it easier to use SAX, for example, by defining a default handler that has null methods for all the interfaces, so that you only need to override the ones you actually want to implement.|
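The following sketch shows one way the underlying reader and an EntityResolver might be combined, assuming a document that declares an external DTD. The class name ResolverSketch, the method name, and the example.com system identifier are hypothetical. The resolver records each requested system identifier and returns an empty substitute, so nothing is fetched over the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ResolverSketch {

    // Parses xml and returns the system identifiers the parser asked to resolve.
    static List<String> parseWithLocalDtd(String xml) throws Exception {
        // Get hold of the reader wrapped by the parser so it can be configured.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        List<String> requested = new ArrayList<>();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Assumption for this sketch: substitute an empty DTD. A real
            // resolver might return a local copy of the document instead.
            return new InputSource(new StringReader(""));
        });
        reader.setContentHandler(new DefaultHandler()); // ignore content events

        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note SYSTEM \"http://example.com/note.dtd\"><note/>";
        System.out.println(parseWithLocalDtd(xml));
    }
}
```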
Figure 1-2 shows the DOM APIs in action.
Figure 1-2 DOM APIs
You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance, and you use that instance to produce a Document object that conforms to the DOM specification. The builder you get, in fact, is determined by the system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder. (The platform's default value can be overridden from the command line.)
You can also use the builder's newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data. The result is a DOM tree like that shown in Figure 1-2.
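Both ways of obtaining a Document can be sketched as follows. This is a minimal example under the assumption of a small in-memory document; the class name DomSketch and the element names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomSketch {

    // Builds a DOM tree from existing XML data.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Starts from an empty Document and adds nodes programmatically.
    static Document build() throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("palette");
        doc.appendChild(root);
        Element color = doc.createElement("color");
        color.setTextContent("blue");
        root.appendChild(color);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document parsed = parse("<palette><color>blue</color></palette>");
        System.out.println(parsed.getDocumentElement().getTagName());
        Document built = build();
        System.out.println(built.getElementsByTagName("color").getLength());
    }
}
```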
Note - Although they are called objects, the entries in the DOM tree are actually fairly low-level data structures. For example, consider this structure: <color>blue</color>. There is an element node for the color tag, and under that there is a text node that contains the data, blue. This issue will be explored at length in the DOM chapter of this tutorial, but developers who are expecting objects are usually surprised to find that invoking getNodeValue() on the element node returns null. For a truly object-oriented tree, see the JDOM API at http://www.jdom.org.
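The behavior described in the note can be checked with a short sketch. The class name NodeValueSketch and its inspect() helper are illustrative. The element node itself carries no value; the data lives in its text node child, and getTextContent() is a convenience that concatenates the text of the descendants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueSketch {

    // Returns { element node value, text node value, element text content }.
    static String[] inspect() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader("<color>blue</color>")));

        Element color = doc.getDocumentElement();
        Node text = color.getFirstChild(); // the text node under the element

        return new String[] {
            color.getNodeValue(),   // null: element nodes have no value
            text.getNodeValue(),    // "blue": the data is in the text node
            color.getTextContent()  // "blue": convenience accessor
        };
    }

    public static void main(String[] args) throws Exception {
        for (String value : inspect()) {
            System.out.println(value);
        }
    }
}
```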
The Document Object Model implementation is defined in the packages listed in Table 1-2.
Table 1-2 DOM Packages
|org.w3c.dom|Defines the DOM programming interfaces for XML (and, optionally, HTML) documents, as specified by the W3C.|
Figure 1-3 XSLT APIs
A TransformerFactory object is instantiated and used to create a Transformer. The source object is the input to the transformation process. A source object can be created from a SAX reader, from a DOM, or from an input stream.
Similarly, the result object is the result of the transformation process. That object can be a SAX event handler, a DOM, or an output stream.
A transformer can be created from a set of transformation instructions, in which case the specified transformations are carried out. If it is created without any specific instructions, the transformer object simply copies the source to the result.
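Both kinds of transformer can be sketched as follows, assuming stream-based source and result objects. The class name TransformSketch, the method names, and the tiny stylesheet are illustrative assumptions. The copy() method creates a transformer without instructions, so it copies the source to the result; apply() creates one from a stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {

    // Transformer created without instructions: copies source to result.
    static String copy(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Transformer created from a stylesheet: carries out its instructions.
    static String apply(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Illustrative stylesheet that renders a color element as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/color\">Color: <xsl:value-of select=\".\"/></xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<color>blue</color>"));
        System.out.println(apply(STYLESHEET, "<color>blue</color>"));
    }
}
```

The same Transformer calls work with DOM or SAX source and result objects; only the Source and Result classes passed to transform() change.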
The XSLT APIs are defined in the packages shown in Table 1-3.
Table 1-3 XSLT Packages
|javax.xml.transform.dom|Classes to create input (source) and output (result) objects from a DOM.|
|javax.xml.transform.sax|Classes to create input (source) objects from a SAX parser and output (result) objects from a SAX event handler.|
|javax.xml.transform.stream|Classes to create input (source) objects and output (result) objects from an I/O stream.|
StAX is the latest API in the JAXP family, and provides an alternative to SAX, DOM, and TrAX for developers looking to do high-performance stream filtering, processing, and modification, particularly with low memory and limited extensibility requirements.
To summarize, StAX provides a standard, bidirectional pull parser interface for streaming XML processing, offering a simpler programming model than SAX and more efficient memory management than DOM. StAX enables developers to parse and modify XML streams as events, and to extend XML information models to allow application-specific additions. More detailed comparisons of StAX with several alternative APIs are provided in Chapter 5, Streaming API for XML, in Comparing StAX to Other JAXP APIs.
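The pull model can be sketched as follows. This is a minimal example using the cursor-style XMLStreamReader and XMLStreamWriter; the class name StaxSketch and its methods are illustrative. In contrast to SAX, the application drives the loop and asks the reader for the next event when it is ready for one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxSketch {

    // Reading: pull events one at a time and keep the element names.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    // Writing: emit a small document as a sequence of write calls.
    static String writeColor(String value) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("color");
        writer.writeCharacters(value);
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<palette><color>blue</color></palette>"));
        System.out.println(writeColor("blue"));
    }
}
```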
The StAX APIs are defined in the packages shown in Table 1-4.
Table 1-4 StAX Packages
|javax.xml.transform.stax|Provides StAX-specific transformation APIs.|
A set of JAXP sample programs is provided in the JAXP download bundle. After you install JAXP, the sample programs are found in the directory INSTALL_DIR
The sample programs are intended to be run on the Java Platform, Standard Edition (Java SE) version 6.
At this point, you have enough information to begin picking your own way through the JAXP libraries. Your next step depends on what you want to accomplish. You might want to go to any of these chapters: