Java API for XML Processing (JAXP) Tutorial

This chapter presents the Document Object Model (DOM). A DOM is a standard tree structure, where each node contains one of the components from an XML structure. The two most common types of nodes are element nodes and text nodes. Using DOM functions lets you create nodes, remove nodes, change their contents, and traverse the node hierarchy.

The examples in this chapter demonstrate how to parse an existing XML file to construct a DOM, display and inspect the DOM hierarchy, and explore the syntax of namespaces. They also show how to create a DOM from scratch and how to use some of the implementation-specific features in Sun's JAXP implementation to convert an existing data set to XML.

When to Use DOM
Documents Versus Data
Mixed-Content Model
Types of Nodes
A Simpler Model
Increasing the Complexity
Choosing Your Model
Reading XML Data into a DOM
Creating the Program
Create the Skeleton
Import the Required Classes
Handle Errors
Instantiate the Factory
Get a Parser and Parse the File
Configuring the Factory
Handling Validation Errors
Displaying the DOM Nodes
Obtaining Node Type Information
Lexical Controls
Printing DOM Tree Nodes
Node Operations
Creating Nodes
Traversing Nodes
Searching for Nodes
Obtaining Node Content
Creating Attributes
Removing and Changing Nodes
Inserting Nodes
Running the DOMEcho Sample
Validating With XML Schema
Overview of the Validation Process
Configuring the DocumentBuilder Factory
Associating a Document with a Schema
Validating with Multiple Namespaces
Declaring the Schemas in the XML Data Set
Declaring the Schemas in the Application
Running the DOMEcho Sample With Schema Validation
Further Information

When to Use DOM

The Document Object Model standard is, above all, designed for documents (for example, articles and books). In addition, the JAXP 1.4.2 implementation supports XML Schema, which can be an important consideration when choosing a model for your application.

On the other hand, if you are dealing with simple data structures and if XML Schema is not a big part of your plans, then you may find that one of the more object-oriented standards, such as JDOM or dom4j, is better suited for your purpose.

From the start, DOM was intended to be language-neutral. Because it was designed for use with languages such as C and Perl, DOM does not take advantage of Java's object-oriented features. That fact, in addition to the distinction between documents and data, also helps to account for the ways in which processing a DOM differs from processing a JDOM or dom4j structure.

In this section, we will examine the differences between the models underlying those standards to help you choose the one that is most appropriate for your application.

Documents Versus Data

The document model used in DOM and the data model used in JDOM or dom4j differ mainly in:

The kind of node that exists in the hierarchy
The capacity for mixed content

It is the difference in what constitutes a "node" in the data hierarchy that primarily accounts for the differences in programming with these two models. However, the capacity for mixed content, more than anything else, accounts for the difference in how the standards define a node. So we start by examining DOM's mixed-content model.

Mixed-Content Model

Text and elements can be freely intermixed in a DOM hierarchy. That kind of structure is called mixed content in the DOM model.

Mixed content occurs frequently in documents. For example, suppose you wanted to represent this structure:

<sentence>This is an <bold>important</bold> idea.</sentence>

The hierarchy of DOM nodes would look something like this, where each line represents one node:

ELEMENT: sentence
   + TEXT: This is an
   + ELEMENT: bold
     + TEXT: important
   + TEXT: idea.
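The node structure of the sentence example can be checked with a short program. The following is a minimal sketch, not code from the tutorial: the class name `MixedContentDemo` and the helpers `describe` and `dump` are hypothetical names chosen for illustration. It parses the sentence with the standard JAXP `DocumentBuilder` and prints one line per element or text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
class MixedContentDemo {

    // Returns one line per element or text node, indented by depth.
    static String describe(Node node, String indent) {
        StringBuilder sb = new StringBuilder();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append(indent).append("ELEMENT: ").append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(indent).append("TEXT: ").append(node.getNodeValue()).append('\n');
        }
        // Visit the children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sb.append(describe(child, indent + "  "));
        }
        return sb.toString();
    }

    // Parses an XML string and describes the tree below its root element.
    static String dump(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return describe(doc.getDocumentElement(), "");
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<sentence>This is an <bold>important</bold> idea.</sentence>"));
    }
}
```

The `sentence` element ends up with three children: a text node, the `bold` element, and a second text node. The whitespace next to the `bold` tags is kept as part of the neighboring text nodes.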

Node types and the values of their nodeName, nodeValue, and attributes properties:

Node	nodeName	nodeValue	attributes
`Attr`	Name of attribute	Value of attribute	null
`CDATASection`	`#cdata-section`	Content of the CDATA section	null
`Comment`	`#comment`	Content of the comment	null
`Document`	`#document`	null	null
`DocumentFragment`	`#document-fragment`	null	null
`DocumentType`	Document Type name	null	null
`Element`	Tag name	null	`NamedNodeMap`
`Entity`	Entity name	null	null
`EntityReference`	Name of entity referenced	null	null
`Notation`	Notation name	null	null
`ProcessingInstruction`	Target	Entire content excluding the target	null
`Text`	`#text`	Content of the text node	null
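The properties in the table can be read through `getNodeName()`, `getNodeValue()`, and `getAttributes()` on `org.w3c.dom.Node`. The sketch below builds a few nodes in memory and prints those three properties for each. The class name `NodePropertiesDemo` and the helpers `newDocument` and `summarize` are hypothetical names used only for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; the names here are illustrative only.
class NodePropertiesDemo {

    // Creates an empty in-memory document.
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Formats nodeName, nodeValue, and whether an attribute map is present.
    static String summarize(Node node) {
        String attrs = (node.getAttributes() == null) ? "null" : "NamedNodeMap";
        return node.getNodeName() + " | " + node.getNodeValue() + " | " + attrs;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element root = doc.createElement("note");
        root.setAttribute("lang", "en");
        doc.appendChild(root);

        System.out.println(summarize(doc));
        System.out.println(summarize(root));
        System.out.println(summarize(root.getAttributeNode("lang")));
        System.out.println(summarize(doc.createTextNode("hello")));
        System.out.println(summarize(doc.createComment("a remark")));
        System.out.println(summarize(doc.createCDATASection("<raw>")));
        System.out.println(summarize(doc.createProcessingInstruction("target", "data")));
    }
}
```

Only element nodes return a non-null attribute map from `getAttributes()`. For an element, the text it contains is not its own `nodeValue`; it lives in child `Text` nodes.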

`DocumentBuilderFactory` settings for the lexical controls:

API	Preserve Lexical Info	Focus on Content
`setCoalescing()`	False	True
`setExpandEntityReferences()`	False	True
`setIgnoringComments()`	False	True
`setIgnoringElementContentWhitespace()`	False	True
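As a rough illustration of the two columns, the sketch below applies either all-false or all-true settings to a `DocumentBuilderFactory` and counts the children of the root element. The class name `LexicalControlsDemo`, the helper `countChildren`, and the sample XML are assumptions made for this example, not part of the tutorial's `DOMEcho` program.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
class LexicalControlsDemo {

    // Parses xml with all four lexical controls set to the same value
    // and returns the number of child nodes under the root element.
    static int countChildren(String xml, boolean focusOnContent) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(focusOnContent);            // merge CDATA into adjacent text
        factory.setExpandEntityReferences(focusOnContent);
        factory.setIgnoringComments(focusOnContent);      // drop comment nodes
        // Relies on a content model, so it has no effect on this DTD-less sample.
        factory.setIgnoringElementContentWhitespace(focusOnContent);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>one<!-- note --><![CDATA[ & two]]></p>";
        System.out.println("preserve lexical info: " + countChildren(xml, false));
        System.out.println("focus on content:      " + countChildren(xml, true));
    }
}
```

With the lexical settings, the text, the comment, and the CDATA section each appear as a separate child node. With the content-focused settings, the comment is dropped and the CDATA content is folded into text, so the root has fewer children.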
