XML Schema: Understanding Namespaces

Developer: XML

by Rahul Srivastava

Moving to XML Schema? This introduction to namespaces will help you understand one of its more important components.

Declaring and Applying Namespaces

Namespaces are declared as an attribute of an element. A namespace need not be declared only at the root element; it can be declared on any element in the XML document. The scope of a declared namespace begins at the element where it is declared and applies to the entire content of that element, unless overridden by another namespace declaration with the same prefix name. Here, the content of an element is the content between the <opening-tag> and </closing-tag> of that element. A namespace is declared as follows:

<someElement xmlns:pfx="http://www.foo.com" />

In the attribute xmlns:pfx, xmlns acts like a reserved word that is used only to declare a namespace. In other words, xmlns is used for binding namespaces. The original Namespaces in XML Recommendation left xmlns itself unbound, while later editions bind it by definition to http://www.w3.org/2000/xmlns/. The above example is therefore read as binding the prefix "pfx" to the namespace "http://www.foo.com".

It is a convention to use xsd or xs as the prefix for the XML Schema namespace, but the choice is purely personal. One could use a prefix such as ABC for the XML Schema namespace, which is legal but doesn't make much sense. Using meaningful namespace prefixes adds clarity to the XML document. Note that a prefix is only a placeholder: a namespace-aware XML parser must expand it to the actual namespace bound to it. In a Java analogy, a namespace binding resembles declaring a variable: wherever the variable is referenced, it is replaced by the value it was assigned.

In our previous namespace declaration example, wherever the prefix "pfx" is referenced within the scope of the namespace declaration, it is expanded to the actual namespace (http://www.foo.com) to which it was bound:

In Java: String pfx = "http://www.foo.com";

In XML: <someElement xmlns:pfx="http://www.foo.com" />
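The expansion of a prefix can be observed with any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the child element pfx:child is a hypothetical addition to the declaration above, made only so that there is a prefixed name to expand.

```python
import xml.etree.ElementTree as ET

# someElement declares the prefix but does not use it on itself;
# pfx:child (hypothetical) is the only name that refers to the prefix.
doc = '<someElement xmlns:pfx="http://www.foo.com"><pfx:child/></someElement>'

root = ET.fromstring(doc)
child = root[0]

# ElementTree reports expanded names in the form {namespace}local-name.
print(root.tag)   # unprefixed, and no default namespace is in scope
print(child.tag)  # the prefix has been replaced by the bound namespace
```

Note that the parser keeps the namespace name, not the prefix: the prefix is only a local shorthand.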

Although a namespace usually looks like a URL, that doesn't mean that one must be connected to the Internet to actually declare and use namespaces. Rather, the namespace is intended to serve as a virtual "container" for vocabulary and un-displayed content that can be shared in the Internet space. In the Internet space URLs are unique; hence you would usually choose to use URLs to uniquely identify namespaces. Typing the namespace URL in a browser doesn't mean it would show all the elements and attributes in that namespace; it's just a concept.

But here's a twist: although the W3C Namespaces in XML Recommendation declares that the namespace name should be an IRI, it enforces no such constraint. Therefore, I could also use something like:

<someElement xmlns:pfx="foo" />

which is perfectly legal.

By now it should be clear that to use a namespace, we first bind it to a prefix and then use that prefix wherever required. But why can't we use the namespaces to qualify the elements or attributes from the start? First, because namespaces, being IRIs, are quite long and thus would hopelessly clutter the XML document. Second and most important, because it might have a severe impact on the syntax, or to be specific, on the production rules of XML: an IRI might have characters that are not allowed in XML tags per the W3C XML 1.0 Recommendation.

  Invalid) <http://www.library.com:Book />
  Valid)   <lib:Book xmlns:lib="http://www.library.com" />
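As a quick check of the two forms above, the sketch below feeds each one to Python's standard xml.etree.ElementTree parser and records whether it is accepted. The helper name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

invalid = '<http://www.library.com:Book />'
valid = '<lib:Book xmlns:lib="http://www.library.com" />'

# The namespace name used directly as a tag name contains '/' characters,
# which the XML production rules do not allow in a name.
print("namespace as tag name:", is_well_formed(invalid))
print("prefix bound to namespace:", is_well_formed(valid))
```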

Below, the elements Title and Author are associated with the namespace http://www.library.com:

  <?xml version="1.0"?>
  <Book xmlns:lib="http://www.library.com">
    <lib:Title>Sherlock Holmes</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
  </Book>

In the example below, the elements Title and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements Title and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.

  <?xml version="1.0"?>
  <Book xmlns:lib="http://www.library.com">
    <lib:Title>Sherlock Holmes - I</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
    <purchase xmlns:lib="http://www.otherlibrary.com">
      <lib:Title>Sherlock Holmes - II</lib:Title>
      <lib:Author>Arthur Conan Doyle</lib:Author>
    </purchase>
    <lib:Title>Sherlock Holmes - III</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
  </Book>
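The scoping rule can be verified mechanically. The sketch below parses a shortened version of this document (the Author elements are omitted for brevity) with Python's standard xml.etree.ElementTree and records which namespace each Title ends up in. The helper split_tag is an assumption of this sketch, not part of any library.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <purchase xmlns:lib="http://www.otherlibrary.com">
    <lib:Title>Sherlock Holmes - II</lib:Title>
  </purchase>
  <lib:Title>Sherlock Holmes - III</lib:Title>
</Book>"""

def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

root = ET.fromstring(doc)
title_namespaces = {}
for element in root.iter():  # document order
    namespace, local = split_tag(element.tag)
    if local == "Title":
        title_namespaces[element.text] = namespace

for text, namespace in title_namespaces.items():
    print(text, "->", namespace)
```

The override on purchase ends with that element, so the third Title falls back to the binding declared on Book.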

The W3C Namespaces in XML Recommendation enforces some namespace constraints:

1. Prefixes beginning with the three-letter sequence x, m, and l, in any case combination, are reserved for use by XML and XML-related specifications. Although not a fatal error, it is inadvisable to bind such prefixes. The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.
2. A prefix cannot be used unless it is declared and bound to a namespace. (Ever tried to use a variable in Java without declaring it?)

The following violates both these constraints:

  <?xml version="1.0"?>
  <Book xmlns:XmlLibrary="http://www.library.com">
    <lib:Title>Sherlock Holmes - I</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
  </Book>

[Error]: prefix lib not bound to a namespace.
[Inadvisable]: prefix XmlLibrary begins with 'Xml.'
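Both findings can be reproduced with a short script. This is a sketch under two assumptions: the parser is the expat-based one behind Python's xml.etree.ElementTree, and the "inadvisable" check is a simple regular-expression scan written for this illustration, since parsers are not required to warn about reserved prefixes.

```python
import re
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<Book xmlns:XmlLibrary="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>"""

# Constraint 2: using an undeclared prefix is an error for a namespace-aware parser.
try:
    ET.fromstring(doc)
    parse_error = None
except ET.ParseError as exc:
    parse_error = str(exc)

# Constraint 1: list declared prefixes that start with x, m, l in any case combination.
declared = re.findall(r'xmlns:([\w.-]+)\s*=', doc)
inadvisable = [prefix for prefix in declared if prefix.lower().startswith("xml")]

print("[Error]:", parse_error)
print("[Inadvisable]:", inadvisable)
```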

Default Namespace (Not Default Namespaces)

It would be painful to repeatedly qualify an element or attribute you wish to use from a namespace. In such cases, you can declare a {default namespace} instead. Remember, at any point in time, there can be only one {default namespace} in existence. Therefore, the term "Default Namespaces" is inherently incorrect.

Declaring a {default namespace} means that any element within the scope of the {default namespace} declaration will be qualified implicitly, if it is not already qualified explicitly using a prefix. As with prefixed namespaces, a {default namespace} can be overridden too. A {default namespace} is declared as follows:

  <someElement xmlns="http://www.foo.com"/>

  <?xml version="1.0"?>
  <Book xmlns="http://www.library.com">
    <Title>Sherlock Holmes</Title>
    <Author>Arthur Conan Doyle</Author>
  </Book>

In this case the elements Book, Title, and Author are associated with the namespace http://www.library.com.

Remember, the scope of a namespace begins at the element where it is declared. Therefore, the element Book is also associated with the {default namespace}, as it has no prefix.

  <?xml version="1.0"?>
  <Book xmlns="http://www.library.com">
    <Title>Sherlock Holmes - I</Title>
    <Author>Arthur Conan Doyle</Author>
    <purchase xmlns="http://www.otherlibrary.com">
      <Title>Sherlock Holmes - II</Title>
      <Author>Arthur Conan Doyle</Author>
    </purchase>
    <Title>Sherlock Holmes - III</Title>
    <Author>Arthur Conan Doyle</Author>
  </Book>

In the above, the elements Book, Title, and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements purchase, Title, and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.
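Because the elements carry no prefixes, a consumer of this document has to supply its own prefix-to-namespace map when searching. The sketch below does this with the namespaces argument of ElementTree's findall; the prefixes lib and other exist only in the Python code and are assumptions of this example, and the Author elements are omitted for brevity.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <purchase xmlns="http://www.otherlibrary.com">
    <Title>Sherlock Holmes - II</Title>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
</Book>"""

# Prefixes chosen by the consumer; they need not match anything in the document.
ns = {"lib": "http://www.library.com", "other": "http://www.otherlibrary.com"}

root = ET.fromstring(doc)

# Direct children of Book that are Title elements in the library namespace.
outer_titles = [t.text for t in root.findall("lib:Title", ns)]
# Title elements under purchase, both in the overriding default namespace.
inner_titles = [t.text for t in root.findall("other:purchase/other:Title", ns)]

print(root.tag)
print(outer_titles)
print(inner_titles)
```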

Default Namespace and Attributes

Default namespaces do not apply to attributes; therefore, to apply a namespace to an attribute, the attribute must be explicitly qualified. Here the attribute isbn has {no namespace} whereas the attribute cover is associated with the namespace http://www.library.com.

  <?xml version="1.0"?>
  <Book isbn="1234" pfx:cover="hard"
        xmlns="http://www.library.com"
        xmlns:pfx="http://www.library.com">
    <Title>Sherlock Holmes</Title>
    <Author>Arthur Conan Doyle</Author>
  </Book>
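The difference between the two attributes shows up directly in a parsed tree. A minimal sketch with Python's standard xml.etree.ElementTree, using the same document:

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<Book isbn="1234" pfx:cover="hard"
      xmlns="http://www.library.com"
      xmlns:pfx="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

root = ET.fromstring(doc)

# Attribute names are reported as expanded names, like element names.
attributes = dict(root.attrib)
for name, value in attributes.items():
    print(name, "=", value)
```

The element Book picks up the default namespace, but isbn keeps a bare name even though the same namespace is the default in scope.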

Undeclaring Namespace

Unbinding an already-bound prefix is not allowed per the W3C Namespaces in XML 1.0 Recommendation, but is allowed per W3C Namespaces in XML 1.1 Recommendation. There was no reason why this should not have been allowed in 1.0, but the mistake has been rectified in 1.1. It is necessary to know this difference because not many XML parsers yet support Namespaces in XML 1.1.

Although there were some differences in unbinding prefixed namespaces, both versions allow you to unbind or remove the already declared {default namespace} by overriding it with another {default namespace} declaration, where the namespace in the overriding declaration is empty. Unbinding a namespace is as good as the namespace not being declared at all. Here the elements Book, Title, and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements purchase, Title, and Author of Sherlock Holmes - II have {no namespace}:

  <someElement xmlns="" />

  <?xml version="1.0"?>
  <Book xmlns="http://www.library.com">
    <Title>Sherlock Holmes - I</Title>
    <Author>Arthur Conan Doyle</Author>
    <purchase xmlns="">
      <Title>Sherlock Holmes - II</Title>
      <Author>Arthur Conan Doyle</Author>
    </purchase>
    <Title>Sherlock Holmes - III</Title>
    <Author>Arthur Conan Doyle</Author>
  </Book>
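The effect of the empty declaration can be confirmed by listing the expanded names a parser produces. A minimal sketch with Python's standard xml.etree.ElementTree, with the Author elements omitted for brevity:

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
</Book>"""

root = ET.fromstring(doc)

# Names without a leading {namespace} part are in {no namespace}.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```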

Here's an invalid example of unbinding a prefix per Namespaces in XML 1.0 spec, but a valid example per Namespaces in XML 1.1:

From this point on, the prefix lib cannot be used in the XML document because it is now undeclared as long as you are in the scope of element purchase. Of course, you can definitely re-declare it.
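The listing for this case is not reproduced here, so the document in the sketch below is an assumption reconstructed from the description: a prefix lib declared on Book and unbound on purchase with an empty namespace name. It is fed to the expat-based parser behind Python's xml.etree.ElementTree, which follows Namespaces in XML 1.0 and is therefore expected to reject it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: xmlns:lib="" attempts to unbind the prefix lib.
doc = """<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <purchase xmlns:lib="">
    <Title>Sherlock Holmes - II</Title>
  </purchase>
</Book>"""

try:
    ET.fromstring(doc)
    rejected = False
    message = None
except ET.ParseError as exc:
    rejected = True
    message = str(exc)

print("rejected by a Namespaces in XML 1.0 parser:", rejected)
print(message)
```

A parser that implements Namespaces in XML 1.1 would accept the same document, provided it is labeled as an XML 1.1 document.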

No Namespace

No namespace exists when there is no default namespace in scope. A {default namespace} is one that is declared explicitly using xmlns. When a {default namespace} has not been declared at all using xmlns, it is incorrect to say that the elements are in {default namespace}. In such cases, we say that the elements are in {no namespace}. {no namespace} also applies when an already declared {default namespace} is undeclared.

In summary:

The scope of a declared namespace begins at the element where it is declared and applies to all the elements within the content of that element, unless overridden by another namespace declaration with the same prefix name.
Both prefixed and {default namespace} can be overridden.
Both prefixed and {default namespace} can be undeclared (a prefixed namespace only under Namespaces in XML 1.1).
{default namespace} does not apply to attributes directly.
A {default namespace} exists only when you have declared it explicitly. It is incorrect to use the term {default namespace} when you have not declared it.
No namespace exists when there is no default namespace in scope.

Namespaces and XML Schema

Thus far we have seen how to declare and use an existing namespace. Now let's examine how to create a new namespace and add elements and attributes to it using XML Schema.

XML Schema is an XML document before it's anything else. In other words, like any other XML document, XML Schema is built with elements and attributes. This "building material" must come from the namespace http://www.w3.org/2001/XMLSchema, which is a declared and reserved namespace that contains elements and attributes as defined in the W3C XML Schema Structures Specification and the W3C XML Schema Datatypes Specification. You should not add elements or attributes to this namespace.

Using these building blocks we can create new elements and attributes as required, enforce the required constraints on them, and keep them in some namespace. (See Figure 1.) XML Schema calls this particular namespace the {target namespace}: the namespace where the newly created elements and attributes will reside.

Figure 1: Elements and attributes in XML Schema namespace are used to write an XML Schema document, which generates elements and attributes as defined by user and puts them in {target namespace}. This {target namespace} is then used to validate the XML instance.

This {target namespace} is referenced from the XML instance to ensure the validity of the instance document. (See Figure 2.) During validation, the validator verifies that the elements/attributes used in the instance exist in the declared namespace, and also checks any other constraints on their structure and datatype.

Figure 2: From XML Schema to XML Schema instance

Qualified or Unqualified

In XML Schema we can choose to specify whether the instance document must qualify all the elements and attributes, or must qualify only the globally declared elements and attributes. Regardless of what we choose, the entire instance would be validated. So why do we have two choices?

The answer is "manageability." When we choose qualified, we are specifying that all the elements and attributes in the instance must have a namespace, which in turn adds namespace complexity to the instance. If, say, the schema is later modified by making some local declarations global and/or some global declarations local, the instance documents are not affected at all. In contrast, when we choose unqualified, we are specifying that only the globally declared elements and attributes in the instance must have a namespace, which in turn hides the namespace complexity from the instance. But in this case, if the schema is modified in the same way, all instance documents are affected and are no longer valid. The XML Schema validator would report validation errors if we tried to validate such an instance against the modified XML Schema. The namespaces in the instance must therefore be fixed to match the modification made in the XML Schema to make the instance valid again.

  <?xml version="1.0" encoding="US-ASCII"?>
  <schema xmlns="http://www.w3.org/2001/XMLSchema"
          xmlns:tns="http://www.library.com"
          targetNamespace="http://www.library.com"
          elementFormDefault="qualified"
          attributeFormDefault="unqualified">
    <element name="Book" type="tns:BookType" />
    <complexType name="BookType">
      <sequence>
        <element name="Title" type="string" />
        <element name="Author" type="string" />
      </sequence>
    </complexType>
  </schema>

The declarations that are the immediate children of the element <schema> are the global declarations, and the rest are local declarations. In the above example, Book and BookType are declared globally whereas Title and Author are local declarations.
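The global/local distinction follows purely from position in the schema document, so it can be computed by walking the tree. The sketch below does this with Python's standard xml.etree.ElementTree; it only inspects name attributes and is not a schema processor. The encoding declaration is left out of the embedded string, which is a choice made for this sketch only.

```python
import xml.etree.ElementTree as ET

schema_text = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.library.com"
        targetNamespace="http://www.library.com"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Book" type="tns:BookType" />
  <complexType name="BookType">
    <sequence>
      <element name="Title" type="string" />
      <element name="Author" type="string" />
    </sequence>
  </complexType>
</schema>"""

schema = ET.fromstring(schema_text)

# Immediate children of <schema> are global declarations.
global_names = [child.get("name") for child in schema if child.get("name")]

# Named declarations nested anywhere below a global declaration are local.
local_names = [
    nested.get("name")
    for child in schema
    for nested in child.iter()
    if nested is not child and nested.get("name")
]

print("global:", global_names)
print("local:", local_names)
```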

We can express the choice between qualified and unqualified by setting the schema element attributes elementFormDefault and attributeFormDefault to either qualified or unqualified.

  elementFormDefault = (qualified | unqualified) : unqualified
  attributeFormDefault = (qualified | unqualified) : unqualified

When elementFormDefault is set to qualified, it implies that in the instance of this grammar all the elements must be explicitly qualified, either by using a prefix or setting a {default namespace}. An unqualified setting means that only the globally declared elements must be explicitly qualified, and the locally declared elements must not be qualified. Qualifying a local declaration in this case is an error. Similarly, when attributeFormDefault is set to qualified, all attributes in the instance document must be explicitly qualified using a prefix.

Remember, {default namespace} doesn't apply to attributes; hence, we can't use a {default namespace} declaration to qualify attributes. Unqualified seems to imply being in the namespace by virtue of the containing element. This is interesting, isn't it?
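To make the two settings concrete, the sketch below writes out what an instance of a Book element with local Title and Author children, declared in the target namespace http://www.library.com, would look like under each value of elementFormDefault, and prints the namespace each element ends up in. The instance documents are illustrative assumptions, and the code only parses them with Python's standard library; it does not validate them against a schema.

```python
import xml.etree.ElementTree as ET

# elementFormDefault="qualified": every element is in the target namespace,
# here achieved with a default namespace declaration.
qualified_instance = """<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

# elementFormDefault="unqualified": only the global element Book is qualified;
# the local elements Title and Author must stay in {no namespace}.
unqualified_instance = """<lib:Book xmlns:lib="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</lib:Book>"""

def namespaces_by_element(text):
    """Return (local name, namespace or None) for each element in document order."""
    result = []
    for element in ET.fromstring(text).iter():
        if element.tag.startswith("{"):
            namespace, local = element.tag[1:].split("}", 1)
        else:
            namespace, local = None, element.tag
        result.append((local, namespace))
    return result

print(namespaces_by_element(qualified_instance))
print(namespaces_by_element(unqualified_instance))
```

Note that a default namespace declaration cannot be used for the unqualified form, because it would pull Title and Author into the namespace as well.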

In the following diagrams, the concept symbol space is similar to the non-normative concept of namespace partition. For example, if a namespace is like a refrigerator, then the symbol spaces are the shelves in the refrigerator. Just as shelves partition the entire space in a refrigerator, the symbol spaces partition the namespace.

There are three primary partitions in a namespace: one for global element declarations, one for global attribute declarations, and one for global type declarations (complexType/simpleType). This arrangement implies we can have a global element, a global attribute, and a global type that all have the same name and still co-exist in a {target namespace} without any name collisions. Further, every global element and every global complexType has its own symbol space to contain its local declarations.

Let's examine the four possible combinations of values for the pair of attributes elementFormDefault and attributeFormDefault.

Case 1: elementFormDefault=qualified, attributeFormDefault=qualified

Here the {target namespace} directly contains all the elements and attributes; therefore, in the instance, all the elements and attributes must be qualified.

Case 2: elementFormDefault=qualified, attributeFormDefault=unqualified

Here the {target namespace} directly contains all the elements, and the corresponding attributes for these elements are contained in the symbol space of the respective elements. Therefore, in the instance, only the elements must be qualified and the attributes must not be qualified, unless the attribute is declared globally.

Case 3: elementFormDefault=unqualified, attributeFormDefault=qualified

Here the {target namespace} directly contains all the attributes and only the globally declared elements, which in turn contain their child elements in their symbol spaces. Therefore, in the instance, only the globally declared elements and all the attributes must be qualified.

Case 4: elementFormDefault=unqualified, attributeFormDefault=unqualified

Here the {target namespace} directly contains only the globally declared elements, which in turn contain their child elements in their symbol spaces. Every element contains the corresponding attributes in its symbol space; therefore, in the instance, only the globally declared elements and attributes must be qualified.

The above diagrams are intended as a visual representation of what is directly contained in a namespace and what is transitively contained in a namespace, depending on the value of elementFormDefault/attributeFormDefault. The implication of this setting is that the elements/attributes directly in the {target namespace} must have a namespace associated with them in the corresponding XML instance, and the elements/attributes that are not directly (transitively) in the {target namespace} must not have a namespace associated with them in the corresponding XML instance.
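The rule behind the four cases can be written as one small function. This is a sketch of the rule as stated in this article; the function name expected_namespace and its parameters are assumptions made for illustration, and the target namespace is the one used in the earlier examples.

```python
TARGET_NAMESPACE = "http://www.library.com"

def expected_namespace(is_global, form_default, target_namespace=TARGET_NAMESPACE):
    """Namespace a declared element or attribute must carry in the instance.

    Global declarations are always directly in the target namespace. Local
    declarations are directly in it only when the corresponding *FormDefault
    is "qualified"; otherwise they must appear in {no namespace} (None).
    """
    if is_global or form_default == "qualified":
        return target_namespace
    return None

# Walk the four combinations for a local element and a local attribute.
cases = {}
for element_form in ("qualified", "unqualified"):
    for attribute_form in ("qualified", "unqualified"):
        cases[(element_form, attribute_form)] = (
            expected_namespace(False, element_form),
            expected_namespace(False, attribute_form),
        )

for (element_form, attribute_form), (element_ns, attribute_ns) in cases.items():
    print(element_form, attribute_form, "->", element_ns, attribute_ns)
```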

Target Namespace and No Target Namespace

Now we know that XML Schema creates the new elements and attributes and puts them in a namespace called the {target namespace}. But what if we don't specify a {target namespace} in the schema? When we don't specify the attribute targetNamespace at all, no {target namespace} exists, which is legal, but specifying an empty URI in the targetNamespace attribute is illegal.

For example, the following is invalid. We can't specify an empty URI for the {target namespace}:

<schema targetNamespace="" . . .>

In this case, when no {target namespace} exists, we say, as described earlier, that the newly created elements and attributes are kept in {no namespace}. (It would be incorrect to use the term {default namespace}.) For validation, the corresponding XML instance must use the noNamespaceSchemaLocation attribute from the http://www.w3.org/2001/XMLSchema-instance namespace to refer to the XML Schema with no target namespace.
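An instance that refers to such a schema might look like the document in the sketch below. The file name book.xsd and the conventional prefix xsi are assumptions of this example. The code only shows how the attribute is addressed by its expanded name with Python's standard xml.etree.ElementTree; it does not perform validation.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# book.xsd is a hypothetical schema file with no targetNamespace attribute.
doc = """<?xml version="1.0"?>
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="book.xsd">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

root = ET.fromstring(doc)

# The attribute is qualified, so it is looked up by its expanded name.
schema_location = root.get("{%s}noNamespaceSchemaLocation" % XSI)

print(root.tag)           # Book is in {no namespace}
print(schema_location)
```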

Conclusion

Hopefully, this overview of namespaces will help you move to XML Schema more easily. The Oracle XML Developer Kit (XDK) supports the W3C Namespaces in XML 1.0 Recommendation; you can turn the namespace check on or off through the JAXP APIs in the Oracle XDK by using the setNamespaceAware(boolean) method of the SAXParserFactory and DocumentBuilderFactory classes.
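JAXP is a Java API, so the following is only a rough analogue rather than the XDK itself: Python's standard xml.sax module exposes a comparable switch through the feature_namespaces feature. The class NameCollector and the function collect_names are assumptions made for this sketch. It shows what a handler sees with namespace processing turned on and off.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class NameCollector(ContentHandler):
    """Record element names exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: name is the raw qualified name.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is (namespace, local name).
        self.names.append(name)

def collect_names(text, namespace_aware):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return handler.names

doc = ('<lib:Book xmlns:lib="http://www.library.com">'
       '<lib:Title>Sherlock Holmes</lib:Title></lib:Book>')

print(collect_names(doc, namespace_aware=True))
print(collect_names(doc, namespace_aware=False))
```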

Rahul Srivastava (rahuls@apache.org) is a senior member of the Oracle Application Server development team at Oracle and is presently working in the EAI space. He has contributed to the development of the Apache open-source Xerces2-J W3C-compliant validating XML parser, primarily in the area of W3C XML Schema. Rahul was also a contributor to JAXP and JSR-173 when working with Sun Microsystems as part of the Web services team.

Resources

Use the following resources to test the examples and to learn more about namespaces and XML Schema.

Download Oracle 10g XDK Production. Oracle XDK is a set of components, tools, and utilities that eases the task of building and deploying XML-enabled applications. Unlike many shareware and trial XML components, the production Oracle XDK is fully supported and comes with a commercial redistribution license.

Read the W3C XML Schema Primer. This document provides an easily readable description of the XML Schema facilities, and is oriented toward quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema language.

Bookmark the XML Technology Center. Whether you are a beginner, intermediate, or advanced XML user, the XML Center provides you up-to-date content and guidance to develop all types of XML and Web Service applications.
