XML Schema: Understanding Namespaces
by Rahul Srivastava
Moving to XML Schema? This introduction to namespaces will help you understand one of its more important components.
Other articles in this series: "XML Schema: Understanding Datatypes" and "XML Schema: Understanding Structures"
As defined by the W3C Namespaces in XML Recommendation, an XML namespace is a collection of XML elements and attributes identified by an Internationalized Resource Identifier (IRI); this collection is often referred to as an XML "vocabulary." One of the primary motivations for defining an XML namespace is to avoid naming conflicts when using and re-using multiple vocabularies. XML Schema is used to create a vocabulary for an XML instance, and uses namespaces heavily. Thus, having a sound grasp of the namespace concept is essential for understanding XML Schema and instance validation overall.
Namespaces are similar to packages in Java in several ways:
- A package in Java can have many reusable classes and interfaces. Similarly, a namespace in XML can have many reusable elements and attributes.
- To use a class or interface in a package, you must fully qualify that class or interface with the package name. Similarly, to use an element or attribute in a namespace, you must fully qualify that element or attribute with the namespace.
- A Java package may have an inner class that is not directly inside the package, but rather "belongs" to it by virtue of its enclosing class. The same is true for namespaces: there can be elements or attributes that are not directly in a namespace, but belong to the namespace by virtue of their parent or enclosing element. This is a transitive relationship. If a book is on the table, and the table is on the floor, then transitively, the book is on the floor, although the book is not directly on the floor.
Thus, we see that the concept of namespaces in XML is not very different from packages in Java. This correlation is intended to simplify the understanding of namespaces in XML and to help you visualize the concept.
In this article, you will learn:
- The role of namespaces in XML
- How to declare and use namespaces
- The difference between default-namespace and no-namespace
- How to create namespaces using XML Schema, and
- The difference between qualified and unqualified elements/attributes in a namespace.
Declaring and Applying Namespaces
Namespaces are declared as attributes of an element. A namespace need not be declared on the root element; it can be declared on any element in the XML document. The scope of a namespace declaration begins at the element where it is declared and applies to that element and its entire content (everything between the element's <opening-tag> and </closing-tag>), unless overridden by another namespace declaration with the same prefix name. A namespace is declared as follows:
<someElement xmlns:pfx="http://www.foo.com" />
In the attribute xmlns:pfx, xmlns acts like a reserved word that is used only to declare a namespace. In other words, xmlns is used for binding namespaces and cannot itself be declared or used as an ordinary prefix (later editions of the Namespaces in XML Recommendation define it as bound to the reserved namespace name http://www.w3.org/2000/xmlns/). The above example is therefore read as binding the prefix "pfx" to the namespace "http://www.foo.com".
It is a convention to use XSD or XS as a prefix for the XML Schema namespace, but that decision is purely personal. One can choose to use a prefix ABC for the XML Schema namespace, which is legal, but doesn't make much sense. Using meaningful namespace prefixes adds clarity to the XML document. Note that prefixes are only placeholders: a namespace-aware XML parser must expand each prefix to the actual namespace bound to it. In the Java analogy, a namespace binding can be compared to declaring a variable: wherever the variable is referenced, it is replaced by the value it was assigned.
In our previous namespace declaration example, wherever the prefix "pfx" is referenced within the namespace declaration scope, it is expanded to the actual namespace (http://www.foo.com) to which it was bound:
In Java: String pfx = "http://www.foo.com";
In XML: <someElement xmlns:pfx="http://www.foo.com" />
Although a namespace usually looks like a URL, that doesn't mean that one must be connected to the Internet to actually declare and use namespaces. Rather, the namespace is intended to serve as a virtual "container" for vocabulary and un-displayed content that can be shared in the Internet space. In the Internet space URLs are unique—hence you would usually choose to use URLs to uniquely identify namespaces. Typing the namespace URL in a browser doesn't mean it would show all the elements and attributes in that namespace; it's just a concept.
But here's a twist: although the W3C Namespaces in XML Recommendation declares that the namespace name should be an IRI, it enforces no such constraint. Therefore, I could also use something like:
<someElement xmlns:pfx="foo" />
which is perfectly legal.
By now it should be clear that to use a namespace, we first bind it with a prefix and then use that prefix wherever required. But why can't we use the namespaces to qualify the elements or attributes from the start? First, because namespaces—being IRIs—are quite long and thus would hopelessly clutter the XML document. Second and most important, because it might have a severe impact on the syntax, or to be specific, on the production rules of XML—the reason being that an IRI might have characters that are not allowed in XML tags per the W3C XML 1.0 Recommendation.
Invalid: <http://www.library.com:Book />
Valid: <lib:Book xmlns:lib="http://www.library.com" />
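The contrast between the two forms can be checked with any XML parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module (not a tool used in this article); the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# An IRI used directly as the element name: the "/" characters are not
# allowed inside an XML name, so the parser rejects the document.
iri_as_name = '<http://www.library.com:Book />'

# The same intent expressed through a prefix bound to the namespace.
prefixed_name = '<lib:Book xmlns:lib="http://www.library.com" />'

print(is_well_formed(iri_as_name))
print(is_well_formed(prefixed_name))
```

The prefix keeps the element name within the characters XML allows, while the full IRI appears only as an attribute value, where it is legal.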
Below, the elements Title and Author are associated with the namespace http://www.library.com (the element Book has no prefix and is therefore not in that namespace):
<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
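To see the prefix expansion described above, here is a small sketch that assumes Python's standard xml.etree.ElementTree module. ElementTree reports each expanded name in the form {namespace}localname; that notation belongs to ElementTree, not to the Recommendation.

```python
import xml.etree.ElementTree as ET

document = """<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>"""

root = ET.fromstring(document)

# The parser replaces each prefix with the namespace it is bound to.
# Book carries no prefix, so its name is reported without a namespace.
names = [root.tag] + [child.tag for child in root]
for name in names:
    print(name)
```

The prefix lib does not appear in the reported names at all, which is the sense in which a prefix is only a placeholder.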
In the example below, the elements Title and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements Title and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.
<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
  <purchase xmlns:lib="http://www.otherlibrary.com">
    <lib:Title>Sherlock Holmes - II</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
  </purchase>
  <lib:Title>Sherlock Holmes - III</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
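The scoping rule can be observed by asking a parser which namespace each Title ends up in. This is a sketch under the assumption that Python's standard xml.etree.ElementTree module is used; split_clark is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

document = """<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <purchase xmlns:lib="http://www.otherlibrary.com">
    <lib:Title>Sherlock Holmes - II</lib:Title>
  </purchase>
  <lib:Title>Sherlock Holmes - III</lib:Title>
</Book>"""


def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None without a namespace."""
    if tag.startswith('{'):
        uri, local = tag[1:].split('}', 1)
        return uri, local
    return None, tag


root = ET.fromstring(document)

# Map each title's text to the namespace its element name was expanded to.
title_namespaces = {
    element.text: split_clark(element.tag)[0]
    for element in root.iter()
    if split_clark(element.tag)[1] == 'Title'
}
for title, namespace in title_namespaces.items():
    print(title, '->', namespace)
```

The redeclaration on purchase affects only the content of purchase; once that element closes, the outer binding of lib applies again.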
The W3C Namespaces in XML Recommendation enforces some namespace constraints:
- Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved for use by XML and XML-related specifications. Although not a fatal error, it is inadvisable to bind such prefixes. The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.
- A prefix cannot be used unless it is declared and bound to a namespace. (Ever tried to use a variable in Java without declaring it?)
The following violates both these constraints:
<?xml version="1.0"?>
<Book xmlns:XmlLibrary="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
[Error]: prefix lib not bound to a namespace.
[Inadvisable]: prefix XmlLibrary begins with 'Xml'.
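A namespace-aware parser should refuse the document above because of the undeclared prefix. The sketch below assumes Python's standard xml.etree.ElementTree module; the exact wording of the error message depends on the parser, so none is shown here.

```python
import xml.etree.ElementTree as ET

document = """<Book xmlns:XmlLibrary="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>"""

# The prefix lib is used but never declared, so parsing fails.
try:
    ET.fromstring(document)
    error = None
except ET.ParseError as exc:
    error = exc

print('rejected:', error is not None)
print('parser message:', error)
```

The inadvisable prefix XmlLibrary is a different matter: since binding it is not a fatal error, a parser is not obliged to report it, and the failure here comes from the unbound prefix alone.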
Default Namespace (Not Default Namespaces)
It would be painful to repeatedly qualify an element or attribute you wish to use from a namespace. In such cases, you can declare a {default namespace} instead. Remember, at any point in time, there can be only one {default namespace} in existence. Therefore, the term "Default Namespaces" is inherently incorrect.
Declaring a {default namespace} means that any element within the scope of the {default namespace} declaration will be qualified implicitly, if it is not already qualified explicitly using a prefix. As with prefixed namespaces, a {default namespace} can be overridden too. A {default namespace} is declared as follows:
<someElement xmlns="http://www.foo.com" />
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
In this case the elements Book, Title, and Author are associated with the namespace http://www.library.com.
Remember, the scope of a namespace begins at the element where it is declared. Therefore, the element Book is also associated with the {default namespace}, as it has no prefix.
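One way to confirm that a {default namespace} is just implicit qualification is to compare it with an explicitly prefixed version of the same document. This sketch assumes Python's standard xml.etree.ElementTree module; the prefixed variant is written for this comparison and does not appear in the article.

```python
import xml.etree.ElementTree as ET

# Elements qualified implicitly through a default namespace declaration.
with_default = """<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

# The same elements qualified explicitly with a prefix.
with_prefix = """<lib:Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</lib:Book>"""


def expanded_names(document):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(document).iter()]


print(expanded_names(with_default))
print(expanded_names(with_prefix) == expanded_names(with_default))
```

Both documents produce the same expanded names, including the name of Book, because the scope of the declaration starts at the element that carries it.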
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <Author>Arthur Conan Doyle</Author>
  <purchase xmlns="http://www.otherlibrary.com">
    <Title>Sherlock Holmes - II</Title>
    <Author>Arthur Conan Doyle</Author>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
In the above, the elements Book, Title, and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements purchase, Title, and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.
Default Namespace and Attributes
A {default namespace} does not apply to attributes; therefore, to apply a namespace to an attribute, the attribute must be explicitly qualified with a prefix. Here the attribute isbn has {no namespace}, whereas the attribute cover is associated with the namespace http://www.library.com.
<?xml version="1.0"?>
<Book isbn="1234"
      pfx:cover="hard"
      xmlns="http://www.library.com"
      xmlns:pfx="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
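The different treatment of the two attributes shows up directly in their parsed names. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module, which reports namespaced names as {namespace}localname.

```python
import xml.etree.ElementTree as ET

document = """<Book isbn="1234"
      pfx:cover="hard"
      xmlns="http://www.library.com"
      xmlns:pfx="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

book = ET.fromstring(document)

# The element picks up the default namespace ...
print(book.tag)

# ... but the unprefixed attribute does not; only the prefixed one is qualified.
attribute_names = sorted(book.attrib)
for name in attribute_names:
    print(name, '=', book.attrib[name])
```

Note that the xmlns declarations themselves are consumed by the parser and do not show up as ordinary attributes of Book.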
Undeclaring Namespace
Unbinding an already-bound prefix is not allowed per the W3C Namespaces in XML 1.0 Recommendation, but is allowed per W3C Namespaces in XML 1.1 Recommendation. There was no reason why this should not have been allowed in 1.0, but the mistake has been rectified in 1.1. It is necessary to know this difference because not many XML parsers yet support Namespaces in XML 1.1.
Although the two versions differ on unbinding prefixed namespaces, both allow you to unbind or remove an already declared {default namespace} by overriding it with another {default namespace} declaration whose namespace is empty:
<someElement xmlns="" />
Unbinding a namespace has the same effect as never having declared it. In the following example, the elements Book, Title, and Author of Sherlock Holmes - I and Sherlock Holmes - III are associated with the namespace http://www.library.com, and the elements purchase, Title, and Author of Sherlock Holmes - II have {no namespace}:
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <Author>Arthur Conan Doyle</Author>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
    <Author>Arthur Conan Doyle</Author>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
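The effect of the empty declaration can be checked by looking up purchase by its bare name. The sketch assumes Python's standard xml.etree.ElementTree module, where a name without a {namespace} part means the element has {no namespace}.

```python
import xml.etree.ElementTree as ET

document = """<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <Author>Arthur Conan Doyle</Author>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
    <Author>Arthur Conan Doyle</Author>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

root = ET.fromstring(document)

# purchase undeclares the default namespace, so it and its children
# are found under their unqualified names.
purchase = root.find('purchase')
purchase_children = [child.tag for child in purchase]
print(purchase_children)

# Outside purchase, the default namespace from Book is in effect again.
library_titles = root.findall('{http://www.library.com}Title')
print([title.text for title in library_titles])
```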
Here's an example of unbinding a prefix that is invalid per the Namespaces in XML 1.0 specification, but valid per Namespaces in XML 1.1:
<purchase xmlns:lib="">
Under Namespaces in XML 1.1, the prefix lib is undeclared within the scope of the element purchase and cannot be used there, unless it is re-declared on a descendant element.
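Parsers that implement only Namespaces in XML 1.0 are expected to reject this declaration. As a sketch, Python's standard xml.etree.ElementTree module is assumed here to be backed by such a parser (expat); a parser supporting Namespaces in XML 1.1 would behave differently.

```python
import xml.etree.ElementTree as ET

# An outer binding for lib, then an attempt to undeclare it on purchase.
document = """<Book xmlns:lib="http://www.library.com">
  <purchase xmlns:lib="" />
</Book>"""

try:
    ET.fromstring(document)
    undeclaring_accepted = True
except ET.ParseError as exc:
    undeclaring_accepted = False
    print('parser message:', exc)

print('undeclaring a prefix accepted:', undeclaring_accepted)
```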
No Namespace
No namespace exists when there is no default namespace in scope. A {default namespace} is one that is declared explicitly using xmlns. When a {default namespace} has not been declared at all using xmlns, it is incorrect to say that the elements are in {default namespace}. In such cases, we say that the elements are in {no namespace}. {no namespace} also applies when an already declared {default namespace} is undeclared.
In summary:
- The scope of a declared namespace begins at the element where it is declared and applies to that element and all the elements within its content, unless overridden by another namespace declaration with the same prefix name.
- Both prefixed namespaces and the {default namespace} can be overridden.
- The {default namespace} can be undeclared under both versions of the Recommendation; a prefixed namespace can be undeclared only under Namespaces in XML 1.1.
- {default namespace} does not apply to attributes directly.
- A {default namespace} exists only when you have declared it explicitly. It is incorrect to use the term {default namespace} when you have not declared it.
- No namespace exists when there is no default namespace in scope.
Namespaces and XML Schema
Thus far we have seen how to declare and use an existing namespace. Now let's examine how to create a new namespace and add elements and attributes to it using XML Schema.
An XML Schema is an XML document before it's anything else. In other words, like any other XML document, an XML Schema is built with elements and attributes. This "building material" must come from the namespace http://www.w3.org/2001/XMLSchema, which is a declared and reserved namespace that contains elements and attributes as defined in the W3C XML Schema Structures Specification and the W3C XML Schema Datatypes Specification. You should not add elements or attributes to this namespace.
Using these building blocks we can create new elements and attributes as required, enforce the required constraints on them, and keep them in some namespace. (See Figure 1.) XML Schema calls this particular namespace the {target namespace}, that is, the namespace where the newly created elements and attributes will reside.
Figure 1: Elements and attributes in the XML Schema namespace are used to write an XML Schema document, which generates elements and attributes as defined by the user and puts them in the {target namespace}. This {target namespace} is then used to validate the XML instance.
This {target namespace} is referenced from the XML instance to ensure the validity of the instance document. (See Figure 2.) During validation, the Validator verifies that the elements/attributes used in the instance exist in the declared namespace, and also checks any other constraints on their structure and datatype.
Figure 2: From XML Schema to XML Schema instance
Qualified or Unqualified
In XML Schema we can choose to specify whether the instance document must qualify all the elements and attributes, or must qualify only the globally declared elements and attributes. Regardless of what we choose, the entire instance would be validated. So why do we have two choices?
The answer is "manageability." When we choose qualified, we are specifying that all the elements and attributes in the instance must have a namespace, which in turn adds namespace complexity to the instance. The benefit is that if, say, the schema is later modified by making some local declarations global and/or some global declarations local, the instance documents are not affected at all. In contrast, when we choose unqualified, we are specifying that only the globally declared elements and attributes in the instance must have a namespace, which in turn hides the namespace complexity from the instance. But in this case, if the schema is modified in the same way, all instance documents are affected and are no longer valid: the XML Schema Validator would report validation errors if we tried to validate such an instance against the modified XML Schema. The namespaces in the instance must then be adjusted to match the modification made in the XML Schema to make the instance valid again.
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.library.com"
        targetNamespace="http://www.library.com"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Book" type="tns:BookType" />
  <complexType name="BookType">
    <sequence>
      <element name="Title" type="string" />
      <element name="Author" type="string" />
    </sequence>
  </complexType>
</schema>
The declarations that are the immediate children of the element <schema> are the global declarations, and the rest are local declarations. In the above example, Book and BookType are declared globally whereas Title and Author are local declarations.
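Because a schema is itself an XML document, the global/local distinction can be read off its structure. The sketch below assumes Python's standard xml.etree.ElementTree module and simply treats the immediate children of <schema> as global and every named descendant of those children as local; it is an illustration, not a schema processor.

```python
import xml.etree.ElementTree as ET

schema_text = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.library.com"
        targetNamespace="http://www.library.com"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Book" type="tns:BookType" />
  <complexType name="BookType">
    <sequence>
      <element name="Title" type="string" />
      <element name="Author" type="string" />
    </sequence>
  </complexType>
</schema>"""

schema = ET.fromstring(schema_text)

# Immediate children of <schema> are the global declarations.
global_names = [declaration.get('name') for declaration in schema]

# Anything with a name nested inside a global declaration is local.
local_names = [
    nested.get('name')
    for declaration in schema
    for nested in declaration.iter()
    if nested is not declaration and nested.get('name') is not None
]

print('global:', global_names)
print('local:', local_names)
print('target namespace:', schema.get('targetNamespace'))
```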
We can express the choice between qualified and unqualified by setting the schema element attributes elementFormDefault and attributeFormDefault to either qualified or unqualified.
elementFormDefault = (qualified | unqualified) : unqualified
attributeFormDefault = (qualified | unqualified) : unqualified
When elementFormDefault is set to qualified, it implies that in the instance of this grammar all the elements must be explicitly qualified, either by using a prefix or setting a {default namespace}. An unqualified setting means that only the globally declared elements must be explicitly qualified, and the locally declared elements must not be qualified. Qualifying a local declaration in this case is an error. Similarly, when attributeFormDefault is set to qualified, all attributes in the instance document must be explicitly qualified using a prefix.
Remember, {default namespace} doesn't apply to attributes; hence, we can't use a {default namespace} declaration to qualify attributes. Unqualified seems to imply being in the namespace by virtue of the containing element. This is interesting, isn't it?
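To make the two elementFormDefault settings concrete, the following sketch compares two hand-written instance documents with the element names each setting calls for. It assumes Python's standard xml.etree.ElementTree module and hard-codes the Book/Title/Author schema shown earlier; expected_names and actual_names are hypothetical helpers, and no real schema validation takes place.

```python
import xml.etree.ElementTree as ET

TARGET_NS = 'http://www.library.com'
GLOBAL_ELEMENTS = ['Book']             # declared directly under <schema>
LOCAL_ELEMENTS = ['Title', 'Author']   # declared inside BookType


def expected_names(element_form_default):
    """Expanded element names an instance is expected to use."""
    def qualify(name):
        return '{%s}%s' % (TARGET_NS, name)

    names = [qualify(name) for name in GLOBAL_ELEMENTS]
    if element_form_default == 'qualified':
        names += [qualify(name) for name in LOCAL_ELEMENTS]
    else:
        names += LOCAL_ELEMENTS  # local elements must stay unqualified
    return names


def actual_names(instance):
    """Expanded element names found in an instance, in document order."""
    return [element.tag for element in ET.fromstring(instance).iter()]


# Instance written for elementFormDefault="qualified".
qualified_instance = """<lib:Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</lib:Book>"""

# Instance written for elementFormDefault="unqualified".
unqualified_instance = """<lib:Book xmlns:lib="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</lib:Book>"""

print(actual_names(qualified_instance) == expected_names('qualified'))
print(actual_names(unqualified_instance) == expected_names('unqualified'))
print(actual_names(unqualified_instance) == expected_names('qualified'))
```

The unqualified instance uses a prefix rather than a {default namespace} on Book: a default declaration would also qualify Title and Author, which the unqualified setting does not permit.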
In the following diagrams, the concept symbol space is similar to the non-normative concept of namespace partition. For example, if a namespace is like a refrigerator, then the symbol spaces are the shelves in the refrigerator. Just as shelves partition the entire space in a refrigerator, the symbol spaces partition the namespace.
There are three primary partitions in a namespace: one for global element declarations, one for global attribute declarations, and one for global type declarations (complexType/simpleType). This arrangement implies that a global element, a global attribute, and a global type can all have the same name and still coexist in a {target namespace} without any name collisions. Further, every global element and every global complexType has its own symbol space to contain its local declarations.
Let's examine the four possible combinations of values for the pair of attributes elementFormDefault and attributeFormDefault.
Case 1: elementFormDefault=qualified, attributeFormDefault=qualified
Here the {target namespace} directly contains all the elements and attributes; therefore, in the instance, all the elements and attributes must be qualified.
Case 2: elementFormDefault=qualified, attributeFormDefault=unqualified
Here the {target namespace} directly contains all the elements, and the corresponding attributes for these elements are contained in the symbol space of the respective elements. Therefore, in the instance, only the elements must be qualified and the attributes must not be qualified, unless the attribute is declared globally.
Case 3: elementFormDefault=unqualified, attributeFormDefault=qualified
Here the {target namespace} directly contains all the attributes and only the globally declared elements, which in turn contain their child elements in their symbol spaces. Therefore, in the instance, only the globally declared elements and all the attributes must be qualified.
Case 4: elementFormDefault=unqualified, attributeFormDefault=unqualified
Here the {target namespace} directly contains only the globally declared elements, which in turn contain their child elements in their symbol spaces. Every element contains the corresponding attributes in its symbol space; therefore, in the instance, only the globally declared elements and attributes must be qualified.
The above diagrams are intended as a visual representation of what is directly contained in a namespace and what is transitively contained in a namespace, depending on the value of elementFormDefault/attributeFormDefault. The implication of this setting is that the elements/attributes directly in the {target namespace} must have a namespace associated with them in the corresponding XML instance, and the elements/attributes that are not directly (transitively) in the {target namespace} must not have a namespace associated with them in the corresponding XML instance.
Target Namespace and No Target Namespace
Now we know that XML Schema creates the new elements and attributes and puts them in a namespace called the {target namespace}. But what if we don't specify a {target namespace} in the schema? When we don't specify the targetNamespace attribute at all, no {target namespace} exists, which is legal. Specifying an empty URI in the targetNamespace attribute, however, is illegal.
For example, the following is invalid. We can't specify an empty URI for the {target namespace}:
<schema targetNamespace="" . . .>
When no {target namespace} exists, we say, as described earlier, that the newly created elements and attributes are kept in {no namespace}. (It would be incorrect to use the term {default namespace}.) For validation, the corresponding XML instance must use the noNamespaceSchemaLocation attribute from the http://www.w3.org/2001/XMLSchema-instance namespace to refer to the XML Schema with no target namespace.
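An instance that points at a schema with no target namespace might look like the document embedded in the sketch below. The file name library.xsd and the prefix xsi are assumptions made for illustration, and Python's standard xml.etree.ElementTree module is used only to read the attribute back; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance'

# library.xsd is a hypothetical schema document with no targetNamespace.
instance = """<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="library.xsd">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

book = ET.fromstring(instance)

# The elements carry no namespace, matching a schema without a target namespace.
element_names = [element.tag for element in book.iter()]
print(element_names)

# The schema hint is an attribute in the XMLSchema-instance namespace.
schema_hint = book.get('{%s}noNamespaceSchemaLocation' % XSI_NS)
print(schema_hint)
```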
Conclusion
Hopefully, this overview of namespaces will help you move to XML Schema more easily. The Oracle XML Developer Kit (XDK) supports the W3C Namespaces in XML 1.0 Recommendation; you can turn the namespace check on or off through the JAXP APIs in the Oracle XDK by using the setNamespaceAware(boolean) method of the SAXParserFactory and DocumentBuilderFactory classes.
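The effect of switching namespace awareness on and off can be sketched outside the Oracle XDK as well. The example below is an analogue only: it assumes Python's standard xml.sax module, whose feature_namespaces flag plays a role comparable to setNamespaceAware(boolean), and it does not use the JAXP or XDK APIs. NameCollector and element_names are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Record element names exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: name is the raw tag text.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is (uri, localname).
        self.names.append(name)


def element_names(document, namespace_aware):
    """Parse the document bytes and return the reported element names."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler.names


DOCUMENT = (b'<lib:Book xmlns:lib="http://www.library.com">'
            b'<lib:Title>Sherlock Holmes</lib:Title></lib:Book>')

print(element_names(DOCUMENT, namespace_aware=False))
print(element_names(DOCUMENT, namespace_aware=True))
```

With namespace processing off, the prefix is just part of the element name; with it on, the parser reports the namespace the prefix was bound to together with the local name.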