XML Catalogs and Namespace Documents

Whilst writing a tutorial on XML Catalogs I started thinking about dereferencing namespace URIs again. Yes, I’m a sucker for punishment. What follows are some thoughts about using XML Catalogs in conjunction with namespace dereferencing…

Let’s take a moment for a brief recap. As a result of a lot of discussion on XML-DEV we have RDDL which can be used to describe “What’s at the end of a namespace URI”. Personally, I like RDDL as it nicely meets its goal of being both human and machine-readable.
The TAG have been considering this same question and have included wording in their Architecture of the World Wide Web document to note that

Namespace designers SHOULD make available human-readable material to meet the needs of those who will use the namespace vocabulary…

and also that…

[a] namespace document should also support the automatic retrieval of other Web resources that support the processing markup from this vocabulary.

Precisely what a “namespace document” looks like is an open issue and there have been a number of different proposals. Some based on RDF, some not. So I expect that eventually there will be a new Technical Recommendation from the W3C defining the format of a namespace document. Then what?
Well, you’ll be able to have your application dereference namespace URIs, obtain the namespace document and use it as a resource discovery tool. It could attempt to find stylesheets that would format the document nicely as HTML, or convert the vocabulary into another, e.g. a normative transformation into RDF for non-RDF vocabularies. The application may also try to find a suitable schema for validating the document. There are a lot of interesting possibilities.
However, there are a couple of issues with this approach that are worth considering, mostly concerning control. Firstly, who can update the namespace document? The short answer is whoever controls the web server on which it resides, and this is likely to be the creator/owner of the namespace. Secondly, do you trust that authority? If your application relies on a schema or transform that it obtains via the namespace document, then what if that gets changed or removed? Instant breakage. What if the URL isn’t available for a period? Oops!
These are all problems that XML Catalogs are designed to solve. By allowing a URI obtained from an XML document to be mapped to another URI reference (e.g. to a locally mirrored copy) you can maintain local control over the resources, ensure that the application is always using the correct version, substitute your own fancy transforms, etc, etc. One way to configure this might be to create a catalog that maps the namespace URI to a locally held copy of the namespace document. This catalog would also include mappings for any resources referenced from that document. It would also allow you to locally annotate that namespace document to include additional links to resources of your own creation.
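To make that concrete, here’s a minimal sketch in Python of what such a catalog and lookup might look like. The `uri` and `rewriteURI` elements and the catalog namespace come from the OASIS XML Catalogs format; the namespace URI, the mirror paths and the `resolve` function are all made up for illustration, and the lookup only handles these two entry types rather than the full resolution rules.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the namespace URI and local paths are placeholders.
CATALOG_XML = f"""\
<catalog xmlns="{CATALOG_NS}">
  <!-- The namespace URI itself maps to a local copy of the namespace document -->
  <uri name="http://example.org/ns/invoice"
       uri="file:///opt/mirror/invoice/namespace.html"/>
  <!-- Resources linked from that document under this prefix are mirrored too -->
  <rewriteURI uriStartString="http://example.org/ns/invoice/"
              rewritePrefix="file:///opt/mirror/invoice/"/>
</catalog>
"""


def resolve(catalog_xml, uri):
    """Map a URI through the catalog; return it unchanged if nothing matches."""
    root = ET.fromstring(catalog_xml)

    # Exact matches on <uri> entries take priority.
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        if entry.get("name") == uri:
            return entry.get("uri")

    # Otherwise use the <rewriteURI> entry with the longest matching prefix.
    best = None
    for entry in root.iter(f"{{{CATALOG_NS}}}rewriteURI"):
        start = entry.get("uriStartString")
        if uri.startswith(start) and (best is None or len(start) > len(best[0])):
            best = (start, entry.get("rewritePrefix"))
    if best is not None:
        return best[1] + uri[len(best[0]):]

    return uri
```

Annotating the local namespace document with your own links then only needs extra entries in the same catalog; the application code doesn’t change.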
But this approach is dependent on there being some definitive format for a namespace document that you can teach your application to process. Why wait for the TAG to finish? Here’s an alternative which you can use right now:
Create several XML catalogs, each catalog pointing to a different kind of resource. If your application wants to be able to dereference a URI to fetch both an RDF Schema and an XSLT transformation then create two catalogs. The first will map the URI to a locally held copy of the schema. The second will map the same URI to the stylesheet. The question then becomes one of when to use which catalog. This is easily answered, as you’re going to be using these resources in different contexts in your application. Simply use a suitably configured URIResolver in each context. You can obtain a catalog-aware implementation from the xml-commons project. In fact, in all likelihood your application will only be interested in one type of resource (though different applications may be interested in different types). This is an even simpler arrangement, as each application can just read the catalog it needs.
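Here’s a rough sketch of the two-catalog arrangement, again in Python rather than against the real URIResolver interface. The `CatalogResolver` class, the namespace URI and the file locations are all hypothetical; the point is only that the same namespace URI resolves to a different resource depending on which catalog the resolver was built from.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
NAMESPACE_URI = "http://example.org/ns/invoice"  # hypothetical namespace


def make_catalog(target):
    """Build a one-entry catalog mapping the namespace URI to `target`."""
    return (
        f'<catalog xmlns="{CATALOG_NS}">'
        f'<uri name="{NAMESPACE_URI}" uri="{target}"/>'
        f"</catalog>"
    )


# One catalog per kind of resource (placeholder locations).
SCHEMA_CATALOG = make_catalog("file:///opt/schemas/invoice.rdfs")
STYLESHEET_CATALOG = make_catalog("file:///opt/xslt/invoice-to-html.xsl")


class CatalogResolver:
    """Illustrative stand-in for a catalog-aware resolver bound to one catalog."""

    def __init__(self, catalog_xml):
        root = ET.fromstring(catalog_xml)
        self._map = {
            entry.get("name"): entry.get("uri")
            for entry in root.iter(f"{{{CATALOG_NS}}}uri")
        }

    def resolve(self, uri):
        # Returns None for namespaces this catalog knows nothing about.
        return self._map.get(uri)


# Each context in the application gets its own resolver.
schema_resolver = CatalogResolver(SCHEMA_CATALOG)
stylesheet_resolver = CatalogResolver(STYLESHEET_CATALOG)
```

The validation code would be handed `schema_resolver` and the formatting code `stylesheet_resolver`; neither needs to know the other catalog exists.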
In this way you can dereference known namespace URIs and retrieve exactly the resource you require. That resource doesn’t even have to be held locally; you can effectively hook up any namespace URI to any resource that you like and dereference it to your heart’s content.
The obvious shortcoming with this technique is that it isn’t dynamic, i.e. you have to know the useful namespace identifiers in advance. But you’re no worse off than in the current situation, in which there may or may not be something at the namespace URI, and that “something” may or may not be what you expect or need. And you’ve at least got a configuration that allows you to easily add new resources and namespaces as you encounter them. So to me this isn’t a big problem.
You could even create a “Caching URIResolver” which does attempt to dereference the namespace URI and process what it finds there. It could then save a local copy of that document, or of any other useful resources it discovers linked from it, and update the local catalog automatically. Before returning locally cached copies of resources it could check for newer versions, if that’s the behaviour you require. The XML Catalog format is extensible with elements from other namespaces, so it’d be possible to store some caching information in there too.
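A very rough sketch of that idea, with everything in it assumed rather than taken from an existing library: the `CachingResolver` class, the extension namespace and its `fetched` attribute are invented for illustration, and the actual retrieval is passed in as a function so the sketch doesn’t depend on any particular HTTP client. It doesn’t follow links from the namespace document or check for newer versions.

```python
import hashlib
import time
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
CACHE_NS = "http://example.org/ns/catalog-cache"  # hypothetical extension namespace


class CachingResolver:
    """Fetch a URI once, keep a local copy, and record it in a catalog."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # callable taking a URI and returning bytes
        self.catalog = ET.Element(f"{{{CATALOG_NS}}}catalog")

    def _entry(self, uri):
        for entry in self.catalog.iter(f"{{{CATALOG_NS}}}uri"):
            if entry.get("name") == uri:
                return entry
        return None

    def resolve(self, uri):
        entry = self._entry(uri)
        if entry is not None:
            return entry.get("uri")  # already cached: no network access

        # Not in the catalog yet: dereference it and store a local copy.
        data = self.fetch(uri)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        local = self.cache_dir / hashlib.sha256(uri.encode("utf-8")).hexdigest()
        local.write_bytes(data)

        # Record the mapping, plus caching info in an extension attribute.
        entry = ET.SubElement(
            self.catalog,
            f"{{{CATALOG_NS}}}uri",
            {
                "name": uri,
                "uri": local.resolve().as_uri(),
                f"{{{CACHE_NS}}}fetched": str(int(time.time())),
            },
        )
        return entry.get("uri")

    def catalog_xml(self):
        """Serialise the catalog so it can be written back to disk."""
        return ET.tostring(self.catalog, encoding="unicode")
```

A freshness check could compare the recorded `fetched` timestamp against whatever policy you like before deciding whether to call `fetch` again.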
I’d be interested to hear if anyone is already using something like this. Or whether it breaks all sorts of unwritten rules!