OpenDocument and XMP

This is the second part of my look at XMP. This time I’m focusing on the potential for using XMP as the metadata format for OpenDocument (ODF).
This is part of a broader discussion to help define the future direction of the ODF metadata format; one proposal on the table is to use RDF, via a constrained RDF/XML syntax. There’s a wiki available for discussing this issue, particularly how to map the existing metadata to RDF.
At least some of the impetus for exploring richer metadata support has come from the bibliographic sub-project, which aims to build support for bibliography management into OpenOffice 3.0.
RDF is a good fit for the flexible storage and formatting requirements that arise from bibliographic metadata. As XMP is an RDF profile, it’s worthy of consideration, and in fact this is the proposal behind Alan Lilich’s posting to the OpenDocument TC member list. Lilich’s discussion document frames the rest of this posting.
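To make the bibliographic use case concrete, here’s a rough sketch of how a single citation might be expressed in RDF/XML using plain Dublin Core; the identifier, names and values are invented purely for illustration:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- one description per cited work; the URI identifies the work itself -->
      <rdf:Description rdf:about="urn:isbn:0123456789">
        <dc:title>An Example Monograph</dc:title>
        <dc:creator>Jane Example</dc:creator>
        <dc:creator>John Example</dc:creator>
        <dc:date>2005</dc:date>
        <dc:publisher>Example Press</dc:publisher>
      </rdf:Description>
    </rdf:RDF>

Richer citation styles simply mean more properties, drawn from whichever schemas the bibliographic community settles on, rather than a new file format.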

Measuring Fit

Lilich’s presumptions and biases are all very pragmatic. Here’s something that I particularly agree with:
The completeness and quality of all applications, commercial or open source, depends quite a bit on the clarity and implementability of the OpenDocument specification. It needs to be easily, reliably, and consistently implemented.
This is probably the crux of the debate with respect to deciding whether to “import” a third-party specification as a portion of the ODF specification. In other words, does XMP meet all of these goals?
While Adobe does have a C++ toolkit, there aren’t any other open implementations of the specification. The availability of several, independent, conformant implementations of a specification is a pre-requisite for the safe adoption of any technology. In my view, and irrespective of any of its other merits, XMP immediately fails on this point.
As to the “easily and reliably” aspects: to measure those, one needs to gather implementation experience from developers, ideally backed with a conformance suite against which those implementations can be compared. Again, XMP doesn’t yet measure up here, despite the specification having been available for several years.
To remedy this Adobe need to invest some additional effort in shepherding the specification to encourage wider comment and implementation. If they are unwilling to pursue this themselves, then moving the maintenance of the specification to a more open forum, e.g. an OASIS Technical Committee, would allow the community to organize itself better. This model is working for OpenDocument, and I don’t see why it couldn’t work for XMP.
My understanding is that there are existing concerns amongst vendors about consistent levels of XMP conformance and the lack of a formal schema for validating XMP documents. These concerns may ultimately result in a community-led initiative to shepherd XMP. Either way, the OpenDocument TC would do well to investigate further the experiences of vendors using XMP, to highlight possible pitfalls before adopting the technology wholesale.
Lilich suggests that:
…the OpenDocument metadata effort could succeed by starting with XMP, understanding how to work within XMP, and only looking for truly necessary changes.
This seems backward to me: the real utility of XMP ought to be demonstrated first, before the OpenDocument TC look at re-framing their goals to work within whatever process Adobe are laying out for the future direction of XMP. Suggesting that the TC only look for “truly necessary changes” runs counter to the need to “get it right”. XMP ought to be shown to be unequivocally better than the alternatives before making any technical compromises. A clear understanding of the trade-offs is necessary here.
The section of Lilich’s post entitled “Latitude for change” does a good job of setting out how much room for compromise is really available. The reasoning here is entirely understandable from Adobe’s perspective, but it should be of concern to any vendor uncomfortable with technical aspects of the format.
In my previous posting I suggested that the real benefit of XMP is in its definition of how to embed metadata within a number of arbitrary binary formats. I stand by that. In the context of evaluating its use within the OpenDocument format, I believe its key value is as a demonstration that an RDF-based metadata model is achievable.

Validation and Conformance

In my previous posting I raised some concerns over the interesting ways that XMP subsets RDF. It’s clear that XMP subsets both the RDF model and the RDF/XML syntax.
The cleanest way to create an easily processable (i.e. by both RDF and XML toolkits) RDF subset is to apply constraints at the syntax level. This has several benefits.
Firstly, making the format validatable with an XML schema benefits conformance and allows developers to use tools like XSLT to perform basic manipulations of the data. RELAX NG seems to be the best fit for a schema language here.
A fixed XML serialization is also important in order to maintain the hackability of the OpenDocument format, without necessarily having to compromise on the richness of the metadata model.
Secondly, XML schemas can provide the hooks to enable “semantic anchors” that can help application authors tame some of the wildness of the RDF model.
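As a concrete sketch of the kind of constraint I have in mind, here’s a fragment of a RELAX NG schema (XML syntax) that pins the metadata down to a single rdf:Description carrying simple, repeatable Dublin Core elements. The property list is purely illustrative, not a proposal:

    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <start>
        <element name="rdf:RDF">
          <!-- exactly one description of the document itself -->
          <element name="rdf:Description">
            <attribute name="rdf:about"/>
            <zeroOrMore>
              <choice>
                <!-- plain, repeatable literal-valued properties -->
                <element name="dc:title"><text/></element>
                <element name="dc:creator"><text/></element>
                <element name="dc:subject"><text/></element>
              </choice>
            </zeroOrMore>
          </element>
        </element>
      </start>
    </grammar>

Anything that validates against a schema like this is still perfectly ordinary RDF/XML, so both XML and RDF toolkits can consume it without special cases.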
Constraining syntax is the general intent behind the “Plain XMP” format that Lilich describes at the end of his posting. I think Adobe are missing an opportunity here: they can constrain the RDF/XML syntax without having to produce an alternate serialization. It would be interesting to know whether this approach has already been rejected.
To support the XML (RELAX NG) schemas, XMP ought to include equivalent RDF schemas for all of its additional properties, plus those for extension schemas where there aren’t already public equivalents.
And, where the specification does make use of Classes and Properties from existing public schemas, it should respect the definitions in those schemas. XMP clearly doesn’t do this for the Dublin Core properties such as dc:creator and dc:subject, where it requires the use of RDF collections instead of simple repeated properties. This does nothing to help interoperability.
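For example, XMP serializes dc:creator as an ordered RDF container, whereas a tool following the usual Dublin Core convention would just repeat the property. The fragments below (namespace declarations omitted, names invented) show the difference:

    <!-- XMP style: dc:creator as an rdf:Seq container -->
    <dc:creator>
      <rdf:Seq>
        <rdf:li>Jane Example</rdf:li>
        <rdf:li>John Example</rdf:li>
      </rdf:Seq>
    </dc:creator>

    <!-- plain Dublin Core style: simple repeated properties -->
    <dc:creator>Jane Example</dc:creator>
    <dc:creator>John Example</dc:creator>

A generic Dublin Core consumer querying for literal-valued dc:creator statements will see nothing useful in the first form.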
Lilich suggests that when using repeated properties “[c]lient application code becomes more complex and UI design more difficult if everything is potentially an array”. I’m not convinced that the complexities are really that great, certainly not from an API design perspective. Schema cues can help address these concerns.
If there really are significant issues with using simple repeated literal properties, then XMP, or the OpenDocument TC, should define new properties, or suggest that the relevant community extend existing schemas. One of the strengths of RDF is that this kind of schema evolution can happen in a distributed fashion.
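For instance, a community that needed a more specific notion of authorship could publish a small RDF schema of its own that refines dc:creator, without waiting on either Dublin Core or Adobe. The namespace and property name below are hypothetical:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <!-- a hypothetical refinement of dc:creator for corporate authors -->
      <rdf:Property rdf:about="http://example.org/bib/corporateCreator">
        <rdfs:subPropertyOf rdf:resource="http://purl.org/dc/elements/1.1/creator"/>
        <rdfs:label>Corporate Creator</rdfs:label>
        <rdfs:comment>An organisation responsible for creating the work.</rdfs:comment>
      </rdf:Property>
    </rdf:RDF>

Existing Dublin Core consumers that understand RDF Schema can still recognise such values as creators, while newer tools get the extra precision.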

Markup Escaping

I was shocked to see the suggestion that escaped markup in XML elements is an acceptable solution. OpenDocument should not recommend any format that uses markup escaping. To do so would undermine the whole benefit of the OpenDocument format being expressed as XML.
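To be clear about the anti-pattern: escaping hides structure inside a text node, where RDF/XML can already carry real markup via rdf:parseType="Literal". The element names and content below are purely illustrative:

    <!-- escaped markup: the emphasis is invisible to XML tooling -->
    <dc:description>A study of &lt;em&gt;Drosophila&lt;/em&gt; genetics</dc:description>

    <!-- real markup: the structure remains part of the document -->
    <dc:description rdf:parseType="Literal">A study of <em>Drosophila</em> genetics</dc:description>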
I’ll just point to Norm Walsh’s essay “Escaped Markup: Still Harmful” for further comment on that anti-pattern.

Qualifiers in XMP

XMP allows “qualifiers” on properties; essentially these are “properties of properties”. In an RDF context the property value is simply a Resource, which can then be annotated with further properties. In Lilich’s example, an author name may be “annotated” with the location of the author’s blog.
This is one area where XMP would do well to align itself more closely with RDF, and explicitly model the relevant properties as Resources from the outset.
While the qualifiers may be transparent and easy to use from an XMP context, they add further confusion to the RDF export: without a qualifier a dc:creator property may be a simple Literal, but add a qualifier and it becomes a Resource.
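A sketch of the effect, following Lilich’s blog example (the ex:blog property and its namespace are invented for illustration):

    <!-- unqualified: dc:creator is a plain literal -->
    <dc:creator>Jane Example</dc:creator>

    <!-- qualified: the same value moves inside a resource, as rdf:value
         plus the qualifier property -->
    <dc:creator rdf:parseType="Resource" xmlns:ex="http://example.org/ns#">
      <rdf:value>Jane Example</rdf:value>
      <ex:blog rdf:resource="http://example.org/jane/blog"/>
    </dc:creator>

Any consumer that expected a literal value now has to check for, and unwrap, the rdf:value structure.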
This leads to precisely the problems described here. In fact it has the DC folks in a bit of a crisis.
As I’ve already mentioned, XMP’s use of RDF collections compounds this problem.

Conclusions

To sum up, my personal opinion on this is that XMP is not a good fit with the OpenDocument format. There are reasons to explore reliable conversions to and from XMP, but I don’t see enough compelling reasons to adopt XMP in its current form.
The latitude for change to the XMP format itself seems very small, so the opportunities for adopting the format now and improving it later also seem slim.
XMP is a reasonably good exemplar of an RDF (or near-RDF) based model for document metadata that can be used by both XML and RDF tools. But the technical issues outlined above limit its general utility. Its disregard for already-published schemas (Dublin Core) and for best practices (e.g. avoiding escaped markup) is a concern.
Personally I’d be interested to see an open specification that built on the XMP experiences to separate out its different aspects (model, syntax, and format embedding) with a view to encouraging wider implementation and conformance.
From the OpenDocument perspective I think there is definite value in exploring an RDF interpretation of its metadata. Just as XMP does, this model can build on existing schemas such as Dublin Core, and possibly PRISM, to avoid having to create a whole new set of schemas.