Looking at XMP

I’ve been taking a look at XMP as I’ve been considering different ways to “enrich” content. Embedding metadata is one option and XMP aims to fulfill the role of a metadata format suitable for embedding in a diverse range of media formats.
It’s also under discussion as a way to embed metadata in the OpenDocument format. The alternatives available in that quarter have been debated in various circles for some time. Bruce D’Arcus points to the latest entry in that discussion in his recent “OpenDocument and XMP” posting.
I thought I’d write up some notes on XMP in general and contribute some thoughts towards that debate. This is the first of two postings on this topic.


After speed reading Bob DuCharme’s XMP Lowdown article to get myself oriented, my first port of call was the Adobe XMP developer resources: I wanted to get my hands dirty working with the technology and needed some tools. After sifting through the site and the forums, all I could find was the C++ toolkit; not much use to me as a Java developer. Extending my search to Google, the best I could find was this regex(!) for extracting XMP documents.
That wasn’t a promising start. XMP has been around for a number of years, so I’d expected more tool support from Adobe or, failing that, from the broader development community. I gather that XMP is well supported in Adobe’s own products and by a number of other vendors (e.g. of content management and asset tracking products), but it certainly hasn’t garnered much interest in open source circles.
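To give a flavour of the regex approach, here is a minimal sketch in Python (not the regex I found, just my own illustration). It assumes the packet is stored in UTF-8 or another ASCII-compatible encoding, and the function name extract_xmp_packets is made up for the example.

```python
import re

# Matches from the xpacket header processing instruction to the trailer.
# Assumption: an 8-bit, ASCII-compatible encoding such as UTF-8; a packet
# stored as UTF-16 or UTF-32 would not be found by this pattern.
XPACKET = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>(.*?)<\?xpacket\s+end=.*?\?>",
    re.DOTALL,  # let '.' span newlines inside the packet
)

def extract_xmp_packets(data):
    """Return the body of every XMP packet found in a blob of bytes."""
    return [match.group(1).strip() for match in XPACKET.finditer(data)]

# A fake "binary" file with one embedded packet.
sample = (
    b"\x89BINARY..."
    b"<?xpacket begin='\xef\xbb\xbf' id='W5M0MpCehiHzreSzNTczkc9d'?>"
    b"<x:xmpmeta xmlns:x='adobe:ns:meta/'></x:xmpmeta>"
    b"<?xpacket end='w'?>"
    b"...MORE"
)
packets = extract_xmp_packets(sample)
```

It works for simple cases, but scanning bytes like this is hardly a substitute for a proper toolkit.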


Turning to the specification, I was encouraged to find that XMP is based around RDF. It’s an RDF profile of sorts, although it opts for some rather quirky restrictions on the allowed RDF/XML syntax. Syntactic profiles of RDF don’t scare (or surprise) me, but this one left me with raised eyebrows. Rather than constraining the syntax to a fixed XML format, one that could be validated against an XML schema but still retain an RDF interpretation, the restrictions are placed elsewhere.
For example, in XMP one isn’t allowed to use typed nodes and all children of the rdf:RDF element must be rdf:Description elements. Fair enough. But the specification states that a single Description element can only contain properties from a single namespace. So if you’re mixing, say, XMP properties with Dublin Core and PRISM, then you’re forced to use one rdf:Description element per namespace. I can’t see the advantages here as an application could simply ignore what it didn’t understand.
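To make the one-namespace-per-Description rule concrete, here is a rough Python sketch of what a writer ends up doing: grouping properties by namespace and emitting a separate rdf:Description for each group. The build_rdf helper and the sample values are hypothetical; the Dublin Core and XMP basic namespace URIs are the real ones, and PRISM properties would be handled in the same way.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XMP = "http://ns.adobe.com/xap/1.0/"

def build_rdf(about, properties):
    """Build an rdf:RDF element with one rdf:Description per namespace.

    properties is an iterable of (namespace_uri, local_name, text) tuples.
    """
    by_namespace = defaultdict(list)
    for namespace, name, text in properties:
        by_namespace[namespace].append((name, text))

    root = ET.Element(f"{{{RDF}}}RDF")
    for namespace, props in by_namespace.items():
        # Every group describes the same resource, so rdf:about is repeated.
        description = ET.SubElement(
            root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about}
        )
        for name, text in props:
            ET.SubElement(description, f"{{{namespace}}}{name}").text = text
    return root

# Illustrative values only.
root = build_rdf("", [
    (DC, "format", "application/pdf"),
    (XMP, "CreatorTool", "ExampleTool 1.0"),
    (DC, "source", "example-source"),
])
xml_text = ET.tostring(root, encoding="unicode")
```

Three properties about one resource turn into two Description elements, purely because two vocabularies are involved.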
XMP, despite having been revised in June 2005, also seems to be based on much earlier versions of RDF. The specification refers to RDF features that were removed in 2001. It also requires the use of the rdf:RDF element, which is now an optional part of the syntax.
I was more concerned about the recommendations on how some metadata should be encoded, particularly the use of rdf:Bag, rdf:Alt, and rdf:Seq. Current best practice (if that’s not too strong a term) is simply to use repeated properties for many of the cases that the XMP specification discusses; alternate languages for article titles, for example. It simplifies both the syntax and working with the data in an RDF application.
XMP requires the use of an rdf:Seq in order to express multiple authors of a document. In other words it recommends using dc:creator as if it were defined to be a sequence rather than a simple literal value. Working with bibliographic metadata I understand the need to define ordering amongst authors, but not at the cost of deepening the confusion over using dc:creator.
The XMP Lowdown article describes how perfectly valid RDF data is forced into this particular model, to the extent that it’s not correctly round-tripped via XMP tools. Implying an ordering where one hasn’t been stated originally seems like a bad idea to me.
So really XMP isn’t a profile of RDF: it’s a separate data model that happens to use RDF/XML as a serialization mechanism because it’s a close fit.
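To show the difference between the two shapes, here is a small Python sketch that rewrites the XMP-style container form of dc:creator as repeated properties. The flatten_containers function and the author names are my own inventions for illustration; this is not how any XMP tool behaves.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CONTAINERS = {f"{{{RDF}}}Seq", f"{{{RDF}}}Bag"}

# XMP-style: multiple authors wrapped in an rdf:Seq (names are made up).
XMP_STYLE = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="">
    <dc:creator>
      <rdf:Seq>
        <rdf:li>Alice Example</rdf:li>
        <rdf:li>Bob Example</rdf:li>
      </rdf:Seq>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def flatten_containers(root):
    """Replace each property wrapping an rdf:Seq or rdf:Bag with one
    repeated property per rdf:li, modifying the tree in place."""
    for description in root.findall(f"{{{RDF}}}Description"):
        for prop in list(description):  # snapshot, since we edit the parent
            children = list(prop)
            if len(children) == 1 and children[0].tag in CONTAINERS:
                index = list(description).index(prop)
                description.remove(prop)
                items = children[0].findall(f"{{{RDF}}}li")
                for offset, item in enumerate(items):
                    repeated = ET.Element(prop.tag)
                    repeated.text = item.text
                    description.insert(index + offset, repeated)
    return root

flat = flatten_containers(ET.fromstring(XMP_STYLE))
creators = [e.text for e in flat.iter(f"{{{DC}}}creator")]
```

The flattened form is simpler to query, but note the trade-off: document order of repeated properties carries no meaning in RDF, so the ordering the rdf:Seq expressed is lost. Going the other way, from repeated properties to an rdf:Seq, invents an ordering that was never stated.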
I think there are some benefits being lost here. It wouldn’t take much to bring the XMP and RDF models closer together, and still gain the benefits of both predictable structures for applications and the RDF model itself.


I was also surprised to discover that XMP is a profile of XML too.
An XMP document cannot include an XML declaration, which leaves it without the usual place to declare an encoding. In the section on how to embed XMP within SVG, there’s this note:

An XMP Packet is not intended to be a complete standalone XML document; therefore it contains no XML declaration.

Without an XML declaration there’s no way to declare the encoding of the XMP document. The XMP equivalent of the XML declaration does include an encoding attribute, but it’s deprecated, meaning that to my understanding one has to determine the specific encoding via other means.
Perhaps this is a consequence of defining a format for embedding in binary documents, but it certainly seems like an odd decision. I always hear alarm bells when I see XML being redefined in this way.
XMP also requires a value of x-default for the “default” language when defining alternatives. While this is permissible in XML and RFC 3066, it’s hardly portable. Default in what (and whose) context?
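One of those other means, if I’m reading the packet wrapper description correctly, is the begin attribute of the xpacket header: its value is the U+FEFF character written in the packet’s own encoding, with an empty value taken to mean UTF-8. Here’s a rough Python sketch of sniffing the encoding that way. The detect_packet_encoding function is hypothetical, and it assumes a single space between “xpacket” and “begin”.

```python
# Candidate encodings, tried in turn. Python's "-be"/"-le" codecs do not
# add a byte order mark of their own, so U+FEFF is encoded exactly once.
ENCODINGS = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]

def detect_packet_encoding(data):
    """Guess the encoding of an XMP packet from its xpacket header.

    Returns a codec name, or None if no recognisable header is found.
    """
    for encoding in ENCODINGS:
        for quote in ("'", '"'):
            # The header as it would appear in this encoding, with the
            # begin attribute holding a single U+FEFF character.
            marker = ("<?xpacket begin=" + quote + "\ufeff" + quote).encode(encoding)
            if marker in data:
                return encoding
    for quote in ("'", '"'):
        # Assumption: an empty begin value indicates UTF-8.
        if ("<?xpacket begin=" + quote + quote).encode("utf-8") in data:
            return "utf-8"
    return None

header = "<?xpacket begin='\ufeff' id='W5M0MpCehiHzreSzNTczkc9d'?>"
guess = detect_packet_encoding(b"binary junk" + header.encode("utf-16-le"))
```

It does the job, but it’s a long way from reading an encoding name out of an XML declaration.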
The lack of a formal schema for XMP (of any variety) also seems a huge oversight.

The Goals of XMP

Disagreements over technical minutiae aside, I do see some real value in XMP. The ability to embed metadata in arbitrary binary document formats is a huge benefit. This is the real core of XMP and its primary use case.
Avoiding having to package up content and metadata makes many applications much simpler, especially as there’s no formal XML packaging specification.
For formats that are already XML, and/or already have well-defined packaging mechanisms, I’m not clear on the immediate benefits of XMP. Its quirks from both an XML and RDF perspective, and its lack of tool support, make it a less than ideal choice IMO.
More on XMP and OpenDocument to follow.