Everyone loves to hate RDF/XML. Indeed many have argued that RDF/XML is responsible for holding back semantic web adoption. I’m not sure that I fully agree with that (there are a lot of other issues to consider) but it’s certainly awkward to work with if you’re trying to integrate both RDF and XML tools into your application.
It’s actually that combination that causes the awkwardness. If you’re just using RDF tools then RDF/XML is mostly fine. It benefits from XML’s Unicode support and is the most widely supported RDF serialisation. There are downsides though. For example there are some potential RDF graphs that can’t be serialised as RDF/XML. But that is easy to avoid.
Developers, particularly XML developers, feel cheated by RDF/XML because of what they see as false advertising: it’s an XML format that doesn’t play nicely with XML tools. Some time ago, Dan Brickley wrote a nice history on the design of RDF/XML which is worth a read for some background. My goal here isn’t to rehash the RDF/XML discussion or even to mount a defense of RDF/XML as a good format for RDF (I prefer Turtle).
But developers are still struggling with RDF/XML, particularly in publishing workflows where XML is a good base representation for document structures, so I think it’s worthwhile capturing some advice on how to reach a compromise with RDF/XML that allows it to work nicely with XML tools. I can’t remember seeing anyone do that before, so I thought I’d write down some of my experiences. These are drawn from creating a publishing platform that ingested metadata and content in XML, used Apache Jena for storing that metadata, and Solr as a search engine. Integration between different components was carried out using XML-based messaging. So there were several places where RDF and XML rubbed up against one another.
Tip 1: Don’t rely on default serialisations
The first thing to note is that RDF/XML offers a lot of flexibility in terms of how an RDF graph can be serialised as XML. A lot. The same graph can be serialised in many different ways using a lot of syntactic short-cuts. More on those in a moment.
It’s this unbounded flexibility that is the major source of the problems: producers and consumers may have reasonable default assumptions about how data will be published that are completely at odds with one another. This makes it very difficult to consume arbitrary RDF/XML with anything other than RDF tools.
JSON-LD offers a lot of flexibility too, and I can’t help but wonder whether that flexibility is going to come back and bite us in the future.
By default RDF tools tend to generate RDF/XML in a form that makes it easy for them to serialise. This tends to mean automatically generated namespace prefixes and a per-triple approach to serialising the graph, e.g.:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:p0="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:p1="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/person/1">
    <p0:label>Joe Bloggs</p0:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/1">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/1">
    <p1:homepage rdf:resource="http://example.org/blogs/joe"/>
  </rdf:Description>
</rdf:RDF>
This is a disaster for XML tools: the description of the resource is spread across multiple elements, making it hard to process. But it’s efficient to generate.
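To make the problem concrete, here is a minimal sketch, using only Python's standard library, of the extra work an XML consumer has to do before it can treat a resource as a single record. The `group_by_subject` helper and the sample document are illustrative assumptions, not part of any RDF toolkit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Per-triple output of the kind shown above (p0/p1 are tool-generated prefixes).
PER_TRIPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:p0="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:p1="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/person/1">
    <p0:label>Joe Bloggs</p0:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/1">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/1">
    <p1:homepage rdf:resource="http://example.org/blogs/joe"/>
  </rdf:Description>
</rdf:RDF>
"""


def group_by_subject(rdfxml: str) -> dict:
    """Collect the property elements of every rdf:Description sharing an rdf:about."""
    root = ET.fromstring(rdfxml)
    grouped = defaultdict(list)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        grouped[subject].extend(list(desc))
    return dict(grouped)


grouped = group_by_subject(PER_TRIPLE)
for subject, props in grouped.items():
    print(subject, [p.tag for p in props])
```

Even this only handles one of the many possible serialisations: it would miss typed node elements, nested descriptions and property attributes, which is why consuming arbitrary RDF/XML with XML tools is so fragile.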
Some RDF frameworks may provide options for customising the output to apply some of the RDF/XML syntactic shortcuts. As we’ll see in a moment these are worth embracing and may produce some useful regularity.
But if you need to generate an XML format that has, for example, a precise ordering of child elements then you’re not going to get that kind of flexibility by default. You’ll need to craft a custom serialiser. Apache Jena allows you to create custom RDF writers to support this kind of customization. This isn’t ideal as you need to write code, even to tweak the output options, but it gives you more control.
So, if you need to generate an XML format from RDF sources then ensure that you normalize your output. If you have control over the XML document formats and can live with some flexibility in the content model, then using RDF/XML syntax shortcuts offered by your RDF tools might well be sufficient. However if you’re working to a more rigid format, then you’re likely to need some custom code.
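As a rough illustration of what "custom code" can mean here, the following sketch writes one resource as a typed element with its properties in a fixed order. It uses Python's standard library rather than Jena, and the `serialise` and `qname` helpers, the `PROPERTY_ORDER` list and the tuple format for triples are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Stable, readable prefixes instead of generated ones such as p0/p1.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)

# Hypothetical child order required by the target XML format.
PROPERTY_ORDER = [RDFS + "label", FOAF + "homepage"]


def qname(uri: str) -> str:
    """Split a URI at its last '#' or '/' into ElementTree's {namespace}local form."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return f"{{{uri[:cut]}}}{uri[cut:]}"


def serialise(subject: str, type_uri: str, triples: list) -> str:
    """triples: (predicate_uri, value, is_resource) tuples about a single subject."""
    node = ET.Element(qname(type_uri), {qname(RDF + "about"): subject})
    rank = {pred: i for i, pred in enumerate(PROPERTY_ORDER)}
    # Known properties come first in the fixed order; any others follow, sorted by URI.
    ordered = sorted(triples, key=lambda t: (rank.get(t[0], len(rank)), t[0]))
    for pred, value, is_resource in ordered:
        child = ET.SubElement(node, qname(pred))
        if is_resource:
            child.set(qname(RDF + "resource"), value)
        else:
            child.text = value
    return ET.tostring(node, encoding="unicode")


xml_out = serialise(
    "http://example.org/person/1",
    FOAF + "Person",
    [
        (FOAF + "homepage", "http://example.org/blogs/joe", True),
        (RDFS + "label", "Joe Bloggs", False),
    ],
)
print(xml_out)
```

The input order of the triples no longer matters: the label is always written before the homepage, which is the kind of regularity a rigid schema or a downstream XSLT can rely on.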
Tip 2: Use all of the shortcuts
Let’s look at the above example again but with a heavy dusting of syntactic sugar:
<foaf:Person xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             rdf:about="http://example.org/person/1">
  <rdfs:label>Joe Bloggs</rdfs:label>
  <foaf:homepage rdf:resource="http://example.org/blogs/joe"/>
</foaf:Person>
Much nicer! The above describes exactly the same RDF graph as we had before. What have we done here?
- We’ve omitted the rdf:RDF element as it’s unnecessary. If you have a single “root” resource in your graph then you can just use this as the document element. If we had multiple, unrelated Person resources in the document then we’d need to re-introduce the rdf:RDF element as a generic container.
- Defined some readable namespace prefixes in place of the generated ones
- Grouped triples about the same subject into the same element
- Removed use of rdf:Description and rdf:type, preferring to instead use namespaced element names
The result is something that is easier to read and much easier to work with in an XML context. You could even imagine creating an XML schema for this kind of document, particularly if you know which types and predicates are being used in your RDF graphs.
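To show how little RDF knowledge a consumer then needs, here is a small sketch that reads the normalised form with Python's standard `xml.etree.ElementTree`. The variable names are illustrative; the document is the example from this section.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

PERSON = """\
<foaf:Person xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             rdf:about="http://example.org/person/1">
  <rdfs:label>Joe Bloggs</rdfs:label>
  <foaf:homepage rdf:resource="http://example.org/blogs/joe"/>
</foaf:Person>
"""

person = ET.fromstring(PERSON)
rdf = "{%s}" % NS["rdf"]  # ElementTree spells qualified attribute names as {ns}local

identifier = person.get(rdf + "about")                            # the entity's identifier
label = person.findtext("rdfs:label", namespaces=NS)              # a literal value
homepage = person.find("foaf:homepage", NS).get(rdf + "resource")  # a link

print(identifier, label, homepage)
```

Everything is a plain element or attribute lookup against a single element, so the same paths would work equally well as XPath expressions in XSLT or a schema.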
The nice thing about this approach is that it looks just like namespaced XML. For a publishing project I worked on we defined our XML schemas for receipt of data using this kind of approach; the client didn’t really need to know anything about RDF. We just had to explain that:
- rdf:about is how we assign a unique identifier to an entity (and we used xml:base to simplify the contents further to avoid repetition)
- rdf:resource was a “link” between two resources, e.g. for cross-referencing between content and subject categories
If you’re not using RDF containers or collections then those two attributes are the only bits of RDF that creep into the syntax.
However in our case, we were also using RDF Lists to capture ordering of authors in academic papers. So we also explained that the rdf:parseType attribute was an instruction to the parser that some element content should be handled as a collection (a list).
This worked very well. We’d ended up with fine-grained document types anyway, to make it easier to update individual resources in the system, e.g. individual journal issues or articles, so the above structure mapped well to the system requirements.
Here’s a slightly more complex example that hopefully further illustrates the point. Here I’m showing nesting of several resource descriptions:
<ex:Article xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:foaf="http://xmlns.com/foaf/0.1/"
            xmlns:dc="http://purl.org/dc/terms/"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#"
            xmlns:ex="http://example.org/ns/schema/"
            rdf:about="http://example.org/articles/1">
  <dc:title>An example article</dc:title>
  <dc:description>This is an article</dc:description>
  <ex:authors rdf:parseType="Collection">
    <foaf:Person rdf:about="http://example.org/person/1">
      <rdfs:label>Joe Bloggs</rdfs:label>
      <foaf:homepage rdf:resource="http://example.org/blogs/joe"/>
    </foaf:Person>
    <foaf:Person rdf:about="http://example.org/person/2">
      <rdfs:label>Sue Bloggs</rdfs:label>
      <foaf:homepage rdf:resource="http://example.org/blogs/sue"/>
    </foaf:Person>
  </ex:authors>
  <dc:related>
    <ex:Article rdf:about="http://example.org/articles/2"/>
  </dc:related>
  <dc:subject>
    <skos:Concept rdf:about="http://example.org/categories/example"/>
  </dc:subject>
</ex:Article>
The reality is that, whether you’re working in an XML or an RDF context, there is very often a primary resource you’re interested in, e.g. you’re processing a resource or rendering a view of it. This means that in practice there’s nearly always an obvious and natural “root” element to the graph for creating an RDF/XML serialisation. It’s just that RDF tools don’t typically let you identify it.
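With a known root and a normalised layout, the nested article example can be read with a few path expressions. This is a sketch using Python's standard library on a trimmed copy of that document; the variable names are illustrative. Note that for a `parseType="Collection"` property the order of the child elements is the order of the list.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/ns/schema/",
}

# Trimmed copy of the article example (homepages and dc:related omitted).
ARTICLE = """\
<ex:Article xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:foaf="http://xmlns.com/foaf/0.1/"
            xmlns:dc="http://purl.org/dc/terms/"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#"
            xmlns:ex="http://example.org/ns/schema/"
            rdf:about="http://example.org/articles/1">
  <dc:title>An example article</dc:title>
  <ex:authors rdf:parseType="Collection">
    <foaf:Person rdf:about="http://example.org/person/1">
      <rdfs:label>Joe Bloggs</rdfs:label>
    </foaf:Person>
    <foaf:Person rdf:about="http://example.org/person/2">
      <rdfs:label>Sue Bloggs</rdfs:label>
    </foaf:Person>
  </ex:authors>
  <dc:subject>
    <skos:Concept rdf:about="http://example.org/categories/example"/>
  </dc:subject>
</ex:Article>
"""

article = ET.fromstring(ARTICLE)
about = f"{{{NS['rdf']}}}about"

title = article.findtext("dc:title", namespaces=NS)

# findall returns matches in document order, which preserves the author ordering.
authors = [
    (person.get(about), person.findtext("rdfs:label", namespaces=NS))
    for person in article.findall("ex:authors/foaf:Person", NS)
]

subjects = [c.get(about) for c in article.findall("dc:subject/skos:Concept", NS)]

print(title, authors, subjects)
```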
Tip 3: Use RELAX NG
Because of the syntactic variation, writing schemas for RDF/XML can be damn near impossible. But for highly normalised RDF/XML it’s a much more tractable problem.
My preference has been to use RELAX NG as it offers more flexibility when creating open and flexible content models for elements, e.g. via interleaving. This gives options to leave the document structures a little looser to facilitate serialisation and also allow the contents of the graph to evolve (e.g. addition of new properties).
If you have the option, then I’d recommend RELAX NG when defining schemas for your XML data.
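To illustrate what an interleaved, open content model buys you, here is a plain-Python emulation of the kind of check such a schema performs. It is not RELAX NG and not a substitute for a real validator; the `PERSON_MODEL` table, the `check_person` function and the sample documents are assumptions invented for this sketch.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical content model: property -> (min occurrences, max occurrences or None).
PERSON_MODEL = {
    f"{{{RDFS}}}label": (1, 1),
    f"{{{FOAF}}}homepage": (0, None),
}


def check_person(xml_text: str) -> list:
    """Return a list of problems; an empty list means the element fits the model."""
    person = ET.fromstring(xml_text)
    problems = []
    if person.tag != f"{{{FOAF}}}Person":
        problems.append(f"unexpected root element {person.tag}")
    counts = {}
    for child in person:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    for tag, (low, high) in PERSON_MODEL.items():
        n = counts.get(tag, 0)
        if n < low or (high is not None and n > high):
            problems.append(f"{tag} occurs {n} times")
    # Child order is never inspected and unknown properties are tolerated:
    # that is the looseness an interleaved, open content model provides.
    return problems


# Properties in a different order, plus one the model has never heard of.
LOOSE_ORDER = f"""<foaf:Person xmlns:foaf="{FOAF}" xmlns:rdfs="{RDFS}">
  <foaf:homepage/>
  <foaf:nick>joe</foaf:nick>
  <rdfs:label>Joe Bloggs</rdfs:label>
</foaf:Person>"""

# The one required property is absent.
MISSING_LABEL = f"""<foaf:Person xmlns:foaf="{FOAF}"><foaf:homepage/></foaf:Person>"""

print(check_person(LOOSE_ORDER))
print(check_person(MISSING_LABEL))
```

The first document passes despite the reordering and the extra property, which is what lets the graph evolve without breaking consumers; the second fails because the required label is missing.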
Tip 4: RDF for metadata; XML for content
The last tip isn’t about RDF/XML per se; I just want to make a general point about where to apply the different technologies.
XML is fantastic at describing document structures and content. RDF is fantastic at describing relationships between things. Both of those qualities are important, but in very different aspects of an application.
In my work in publishing I ended up using a triple store as the primary data repository. This is because the kinds of application behaviour I wanted to drive were increasingly relationship focused: e.g. browsing to related content, author based navigation, concept relationships, etc. Increasingly I also wanted the ability to create new slices and views across the same content, and document structures were too rigid.
The extensibility of the RDF graph allowed me to quickly integrate new workflows (using the Blackboard pattern) so that I could, for example, harvest and integrate external links or use text mining tools to extract new relationships. This could be done without having to rework the main publishing workflow, evolve the document formats, or change the database for the metadata.
However XML works perfectly well for rendering out the detailed content. It would be crazy to try and capture content in RDF/XML (structure yes; but not content). So for transforming XML into HTML or other views, XML was the perfect starting point. We were early adopters of XProc so using pipelines to generate rendered content and to extract RDF/XML for loading into a triple store was easy to do.
In summary RDF/XML is not a great format for working with RDF in an XML context, but it’s not completely broken. You just need to know how to get the best from it. It provides a default interoperable format for exchanging RDF data over the web, but there are better alternatives for hand-authoring and efficient loading. Once the RDF Working Group completes work on RDF 1.1 it’s likely that Turtle will rapidly become the main RDF serialisation.
However, I think that RDF/XML will still have a role, as part of a well-designed system, in bridging between RDF and XML tools.
Hello Leigh, I basically agree with you. RDF/XML is not necessarily a bad idea, especially as it is even easier for ordinary users to grasp the information when it is presented with XSLT (or converted to HTML with other XML tools).
btw., I’m afraid the rdf:resource attrs on ex:Article and skos:Concept in the last example should be rdf:about. Probably this is one of the most confusing aspects of RDF/XML …
Hi,
You’re right of course, on both counts. Serves me right for posting examples without parsing them first! I’ll revise the example. Thank you for the bug report!
L.
Hi, thanks for the quick response and fix. Eh, sorry, one more… some empty elements (rdf:type and foaf:homepage) should be closed with ‘/>’ or end tags… (this is not a problem of RDF/XML, but a necessary evil of XML 😉)
Also fixed! *blush*
Can you provide an example of an RDF graph that can’t be serialised as RDF/XML? I haven’t encountered this idea before.
Hi Luke,
You might want to check this section of the RDF/XML specification:
http://www.w3.org/TR/rdf-syntax-grammar/#section-Serialising
The main case I was thinking of is where you have a property URI such as:
http://example.org/ns#123
This can’t be split into a namespace and a valid XML NCName, because the local part (123) starts with a digit and so doesn’t conform to the NCName production:
http://www.w3.org/TR/REC-xml/#NT-NameStartChar
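A quick way to see the problem is to try every possible split of the property URI and check whether the tail could be an element name. This is a sketch only: the regular expression is an ASCII-only approximation of the NCName production (the real one admits many more characters), and `split_property_uri` is a made-up helper.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. No colons are allowed.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_property_uri(uri: str):
    """Return (namespace, local_name) using the longest valid local name, or None."""
    for i in range(1, len(uri)):
        if NCNAME.match(uri[i:]):
            return uri[:i], uri[i:]
    return None


print(split_property_uri("http://www.w3.org/2000/01/rdf-schema#label"))
print(split_property_uri("http://example.org/ns#123"))
```

For the second URI every candidate local name (123, 23, 3) starts with a digit, so no split works and the property cannot be written as an XML element.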
Cheers,
L.
Interesting, thanks!
There are also some literal values that can’t be serialised as XML because of restrictions on Unicode characters. I’ve had problems with Jena, for example, which will parse a document but then refuse to reserialise it due to encoding issues.
The same problems don’t affect Turtle.
Leigh,
I’ve never really understood the widespread distaste for RDF/XML, even within semantic web circles. As you mention, it is a bridge between the RDF and XML models, and as such it is essential and must not go away. Of course the standard could be better, but it works well enough. I guess most of the RDF/XML haters do not have a background in XML. It’s the same with people designing JSON-only APIs: this recent trend worries me, as it cuts off the whole XML ecosystem and requires an extra model conversion, which is always a bad idea.
I know RDF/XML is also hated within XML/XSLT circles, as being hard to transform. I think part of the problem here was that early experiments used XSLT 1, while XSLT 2 gives much more power.
But I had never really seen a sincere attempt at doing generic RDF/XML transformations, which made me try for myself.
It turns out a single convention (grouping statements by subject as in Jena’s RDF/XML-ABBREV) and a fair amount of key() function allows for generic XSLT that can be extended to produce any XML or XHTML.
The stylesheets simply break down into generic subject/predicate/object templates that can be overridden by specific vocabulary terms. So whoever stated that a generic RDF browser is not possible, was wrong.
What I’m hoping to do is to establish a convention around that and have developers contributing templates for various vocabularies — which would be like crowd-sourced UI, something I haven’t seen done before. Here’s some early stylesheets:
https://github.com/Graphity/graphity-browser/tree/master/src/main/webapp/WEB-INF/xsl/imports
If you go one level up, you’ll find Resource.xsl which is the “master” stylesheet.
I have much more code now, but didn’t have time to merge it back yet. You can see a prototype instance running on http://linkeddata.dk though — the frontend is pure XSLT 2 over RDF/XML.