Category Archives: Markup

Principled use of RDF/XML

Everyone loves to hate RDF/XML. Indeed, many have argued that RDF/XML is responsible for holding back semantic web adoption. I’m not sure that I fully agree with that (there are plenty of other issues to consider), but it’s certainly awkward to work with if you’re trying to integrate both RDF and XML tools into your application.

It’s actually that combination that causes the awkwardness. If you’re just using RDF tools then RDF/XML is mostly fine. It benefits from XML’s Unicode support and is the most widely supported RDF serialisation. There are downsides, though. For example, some RDF graphs can’t be serialised as RDF/XML at all. But that is easy to avoid.

Developers, particularly XML developers, feel cheated by RDF/XML because of what they see as false advertising: it’s an XML format that doesn’t play nicely with XML tools. Some time ago, Dan Brickley wrote a nice history of the design of RDF/XML which is worth a read for some background. My goal here isn’t to rehash the RDF/XML discussion or even to mount a defence of RDF/XML as a good format for RDF (I prefer Turtle).

But developers are still struggling with RDF/XML, particularly in publishing workflows where XML is a good base representation for document structures, so I think it’s worthwhile capturing some advice on how to reach a compromise with RDF/XML that allows it to work nicely with XML tools. I can’t remember seeing anyone do that before, so I thought I’d write down some of my experiences. These are drawn from creating a publishing platform that ingested metadata and content in XML, used Apache Jena for storing that metadata, and Solr as a search engine. Integration between the different components was carried out using XML-based messaging. So there were several places where RDF and XML rubbed up against one another.

Tip 1: Don’t rely on default serialisations

The first thing to note is that RDF/XML offers a lot of flexibility in terms of how an RDF graph can be serialised as XML. A lot. The same graph can be serialised in many different ways using a lot of syntactic shortcuts. More on those in a moment.

It’s this unbounded flexibility that is the major source of the problems: producers and consumers may have reasonable default assumptions about how data will be published that are completely at odds with one another. This makes it very difficult to consume arbitrary RDF/XML with anything other than RDF tools.

JSON-LD offers a lot of flexibility too, and I can’t help but wonder whether that flexibility is going to come back and bite us in the future.

By default RDF tools tend to generate RDF/XML in a form that makes it easy for them to serialise. This tends to mean automatically generated namespace prefixes and a per-triple approach to serialising the graph, e.g.:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:p0="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:p1="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/person/joe">
    <p0:label>Joe Bloggs</p0:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/joe">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/joe">
    <p1:homepage rdf:resource="http://example.org/blogs/joe"/>
  </rdf:Description>
</rdf:RDF>

This is a disaster for XML tools, as the description of the resource is spread across multiple elements, making it hard to process. But it’s efficient to generate.

Some RDF frameworks may provide options for customising the output to apply some of the RDF/XML syntactic shortcuts. As we’ll see in a moment, these are worth embracing and may produce some useful regularity.

But if you need to generate an XML format that has, for example, a precise ordering of child elements then you’re not going to get that kind of flexibility by default. You’ll need to craft a custom serialiser. Apache Jena, for example, allows you to create custom RDF writers to support this kind of customisation. This isn’t ideal, as you need to write code even just to tweak the output options, but it gives you more control.

So, if you need to generate an XML format from RDF sources, ensure that you normalise your output. If you have control over the XML document formats and can live with some flexibility in the content model, then the RDF/XML syntax shortcuts offered by your RDF tools might well be sufficient. However, if you’re working to a more rigid format, then you’re likely to need some custom code.

Tip 2: Use all of the shortcuts

Let’s look at the above example again, but with a heavy dusting of syntax sugar:

<foaf:Person
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 rdf:about="http://example.org/person/1">
  <rdfs:label>Joe Bloggs</rdfs:label>
  <foaf:homepage rdf:resource="http://example.org/blogs/joe"/>
</foaf:Person>

Much nicer! The above describes exactly the same RDF graph as we had before. What have we done here?

  • We’ve omitted the rdf:RDF element as it’s unnecessary. If you have a single “root” resource in your graph then you can just use this as the document element. If we had multiple, unrelated Person resources in the document then we’d need to re-introduce the rdf:RDF element as a generic container.
  • Defined some readable namespace prefixes (rather than auto-generated ones)
  • Grouped triples about the same subject into the same element
  • Removed the use of rdf:Description and rdf:type, preferring instead to use namespaced element names

The result is something that is easier to read and much easier to work with in an XML context. You could even imagine creating an XML schema for this kind of document, particularly if you know which types and predicates are being used in your RDF graphs.

The nice thing about this approach is that it looks just like namespaced XML. For a publishing project I worked on, we defined our XML schemas for receipt of data using this kind of approach; the client didn’t really need to know anything about RDF. We just had to explain that:

  • rdf:about is how we assign a unique identifier to an entity (and we used xml:base to simplify the contents further and avoid repetition; see the sketch below)
  • rdf:resource was a “link” between two resources, e.g. for cross-referencing between content and subject categories

If you’re not using RDF containers or collections then those two attributes are the only bit of RDF that creeps into the syntax.
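
As an illustration, here’s a minimal sketch of the xml:base trick (the URIs are hypothetical): the relative rdf:about value resolves against the base, so the description below is still about http://example.org/person/joe.

<foaf:Person
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xml:base="http://example.org/person/"
 rdf:about="joe">
  <!-- "joe" resolves to http://example.org/person/joe -->
  <rdfs:label>Joe Bloggs</rdfs:label>
</foaf:Person>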

However, in our case we were also using RDF lists to capture the ordering of authors in academic papers. So we also explained that rdf:parseType was an instruction indicating that some element content should be handled as a collection (a list).

This worked very well. We’d ended up with fine-grained document types anyway, to make it easier to update individual resources in the system, e.g. individual journal issues or articles, so the above structure mapped well to the system requirements.

Here’s a slightly more complex example that hopefully further illustrates the point. Here I’m showing nesting of several resource descriptions:

<ex:Article
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:dc="http://purl.org/dc/terms/"
 xmlns:skos="http://www.w3.org/2004/02/skos/core#"
 xmlns:ex="http://example.org/ns/schema/"
 rdf:about="http://example.org/articles/1">

 <dc:title>An example article</dc:title>
 <dc:description>This is an article</dc:description>
 <ex:authors rdf:parseType="Collection">
   <foaf:Person rdf:about="http://example.org/person/1">
     <rdfs:label>Joe Bloggs</rdfs:label>
     <foaf:homepage rdf:resource="http://example.org/blogs/joe"/>
   </foaf:Person>
   <foaf:Person rdf:about="http://example.org/person/2">
     <rdfs:label>Sue Bloggs</rdfs:label>
     <foaf:homepage rdf:resource="http://example.org/blogs/sue"/>
   </foaf:Person>
 </ex:authors>
 <dc:relation>
   <ex:Article rdf:about="http://example.org/articles/2"/>
 </dc:relation>
 <dc:subject>
   <skos:Concept rdf:about="http://example.org/categories/example"/>
 </dc:subject>
</ex:Article>

The reality is that, whether you’re working in an XML or an RDF context, there is very often a primary resource you’re interested in: e.g. you’re processing a resource or rendering a view of it. This means that in practice there’s nearly always an obvious and natural “root” element to the graph for creating an RDF/XML serialisation. It’s just that RDF tools don’t typically let you identify it.

Tip 3: Use RELAX NG

Because of the syntactic variation, writing schemas for RDF/XML can be damn near impossible. But for highly normalised RDF/XML it’s a much more tractable problem.

My preference has been to use RELAX NG, as it offers more flexibility when creating open content models for elements, e.g. via interleaving. This leaves the document structures a little looser, which facilitates serialisation and also allows the contents of the graph to evolve (e.g. the addition of new properties).

If you have the option, then I’d recommend RELAX NG when defining schemas for your XML data.
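
To make that concrete, here’s a minimal sketch of what a RELAX NG schema (XML syntax) for the normalised Person document from Tip 2 might look like. This is just one way to write it; the interleave keeps the property ordering open, and optional lets the description grow:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <start>
    <element name="foaf:Person">
      <attribute name="rdf:about"/>
      <!-- interleave: the properties may appear in any order -->
      <interleave>
        <element name="rdfs:label">
          <text/>
        </element>
        <!-- optional keeps the content model open to sparse data -->
        <optional>
          <element name="foaf:homepage">
            <attribute name="rdf:resource"/>
          </element>
        </optional>
      </interleave>
    </element>
  </start>
</grammar>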

Tip 4: RDF for metadata; XML for content

The last tip isn’t about RDF/XML per se; I just want to make a general point about where to apply the different technologies.

XML is fantastic at describing document structures and content. RDF is fantastic at describing relationships between things. Both of those qualities are important, but in very different aspects of an application.

In my work in publishing I ended up using a triple store as the primary data repository. This is because the kinds of application behaviour I wanted to drive were increasingly relationship-focused: e.g. browsing to related content, author-based navigation, concept relationships, etc. Increasingly I also wanted the ability to create new slices and views across the same content, and document structures were too rigid for that.

The extensibility of the RDF graph allowed me to quickly integrate new workflows (using the Blackboard pattern) so that I could, for example, harvest and integrate external links or use text mining tools to extract new relationships. This could be done without having to rework the main publishing workflow, evolve the document formats, or change the metadata database.

However, XML works perfectly well for rendering out the detailed content. It would be crazy to try to capture content in RDF/XML (structure, yes; content, no). So for transforming XML into HTML or other views, XML was the perfect starting point. We were early adopters of XProc, so using pipelines to generate rendered content and to extract RDF/XML for loading into a triple store was easy to do.
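
For illustration, here’s a minimal sketch of the rendering side of such a pipeline (the stylesheet name is hypothetical); a sibling pipeline with a different stylesheet would extract the RDF/XML for the triple store:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- transform the source document into a rendered HTML view -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="article-to-html.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>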

In summary, RDF/XML is not a great format for working with RDF in an XML context, but it’s not completely broken. You just need to know how to get the best from it. It provides a default interoperable format for exchanging RDF data over the web, but there are better alternatives for hand-authoring and efficient loading. Once the RDF Working Group completes work on RDF 1.1, it’s likely that Turtle will rapidly become the main RDF serialisation.

However, I think that RDF/XML will still have a role, as part of a well-designed system, in bridging between RDF and XML tools.

XForms on the Intranet

Elliotte Harold has published a nice introduction to XForms in Firefox on IBM developerWorks. In the conclusion he notes that:
Client-side XForms processing won’t be possible for public-facing sites until XForms is more widely deployed in browsers. However, that doesn’t mean you can’t deploy it on your intranet today. If you’re already using Firefox (and if you aren’t, you should be), all that’s required is a simple plug-in. After that’s installed, you can take full advantage of XForms’ power, speed, and flexibility.
I’d agree with this whole-heartedly. I wrote and deployed a little XForms application just before Christmas and it was a very painless exercise indeed.
Over the past few years we’ve rolled out a number of RESTful XML-based APIs internally. We’ve also toyed with different ways to build tools to manage systems using these APIs, including Java Swing desktop tools, simple HTML forms, etc. Mainly we’ve been trying for a while to find a sweet spot between ease of implementation and a reasonably good user experience.
Recently I’d been toying with a Javascript client library for one of our REST interfaces, built around the Prototype library. It was fun, if occasionally frustrating, banging my head against Javascript. However it wasn’t finished, and I needed to quickly roll out some forms for managing some key data. So I took another look at XForms. I’d researched it a few years ago and had rejected it because of the lack of browser support and the different ways that the plugins required you to deploy the forms.
As almost everyone internally has gravitated towards Firefox, cross-browser support isn’t a strong requirement, so I went ahead and built the system using XForms. It was a very satisfying experience: the syntax is easy to get to grips with, and it’s possible to create some fairly slick AJAX-style forms with a minimum of fuss. And it was more fun than messing with Javascript.
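To give a flavour, here’s a minimal sketch of the pattern: load an XML resource from a REST interface, edit it, and PUT it straight back. The URI and instance structure are hypothetical:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>Edit settings</title>
    <xf:model>
      <!-- fetch the XML resource from the REST API -->
      <xf:instance src="http://intranet.example.org/api/settings"/>
      <!-- PUT the edited document back to the same URI -->
      <xf:submission id="save" method="put" replace="none"
                     action="http://intranet.example.org/api/settings"/>
    </xf:model>
  </head>
  <body>
    <xf:input ref="/settings/title">
      <xf:label>Title</xf:label>
    </xf:input>
    <xf:submit submission="save">
      <xf:label>Save</xf:label>
    </xf:submit>
  </body>
</html>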
So, for us at least, XForms does seem to hit a sweet spot for rapid tools development, particularly as we already have a lot of existing XML interfaces. In fact the exercise highlighted a few flaws in our interfaces (e.g. delivering correct mime types, under-use of “hypermedia” to link between resources in some areas), so it was a good learning exercise in its own right.
It would be nice to see some slicker custom controls for different data types, though. I think AJAX and client-side scripting still corner the market on slick dynamic UIs, and will do for some time. But for sheer ease of use, and getting things done, XForms gets the thumbs up from me.

XML Hypertext: Not Dead, Merely Resting?

“The dreams of XML hypertext are dead, or at least thoroughly dormant”

Simon St Laurent’s XML.com article on XQuery is an interesting read. But I think the above statement is worth discussing. Is XML hypertext really dead? Or, if it’s dormant, is it going to remain so?
Firstly, what is XML hypertext? I presume from the context of the quote that Simon is referring to client-side use of XML on the web. To me this incorporates several use cases, including both the use of XML for presentation (XHTML, SVG, etc.) and for data publishing (RSS, Atom, XML-based web services). There is an obvious need for linking in both of these use cases.
Where I’d agree with St. Laurent is that most of the existing work here is dormant or duplicated. For example, while SVG makes use of XLink, it’s not used in RSS and Atom, and was deemed not flexible enough for use in XHTML due to issues with attribute naming. However the basic model, labelled links with activation indicators (onLoad, onRequest, etc.), seems to be shared across vocabularies. But still, XLink has been a Recommendation since 2001 and has yet to set the world on fire.
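For reference, this is roughly what that model looks like as an XLink simple link (the host document and URI are hypothetical):

<!-- a labelled link, traversed when the user requests it -->
<citation xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://example.org/articles/2"
          xlink:title="A related article"
          xlink:show="replace"
          xlink:actuate="onRequest">the follow-up article</citation>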
However, where I’d disagree with Simon is that XLink or XML hypertext is thoroughly dormant. Much as I hate to make predictions, I think we’re only just gaining an appreciation of the power of producing good hypertext, because we’re only now seeing the large-scale publishing of machine-processable, interrelated data that makes linking worthwhile.
I think the growing appreciation of the REST architecture is driving a greater understanding of the benefits of highly linked resources. Sure, we all know it’s good practice to avoid making web pages that are “dead ends”, but not everyone is publishing data to the same guidelines. The principle of “hypermedia as the engine of application state” is still not widely understood; it’s a piece of REST Zen that benefits from practical implementation.
Hypertext simplifies client-side development as it avoids spreading the requirement that the client must know how to construct URIs: this reduces coupling. It also simplifies client logic, as the navigation options (i.e. state transfers) can be presented by the server as the result of previous interactions; the client can simply select from amongst the labelled options. For example, if the client needs to access some specific data, e.g. a list of my recently published photos, it can select the appropriate link to retrieve it (assuming it’s available).
That link may be to an entirely different service.
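Here’s a hedged sketch of what such a response might look like, reusing Atom’s link element (the vocabulary, rel values and URIs are all hypothetical); the client selects by label rather than constructing URIs:

<user xmlns:atom="http://www.w3.org/2005/Atom">
  <name>Joe Bloggs</name>
  <!-- labelled state transfers: the client picks by rel, never builds URIs -->
  <atom:link rel="recent-photos" href="http://photos.example.org/joe/recent"/>
  <atom:link rel="albums" href="http://media.example.net/albums/joe"/>
</user>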
In an XTech 2005 paper I tried to argue (I suspect not very clearly) that linking offers the best route to integration of data from multiple web services. Linking as a means to easier mashing.
If the current data publishing trends continue then I suspect there’s going to be a growing understanding of the benefits of hypertext, and this will inevitably drive some renewed interest in XLink or a related technology.
What I personally like about RDF in this regard is the “closure” it offers: every resource has a URI, every schema has a URI, every Property and Class has a URI so the data, metadata and schemas can be linked together, and this offers some very powerful capabilities.

Messages From the Future

On the Web, you need to be able to process messages from the future.

Interesting post from Mark Baker about XML validation and web services:
Validation considered harmful

OpenDocument and XMP

This is the second part of my look at XMP. This time I’m focusing on the potential for using XMP as the metadata format for OpenDocument (ODF).
This is part of a broader discussion to help define the future direction of the ODF metadata format; one proposal on the table is to use RDF, via a constrained RDF/XML syntax. There’s a wiki available for discussing this issue, particularly how to map the existing metadata to RDF.
At least some of the impetus for exploring richer metadata support has come from the bibliographic sub-project, which aims to build support for bibliography management into OpenOffice 3.0.
RDF is a good fit for the flexible storage and formatting requirements that arise from bibliographic metadata. As XMP is an RDF profile it’s worthy of consideration, and in fact this is the proposal behind Alan Lilich’s posting to the OpenDocument TC member list. Lilich’s discussion document frames the rest of this posting.


Looking at XMP

I’ve been taking a look at XMP as I’ve been considering different ways to “enrich” content. Embedding metadata is one option, and XMP aims to fill the role of a metadata format suitable for embedding in a diverse range of media formats.
It’s also under discussion as a way to embed metadata in the OpenDocument format. The alternatives available in that quarter have been under discussion in various circles for some time. Bruce D’Arcus points to the latest entry in that discussion in his recent “OpenDocument and XMP” posting.
I thought I’d write up some notes on XMP in general and contribute some thoughts towards that debate. This is the first of two postings on this topic.


Florescu: Re-evaluating the Big Picture

Ken North just posted this email to XML-DEV drawing attention to a presentation by Daniela Florescu titled Declarative XML Processing with XQuery — Re-evaluating the Big Picture (Warning: PDF). It makes for interesting reading.
In the presentation, Florescu argues that XML is in a growth crisis and that there’s a need for more architectural work to tie together components of the XML landscape, ranging from XQuery and XSLT through to RDF and OWL. Florescu believes that XML is about more than syntax and will in fact become the key model for information, not just bits on a wire. In short, Florescu believes that XML has yet to achieve its full potential, and that to do so some further work needs to be done.
The presentation is worth reading in its entirety. The majority of it focuses on XQuery, in particular the fact that it’s not really a query language: it’s a programming language, and folk are already using it in this context. But there’s much more to it. Semantic web folk will find much that will have them nodding in agreement.
Florescu suggests a number of concrete areas that require work. Amongst these are:

  • Make XML a graph, not a tree, by making links a first-class part of the model
  • Integrate the XML data model with RDF
  • Extend the programming capabilities of XQuery, e.g. to include assertions, error-handling, metadata extraction functions and continuous queries. This latter area is interesting, as it would allow an XQuery to run continuously, acting on a stream of XML documents as they arrive
  • Integrate XQuery with OWL and RDF, e.g. to allow searching an XML document by the semantic classification of nodes, rather than their names
  • Make browsers XQuery aware, and develop a simple HTTP protocol for invoking XQuery on a remote repository. (I’ve been working with the SPARQL protocol recently and it’s occurred to me several times that an equivalent for XQuery is an obvious area for further work)

All in all I find this to be a very thought-provoking presentation; there are a lot of interesting ideas in there. For the Semantic Web crowd many of these will be old news: being able to query and manipulate data based on semantics is the core of RDF; linking as a first-class model element is something we rely on constantly. But there are also some new angles to consider. For example, there’s a lot of work happening to tie programming languages in with XML, with XML vocabularies such as XQuery becoming more like scripting languages: what’s the equivalent in semantic web circles? Could an ontology-aware version of XQuery provide a useful data manipulation environment?
I expect the XML-DEV thread to grow pretty quickly. I’ll be interested to see whether this gets picked up and discussed by other communities too.

Goodbye XML-Deviant

I see Micah’s latest XML-Deviant is up on XML.com this week, and it’s also to be the last in the series. It’s a shame to see it go, as I’ve enjoyed reading the column over the last few years. I also thoroughly enjoyed contributing to the column during my own period of XML-Deviancy. But all things come to an end; I’m looking forward to seeing what replaces the column in future.
Tip of the hat to the other XML-Deviants: Edd, Kendall and Micah for all of their efforts along the way; especially Edd for originally conceiving of the column.

Simple List Extensions Critique

Some thoughts on the Simple List Extensions specification. I’ve been waiting a few days, as I wanted to get a feel for what problems are being addressed by the new module; it’s not clear from the specification itself. Dare Obasanjo has summarised the issues, so I now feel better armed to comment.
My first bits of feedback on the specification are mainly editorial: include some rationale, include some examples, and include a contact address for the author/editor so feedback can be directly contributed. There’s at least one typo: where do I send comments?
The rest of the comments come from two perspectives: as an XML developer and as an RSS 1.0/RDF developer. I’ll concentrate on the XML side, as others have already made most of the RDF-related points.


XTech Day Three

Belatedly (I only got back from Amsterdam last Monday), here are some notes from XTech Day 3.
On the Friday morning I initially attended two talks about RDF frameworks: firstly Dave Beckett’s Bootstrapping RDF applications with Redland, and then David Wood’s introduction to Kowari: A Platform for Semantic Web Storage and Analysis. I’ve not really used either of these toolkits yet, but at work we’re looking at trying out Kowari as one of the candidate triple stores for holding our massive dataset. John Barstow’s work on the port of Redland to Windows makes it more likely that I’ll be trying out Dave’s toolkit for some personal hacking projects too.

