RDF and JSON: A Clash of Model and Syntax

I had been meaning to write this post for some time. After reading Jeni Tennison’s post from earlier this week I had decided that I didn’t need to, but Jeni and Thomas Roessler suggested I publish my thoughts. So here they are. I’ve got more things to say about where efforts should be expended in meeting the challenges that face us over the next period of growth of the semantic web, but I’ll keep those for future posts.

Everyone agrees that a JSON serialization of RDF is a Good Thing. And I think nearly everyone would agree that a standard JSON serialization of RDF would be even better. The problem is that no-one can agree on what constitutes a good JSON serialization of RDF. As the RDF Next Working Group is about to convene to try and define a standard JSON serialization, now is a very good time to think about what it is we really want them to achieve.

RDF in JSON is RDF in XML all over again

There are very few people who like RDF/XML. Personally, while it’s not my favourite RDF syntax, I’m glad it’s there for when I want to convert XML formats into RDF. I’ve even built an entire RDF workflow that began with the ingestion of RDF/XML documents; we even validated them against a schema!

There are several reasons why people dislike RDF/XML.

Firstly, there is a mismatch in the data models: serialization involves turning a graph into a tree. There are many different ways to achieve that so, without applying some external constraints, the output can be highly variable. The problem is that those constraints can be highly specific, so they are difficult to generalize. This results in a high degree of syntax variability of RDF/XML in the wild, and that undermines the ability to use RDF/XML with standard XML tools like XPath, XSLT, etc. They (unsurprisingly) operate only on the surface XML syntax, not the “real” data model.
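To make the graph-to-tree problem concrete, here is a minimal Python sketch. The triples, the prefixed names, the `@id` key and both layout functions are invented for illustration and do not correspond to any published serialization. It writes the same three-triple graph as two different trees, both equally defensible.

```python
import json

# A tiny graph as (subject, predicate, object) triples. All names are made up.
TRIPLES = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
}


def nested_tree(root, triples):
    """Layout 1: pick a root and inline every object that is itself a subject.

    Assumes the graph has no cycles; a cyclic graph would recurse forever.
    """
    subjects = {s for s, _, _ in triples}
    node = {"@id": root}
    for s, p, o in sorted(triples):
        if s == root:
            node[p] = nested_tree(o, triples) if o in subjects else o
    return node


def flat_tree(triples):
    """Layout 2: one top-level entry per subject, objects left as references.

    Assumes each property occurs at most once per subject.
    """
    out = {}
    for s, p, o in sorted(triples):
        out.setdefault(s, {})[p] = o
    return out


if __name__ == "__main__":
    print(json.dumps(nested_tree("ex:alice", TRIPLES), indent=2))
    print(json.dumps(flat_tree(TRIPLES), indent=2))
```

In the first layout Bob's name sits under Alice's `foaf:knows` key; in the second it sits under a top-level `ex:bob` key. A path expression written against one layout finds nothing in the other, even though the underlying graph is identical.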

Secondly, people dislike RDF/XML because of the mismatch in (loosely speaking) the native data types. XML is largely about elements and attributes whereas RDF has resources, properties, literals, blank nodes, lists, sequences, etc. And of course there are those ever present URIs. This leads to additional syntax short-cuts and hijacking of features like XML Namespaces to simplify the output, whilst simultaneously causing even more variability in the possible serializations.

Thirdly, when it comes to parsing, RDF/XML just isn’t a very efficient serialization. It’s typically more verbose and can involve much more of a memory overhead when parsing than some of the other syntaxes.

Because of these issues, we end up with a syntax which, while flexible, requires some profiling to be really useful within an XML toolchain. Or you just ignore the fact that it’s XML at all and throw it straight into a triple store, which is what I suspect most people do. If you do that then an XML serialization of RDF is just a convenient way to generate RDF data from an XML toolchain.

Unfortunately when we look at serializing RDF as JSON we discover that we have nearly all of the same issues. JSON is a tree, so we have the same variety of potential options for serializing any given graph. The data types are also still different: key-value pairs, hashes, lists, strings, dates (of a form!), etc. versus resources, properties, literals, etc. While there is potential to use more native datatypes, the practical issues of repeatable properties, blank nodes, etc. mean that a 1:1 mapping isn’t feasible. Lack of support for anything like XML Namespaces means that hiding URIs is also impossible without additional syntax conventions.

So, ultimately, both XML and JSON are poor fits for handling RDF. I think most people would agree that a specific format like Turtle is much easier to work with. It’s also better as a starting point for learning RDF because most of the syntax is re-used in SPARQL. That’s why standardising Turtle, ideally extended to support Named Graphs, needs to be the first item on the RDF Next Working Group’s agenda.

What do we actually want?

What purpose are we trying to achieve with a JSON serialization of RDF? I’d argue that there are several goals:

Support for scripting languages: Provide better support for processing RDF in scripting languages
Creating convergence: Build some convergence around the dizzying array of existing RDF in JSON proposals, to create consistency in how data is published
Gaining traction: Make RDF more acceptable for web developers, with the hope of increasing engagement with RDF and Linked Data

I don’t think that anyone considers a JSON serialization of RDF as a better replacement for RDF/XML. I think everyone is looking to Turtle to provide that.

I also don’t think that anyone sees JSON as a particularly efficient serialization of RDF, particularly for bulk loading. It might be, but I think N-Triples (a subset of Turtle) fulfills that niche already: it’s easy to stream and to process in parallel.
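A rough Python sketch of why N-Triples streams so easily: every line is a complete triple, so a parser can work one line at a time without holding any other state. The regular expression below is a deliberately simplified assumption. It recognises IRIs, blank nodes and literals with an optional language tag or datatype, but it does not decode escape sequences or enforce the full grammar, so treat it as an illustration rather than a conforming parser. The sample data is invented.

```python
import re

# Simplified line pattern (an assumption, not the full N-Triples grammar).
LINE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def parse_lines(lines):
    """Yield (subject, predicate, object) for each triple line, lazily."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"not a recognisable triple: {line!r}")
        yield match.group(1), match.group(2), match.group(3)


SAMPLE = """\
# two triples about a made-up resource
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice"@en .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""

if __name__ == "__main__":
    for triple in parse_lines(SAMPLE.splitlines()):
        print(triple)
```

Because `parse_lines` accepts any iterable of lines, it can be handed an open file or a slice of one, which is what makes splitting a large dump across several workers straightforward.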

Let’s look at each of those goals in turn.

Support for scripting languages

Unarguably it’s much, much easier to process JSON in scripting languages like Javascript, Ruby and PHP than it is to process RDF/XML.

Parser support for JSON is ubiquitous as it’s the syntax du jour, just as XML was when the RDF specifications were being written. Typically JSON parsing is much more efficient. That’s especially true when we look at Javascript in the browser.

From that perspective RDF in JSON is an instant win as it will simplify consumption of Linked Data and the results of SPARQL CONSTRUCT and DESCRIBE queries. There are other issues with getting wide-spread support for RDF across different programming languages, e.g. proper validation of URIs, but fast parsing of the basic data structure would be a step in the right direction.

Creating Convergence

I think I’ve seen about a dozen or more different RDF in JSON proposals. There’s a list on the ESW wiki and some comparison notes on the Talis Platform wiki, but I don’t think either is complete. If I get the chance I’ll update them. The sheer variety confirms my earlier points about the mismatches between models: everyone has their own conception of what constitutes a useful JSON serialization.

Because there are fewer syntax options in JSON, the proposals run the full spectrum from capturing the full RDF model but making poor use of JSON syntax, through to making good use of JSON syntax but at the cost of either ignoring aspects of the RDF model or layering additional syntax conventions on top of JSON itself. As an aside, I find it interesting that so many people are happy with subsetting RDF to achieve this one goal.

The thing we should recognise is that none of the existing RDF in JSON formats are really useful without an accompanying API. I’ve used a number of different formats and no matter what serialization I’ve used I’ve ended up with helper code that simplifies some or all of the following:

Lookup of all properties of a single resource
Mapping between URIs and short names (e.g. CURIES or locally defined keys) for properties
Mapping between conventions for encoding particular datatypes (or language annotations) and native objects in the scripting language
Cross-referencing between subjects and objects; and vice-versa
Looking up all values of a property or a single value (often the first)
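The helpers in the list above might look something like the following Python sketch. The sample data uses a subject, then predicate, then list-of-value-objects layout in the style of RDF/JSON. The function names, the prefix table and the example URIs are all hypothetical, and only one datatype conversion is shown.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical prefix table for mapping short names to full URIs.
PREFIXES = {"foaf": FOAF}

# Made-up data: subject -> predicate -> list of value objects.
DOC = {
    "http://example.org/alice": {
        FOAF + "name": [{"type": "literal", "value": "Alice", "lang": "en"}],
        FOAF + "age": [{"type": "literal", "value": "30", "datatype": XSD + "integer"}],
        FOAF + "knows": [{"type": "uri", "value": "http://example.org/bob"}],
    },
    "http://example.org/bob": {
        FOAF + "name": [{"type": "literal", "value": "Bob"}],
    },
}


def expand(name):
    """Turn a short name like 'foaf:name' into a full URI; leave other strings alone."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


def to_native(obj):
    """Map a value object to a native value (only xsd:integer is handled here)."""
    if obj.get("datatype") == XSD + "integer":
        return int(obj["value"])
    return obj["value"]


def properties(doc, subject):
    """All properties of a single resource, as a predicate -> values mapping."""
    return {p: [to_native(o) for o in objs] for p, objs in doc.get(subject, {}).items()}


def values(doc, subject, prop):
    """All values of one property of one resource."""
    return [to_native(o) for o in doc.get(subject, {}).get(expand(prop), [])]


def first(doc, subject, prop):
    """A single value (the first), or None if the property is absent."""
    found = values(doc, subject, prop)
    return found[0] if found else None


def subjects_with(doc, prop, value):
    """Cross-reference from an object back to the subjects that point at it."""
    p = expand(prop)
    return [s for s, props in doc.items()
            if any(o["value"] == value for o in props.get(p, []))]
```

With these in place, `first(DOC, "http://example.org/alice", "foaf:age")` gives a native integer and `subjects_with(DOC, "foaf:knows", "http://example.org/bob")` walks the graph backwards. None of this depends on which JSON layout was parsed, only on the helpers knowing about it.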

In addition, if I’m consuming the results of multiple requests then I may also end up with a custom data structure and code for merging together different descriptions, even if it’s just an array of parsed JSON documents and code to perform the above lookups across that collection.
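As a sketch of that merging code, assuming each parsed document uses a subject, then predicate, then list-of-value-objects layout (the `merge` function and the sample data are illustrative, not part of any library):

```python
def merge(*docs):
    """Combine several parsed descriptions, dropping duplicate value objects."""
    merged = {}
    for doc in docs:
        for subject, props in doc.items():
            target = merged.setdefault(subject, {})
            for prop, objs in props.items():
                existing = target.setdefault(prop, [])
                for obj in objs:
                    if obj not in existing:
                        existing.append(dict(obj))  # copy so inputs stay untouched
    return merged


# Two made-up responses describing the same resource.
NAME = "http://xmlns.com/foaf/0.1/name"
NICK = "http://xmlns.com/foaf/0.1/nick"
FIRST = {"http://example.org/alice": {NAME: [{"type": "literal", "value": "Alice"}]}}
SECOND = {"http://example.org/alice": {
    NAME: [{"type": "literal", "value": "Alice"}],
    NICK: [{"type": "literal", "value": "ali"}],
}}

if __name__ == "__main__":
    print(merge(FIRST, SECOND))
```

This simple union is only safe for URI-identified resources. Blank node labels are scoped to the document they came from, so a fuller version would need to rename them before merging.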

So, while we can debate the relative aesthetics of different approaches, I think it’s focusing attention on the wrong areas. What we should really be looking at is an API for manipulating RDF. One that will work in Javascript, Ruby or PHP. While I acknowledge the lingering horror of the DOM, I think the design space here is much simpler. Maybe I’m just an optimist!

If we take this approach then what we need is a JSON serialization of RDF that covers as much of the RDF model as possible and, ideally, is already as well supported as possible. From what I’ve seen the RDF/JSON serialization is actually closest to that ideal. It’s supported in a number of different parsing and serialising libraries already and only needs to be extended to support blank nodes and Named Graphs, which is trivial to do. While it’s not the prettiest serialization, given a vote, I’d look at standardising that and moving on to focus on the more important area: the API.

Gaining Traction

Which brings me to the last use case. Can we create a JSON serialization of RDF that will help Linked Data and RDF get some traction in the wider web development community?

The answer is no.

If you believe that the issues with gaining adoption are purely related to syntax then you’re not listening to the web developer community closely enough. While a friendlier syntax may undoubtedly help, an API would be even better. The majority of web developers these days are very happy indeed to work with tools like jQuery to handle client-side scripting. A standard jQuery extension for RDF would help adoption much more than spending months debating the best way to profile the RDF model into a clean JSON serialization.

But the real issue is that we’re asking web developers to learn not just new syntax but also an entirely new way to access data: we’re asking them to use SPARQL rather than simple RESTful APIs.

While I think SPARQL is an important and powerful tool in the RDF toolchain I don’t think it should be seen as the standard way of querying RDF over the web. There’s a big data access gulf between de-referencing URIs and performing SPARQL queries. We need something to fill that space, and I think the Linked Data API fills that gap very nicely. We should be promoting a range of access options.

I have similar doubts about SPARQL Update as the standard way of updating triple stores over the web, but that’s the topic of another post.

Summing Up

As the RDF Next Working Group gets underway I think it needs to carefully prioritise its activities to ensure that we get the most out of this next phase of development and effort around the Semantic Web specifications. It’s particularly crucial right now as we’re beginning to see the ideas being adopted and embraced more widely. As I’ve tried to highlight here, I think there’s a lot of value to be had in having a standard JSON serialization of RDF. But I don’t think that there’s much merit in attempting to create a clean, simple JSON serialization that will meet everyone’s needs.

Standardising Turtle and an API for manipulating RDF data has more value in my view. RDF/JSON as a well implemented specification meets the core needs of the semantic web developer; a simple scripting API meets the needs of everyone else.

14 thoughts on “RDF and JSON: A Clash of Model and Syntax”

Bruce D'Arcus says:

December 2, 2010 at 10:32 pm

Exactly. JSON is a distraction; the real issue is easy-to-use APIs to work with the data in Javascript, PHP, Python, etc.
Josef B says:

December 3, 2010 at 7:01 am

maybe I have to reread this post, but it’s confusing. yea, turtle but then rdf/json. but json won’t work. Which is it? And an API? Like Jena? Or an api for JSON use, or for Turtle.
Manu Sporny says:

December 3, 2010 at 7:46 am

Hi Leigh, I don’t know if you knew this, but the RDFa Working Group has been developing a generalized API for RDF and RDFa for some time now:

http://www.w3.org/TR/rdfa-api/

It’s getting a major overhaul this week, being split into an RDF API and an RDFa API.

As for this current wave against RDF in JSON, that I guess Jeni started, I strongly disagree. Our company needs a serialization of RDF in JSON – or more precisely, we need a more light-weight, programming-friendly way of expressing RDF than TURTLE or any of the other serializations/programming models out there. Have you looked at JSON-LD yet:

http://json-ld.org/

We’ve settled on it for our next commercial system, which is written in C++, not JSON btw, because it’s easier to express and manipulate RDF graphs as associative arrays instead of TURTLE or Triple Stores. That is why we’re interested in JSON-LD – because of the universality of its data structure, not just because it is another serialization of RDF. It just so happens that it works well in JavaScript environments as well.
Daniel O'Connor says:

December 3, 2010 at 7:50 am

I’ll echo Bruce’s comments: forget RDF in JSON, let me know where there’s a turtle parsing extension for PHP.

When I look at my use of JSON data from APIs, it’s merely just a transport container, which is translated into concepts my code already knows.

I for one don’t mind the RDF/XML serialization: I can xpath over it and treat it like Plain Old XML. I know this is heresy, but when my legacy PHP application doesn’t have a concept of a graph structure, why can’t I just extract the tiniest bit of information with my familiar XML tools.

I think the biggest problem we all face is how graphs are represented in code.
Every RDF library I have seen feels horrible to work with. This is not dissimilar to how PHP’s DOM APIs felt before simplexml and xpath.
If you were working with a graph in language XYZ, what would the API look like for you?
admin says:

December 3, 2010 at 9:35 am

Josef,

I didn’t say that JSON won’t work, I said that it won’t help drive adoption. As for an API I’m keen to see a simple lightweight API for working with RDF data that could be implemented in a variety of languages. The RDFa API might fit the bill, but I’ve not had chance to look at it yet.

Manu,

Thanks for the pointer, I need to take a proper look at the RDFa API.

However, again, I must not have gotten my point across: I’m not against RDF in JSON. In fact I spend a chunk of the post discussing some of its merits and saying that it’s a good thing to have.

My suggestion is that we shouldn’t focus on it as the main deliverable and recognise that no matter what is produced it won’t meet anyone’s ideal.

I use RDF/JSON almost exclusively now when consuming data from Javascript and Ruby. I have looked at JSON-LD but didn’t find it solved any new problems for me. I agree that using associative arrays is helpful, simple and portable, but stand by my assertion that some helper code is invariably necessary. That would be the essence of the API.

L.
Paul Wilton says:

December 3, 2010 at 11:40 am

Hi !
I don’t think RDF JSON serialization is a distraction. It is a crazy situation to have so many RDF JSON representations out there. It would also be great to standardise on an API – but even with a standard API, one still has to get the RDF into the scripting language’s memory space in order that its API can manipulate it.
For PHP, RDF/JSON is undoubtedly a very efficient (probably the most efficient) way of doing this.
So I think these are two separate issues, we need APIs, we also need standard representations.
At the BBC, we have been very successfully using RDF/JSON as a representation of choice for marshalling RDF into our PHP rendering layer.
For the World Cup project a middle tier would serve RDF in a variety of flavours, content negotiated on mime-type. Internally apps might request application/rdf+xml or text/rdf+n3. For the PHP presentation layer we requested application/rdf+json from the middle tier, while being very aware that this was not an agreed standard. We are currently using Nick Humfrey’s EasyRDF as our PHP API of choice.
This is the approach we are continuing to adopt as we roll out more user facing sites/pages based on semantic publishing.
For me an RDF/JSON standard can’t come soon enough. As for APIs, an agreed API syntax would be good, and would certainly make adoption in the CSD community much easier. However, I can’t help thinking that APIs evolve quickly, they come and go, in fashion and out of fashion. I am not convinced a governing body would be able to maintain a standard API to keep up with end users’ demands and needs. Hopefully I will be proved wrong 🙂

Paul Wilton (Tech/Dev Lead, BBC Semantic Publishing)
Christopher Gutteridge says:

December 3, 2010 at 3:43 pm

I agree about easy APIs and have been working on one in PHP, although it should be quite easy to code in other scripting languages — I’ve started on a JS version.

http://graphite.ecs.soton.ac.uk/

And the unfinished javascript version: http://graphite.ecs.soton.ac.uk/graphitejs/

(I know it’s a bit crass promoting my own work, but I put a whole lot of effort into making Graphite and I think it’s a good tool and deserves to be used.)
Pingback: RDF Data Access Options, or Isn’t HTTP already the API? « Lost Boy
Graham Klyne says:

December 4, 2010 at 8:35 pm

You suggest three motivations (“What do we actually want?”) for an RDF-in-JSON specification.

I think there’s a fourth: a path for migrating applications that currently use JSON (and for many, that’s the easiest way to get a V0.1 app up-and-running) to become players in the wider web of linked data. (Something for which I have found Sandro Hawke’s JRON ideas useful, but I’ve seen at least one other RDF-in-JSON proposal with similar properties.)

I think simply representing triples in JSON possibly *is* a distraction, but having a way to interpret common data structures (of the kind easily expressed in JSON) as RDF triples may be less so.

I can see that a well-crafted API for RDF might be a good way to address that, and ultimately more flexible as it would be somewhat applicable across a wider range of languages than Javascript. I think a good test for such an API would be that it would be trivially easy to write an ingest handler for *any* of the many RDF-in-JSON representations. This would take some focus away from JSON as yet another representation for RDF.
Christopher Gutteridge says:

December 5, 2010 at 12:18 pm

The ship has probably sailed, but I really hate the language field in RDF.

Why on earth are there not just datatypes for string literals in various languages? It’s a pointless complexity which makes me sad.