This is a follow-up to my blog post from yesterday about RDF and JSON. Ed Summers tweeted to say:
…your blog post suggests that an API for linked data is needed; isn’t http already the API?
I couldn’t answer that in 140 characters, so am writing this post to elaborate a little on the last section of my post in which I suggested that “there’s a big data access gulf between de-referencing URIs and performing SPARQL queries”. What exactly do I mean there? And why do I think that the Linked Data API helps?
Is Your Website Your API?
Most Linked Data presentations that discuss the publishing of data to the web typically run through the Linked Data principles. At point three we reach the recommend that:
“When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
This has encourages us to create sites that consist of a mesh of interconnected resources described using RDF. We can “follow our nose” through those relationships to find more information.
This gives us two fundamental two data access options:
- Resource Lookups: by dereferencing APIs we can obtain a (typically) complete description of a resource
- Graph Traversal: following relationships and recursively de-referencing URIs to retrieve descriptions of related entities; this is (typically, not not necessarily) reconstituted into a graph on the client
However, if we take the “Your Website Is Your API” idea seriously, then we should be able to reflect all of the different points of interaction of that website as RDF, not just resource lookups (viewing a page) and graph traversal (clicking around).
As Tom Coates noted back in 2006 in “Native to a Web of Data“, good data-driven websites will have “list views and batch manipulation interfaces”. So we should be able to provide RDF views of those areas of functionality too. This gives us another kind of access option:
- Listing: ability to retrieve lists/collections of things; navigation through those lists, e.g. by paging; and list manipulation, e.g. by filtering or sorting.
It’s possible to handle much of that by building some additional structure into your dataset, e.g. creating RDF Lists (or similar) of useful collections of resources. But if you bake this into your data then those views will potentially need to be re-evaluated every time the data changes. And even then there is still no way for a user to manipulate the views, e.g. to page or sort them.
So to achieve the most flexibility you need a more dynamic way of extracting and ordering portions of the underlying data. This is the role that SPARQL often fulfills, it provides some really useful ways to manipulate RDF graphs, and you can achieve far more with it than just extracting and manipulating lists of things.
SPARQL also supports another kind of access option that would otherwise require traversing some or all of the remote graph.
One example would be: “does this graph contain any foaf:name predicates?” or “does anything in this graph relate to http://www.example.org/bob?”. These kinds of existence checks, as well as more complex graph pattern matching, also tend to be the domain of SPARQL queries. It’s more expressive and potentially more efficient to just use a query language for that kind of question. So this gives us a fourth option:
- Existence Checks: ability to determine whether a particular structure is present in a graph
Interestingly though they are not often the kinds of questions that you can “ask” of a website. There’s no real correlation with typical web browsing features although searching comes close for simple existence check queries.
Where the Linked Data API fits in
So there are at least four kinds of data access option. I doubt whether its exhaustive, but its a useful starting point for discussion.
SPARQL can handle all of these options and more. The graph pattern matching features, and provision of four query types lets us perform any of these kinds of interaction. For example A common way of implementing Resource Lookups over a triple store is to use a DESCRIBE or a CONSTRUCT query.
However the problem, as I see it, is that when we resort to writing SPARQL graph patterns in order to request, say, a list of people, then we’ve kind of stepped around HTTP. We’re no longer specifying and refining our query by interacting with web resources via parameterised URLs, we’re tunnelling the request for what we want in a SPARQL query sent to an endpoint.
From a hypermedia perspective it would be much better if there were a way to be able to handle the “Listing” access option using something that was better integrated with HTTP. It also happens that this might actually be easier for the majority of web developers to get to grips with, because they no longer have to learn SPARQL.
This is what I meant by a “RESTful API” in yesterday’s blog post. In my mind, “Listing things” sits in between Resource Lookups and Existence Checks or complex pattern matching in terms of access options.
It’s precisely this role that the Linked Data API is intended to fulfil. It defines a way to dynamically generate lists of resources from an underlying RDF graph, along with ways to manipulate those collections of resources, e.g. by sorting and filtering. It’s possible to use it to define a number of useful list views for an RDF dataset that nicely complements the relationships present in the data. It’s actually defined in terms of executing SPARQL queries over that graph, but this isn’t obvious to the end user.
These features are supplemented with the definition of simple XML and JSON formats, to supplement the RDF serializations that it supports. This is really intended to encourage adoption by making it easier to process the data using non RDF tools.
So, Isn’t HTTP the API?
Which brings me to the answer to Ed’s question: isn’t HTTP the API we need? The answer is yes, but we need more than just HTTP, we also need well defined media-types.
Mike Amundsen has created a nice categorisation of media types and a description of different types of factors they contain: H Factor.
Section 5.2.1.2 of Fielding’s dissertation explains that:
Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements.
As it stands today neither RDF nor the Linked Data API specification ticks all of the the HFactor boxes. What we’ve really done so far is define how to parameterise some requests, e.g. to filter or sort based on a property value, but we’ve not yet defined that in a standard media type; the API configuration captures a lot of the requisite information but isn’t quite there.
That’s a long rambly blog post for a Friday night! Hopefully I’ve clarified what I was referring to yesterday. I absolutely don’t want to see anyone define an API for RDF that steps around HTTP. We need something that is much more closely aligned with the web. And hopefully I’ve also answered Ed’s question.