I’ve had several conversations recently with people who are either interested in, or actually implementing Linked Data, and are struggling with some important questions
- How much data should I give away?
- If I wanted to charge for more than just the basic data, then how would I handle that?
My usual response to the first of those questions is: “as much as you feel comfortable with”. There’s still so much data that’s not yet visible or accessible in machine-readable formats that any progress is good progress. Let’s get more data out there now. More is better.
It usually doesn’t take long to get to the second question. If you’ve spent time evangelising to people about the power and value of data, and particularly their data, then its natural for them to begin thinking about how it can be monetized.
Scott Brinker has done a good job of summarising a range of options for Linked Data business models. I’ve chipped into that discussion already. Instead what I wanted to briefly discuss here is some of the mechanics of implementing access to what we might call “premium Linked Data”, or as I’ll refer to it “Enhanced Descriptions”.
Premium Linked Data
It’s possible to publish Linked Data that is entirely access controlled. Access might be limited to users behind the firewall (“Enterprise Linked Data”) or only to authorised paying customers. As a paid up customer you’d be given an entry point into that Linked Data and would supply appropriate credentials in order to access it.
This data isn’t going to be something you’d discover on the open web. There are many different authentication models that could be used to mediate access to this “Dark Data”. The precise mechanisms aren’t that important and the right one is likely to vary for different industries and use cases. Although I think there’s a strong argument in using something that dove-tails nicely with HTTP and web infrastructure in general.
What interests me more is the scenario in which a data publisher might be exposing some public data under a liberal open license, but also wants to make available some “premium” metadata. I.e. some value-added data that is only available to paid-up customers. In this scenario it would be useful to be able to link together the open and closed data, allowing a user agent to detect that there is extra value hidden behind some kind of authentication barrier. I think this is likely to become a very common pattern as it aids discovery of the value-added material. Essentially its the existing pattern for access controlling content that we have on the web of documents.
Its the mechanics of implementing this public/private scenario that has cropped up in my recent conversations.
Enhanced Descriptions
When I dereference the URI of a resource I will typically get redirected to a document that describes that resource. This document might contain data like this (in Turtle):
ex:document
foaf:primaryTopic ex:thing.
ex:thing
rdfs:label "Some Thing".
i.e. the document contains some data about the resource, and there’s a primary topic relationship between the document and the resource.
If we want to point to additional RDF documents that also describe this resource, or related data, then we can use an rdfs:seeAlso link:
ex:document
foaf:primaryTopic ex:thing.
ex:thing rdfs:label "Some Thing";
rdfs:seeAlso ex:otherDocument.
We can use the rdfs:seeAlso relationship to point to additional documents either within a specific dataset or in other locations on the web. Those documents provide useful annotations about a resource.
An “Enhanced Description” will contain additional value-added data about a resource. We could just refer to this document using an rdfs:seeAlso link. But if we do that then a user agent can’t easily distinguish between an arbitrary rdfs:seeAlso link and one that refers to some additional data. We could instead use an additional relationship, a specialisation of rdfs:seeAlso, that can be used to disambiguate between the relationships. I’ve defined just such a predicate: ov:enhancedDescription.
ex:document
foaf:primaryTopic ex:thing.
ex:thing rdfs:label "Some Thing";
rdfs:seeAlso ex:otherDocument;
ov:enhancedDescription ex:premiumDocument.
By using a separate document to hold the value-added annotations we have the opportunity for user agents to identify those documents (via the predicate) and to also be challenged for credentials when they retrieve the URI (e.g. with an HTTP 401 response code).
It also means data publishers can safely dip a toe in the open data waters, but leave richer descriptions protected but still discoverable behind an access control layer.
Another Approach?
Interestingly I discovered earlier today that OpenCalais returns a “402 Payment Required” status code for some documents.
To see this in practice visit their description of IBM and try accessing the last of the owl:sameAs links. I’m guessing they’re using a similar technique to the one I’ve outlined here. But the key difference is that rather than use separate documents, they’ve decided to create new URIs for the access controlled version of the Linked Data. It would be nice if someone out there could confirm that.
Assuming I’ve interpreted what they’re doing correctly, I think this approach has some failings. Firstly it creates extra URIs that aren’t really needed. I’m not sure that we really need more URIs for things; a pattern in which publishers have 2 URIs (public & private) for each resource isn’t going to help matters
Secondly, just like using a generic “see also” relation, using owl:sameAs means its impossible to detect which resource is the one providing access to premium data, and others that exist on the web, without doing some fragile URI matching.
Apologies for the OpenCalais team if I’ve misunderstood the mechanism they’re using. I’ll happily publish a correction, but regardless, I’m intrigued by the 402 status code!
Summary
In my view, the “Enhanced Description” approach is a simple to implement pattern. Its one that I’ve been recommending to people recently but I’ve not seen documented anywhere, so thought I’d write it up.
I’d be interested to hear from others that have either implemented the same mechanism, or like OpenCalais are using other schemes.