I hate quoting myself, as I worry about it making me seem like a pompous ass, but I feel moved to do it in this instance after reading Danny’s posting about DataPortability Service Discovery, in which he discusses the current blueprint from the DataPortability group.
Danny rightly points out that FOAF already provides a means of listing all of the accounts that a person uses as part of their online activity. The vocabulary allows each service to be identified, along with the person’s account name on that service. This is typically enough information to start interacting with a service API and extract useful information about the user, e.g. for importing into another site.
Here’s the example that I included in an XTech paper I presented in 2005:
<foaf:Person>
<foaf:holdsAccount>
<foaf:OnlineAccount>
<foaf:accountName>ldodds</foaf:accountName>
<foaf:accountServiceHomepage
rdf:resource="http://del.icio.us"/>
</foaf:OnlineAccount>
</foaf:holdsAccount>
</foaf:Person>
With that bit of information you can easily get access to my del.icio.us bookmarks, for example. The limitation of this kind of approach, whether it’s implemented using FOAF or using the protocol outlined in the DataPortability blueprint, is that a third-party service wanting to extract data about the user needs some prior knowledge of the service it will be interacting with: it needs knowledge of the API (i.e. a client) and also of what kind of information the service holds about the user (i.e. does it contain relevant data?).
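To make that limitation concrete, here is a minimal Python sketch of what a consumer of the account listing has to do. It assumes the snippet above is wrapped in an rdf:RDF element that declares the usual rdf and foaf namespaces. The FEED_TEMPLATES table and the feed_uri helper are hypothetical, and the URL template in it is illustrative rather than checked against the real service; the point is only that every supported service needs an entry of its own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# The account listing from the post, wrapped in rdf:RDF so that it parses.
FOAF_DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:holdsAccount>
      <foaf:OnlineAccount>
        <foaf:accountName>ldodds</foaf:accountName>
        <foaf:accountServiceHomepage rdf:resource="http://del.icio.us"/>
      </foaf:OnlineAccount>
    </foaf:holdsAccount>
  </foaf:Person>
</rdf:RDF>
"""

# Hypothetical per-service knowledge: the template is an illustrative
# assumption, not a documented API. Each new service needs a new entry.
FEED_TEMPLATES = {
    "http://del.icio.us": "http://del.icio.us/rss/{name}",
}


def accounts(rdf_xml):
    """Return (service homepage, account name) pairs found in a FOAF document."""
    root = ET.fromstring(rdf_xml)
    found = []
    for account in root.iter(f"{{{FOAF}}}OnlineAccount"):
        name = account.findtext(f"{{{FOAF}}}accountName")
        homepage = account.find(f"{{{FOAF}}}accountServiceHomepage")
        if name is None or homepage is None:
            continue  # skip incomplete account descriptions
        found.append((homepage.get(f"{{{RDF}}}resource"), name.strip()))
    return found


def feed_uri(homepage, name):
    """Map an account to a data URI, or None if the service is unknown to us."""
    template = FEED_TEMPLATES.get(homepage)
    return template.format(name=name) if template else None
```

An account on any service missing from the table yields None: the client knows the account exists but has no idea how to get data out of it.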
In my opinion this doesn’t scale. For truly distributed, ad hoc service integration, I think you need a slightly different approach to the problem: one that embraces a more RESTful style and ideally takes advantage of the flexibility of RDF.
Rather than simply providing a list of services, I should point to the data. Towards the end of my paper (see the section “Self-Description as Service Connectors”) I suggested that using rdfs:seeAlso to create RDF hyperlinks between documents, and appropriately typing the linked resources, brings two advantages. Firstly, it avoids the need to trawl through unnecessary services in order to get at the data that’s of interest: the user can explicitly point to it. Secondly, there’s no need for API-specific clients; an HTTP GET request is all that’s required.
Here’s the example in the paper rewritten to address a particular DataPortability use case: “Aggregate your, and your friend’s, “Status” (eg Twitter) from all the “Status” systems you belong to.”
Firstly “my friends” can be those people listed in my FOAF document. FOAF provides the basic data substrate for glueing the services together. Secondly, I point to a web resource from which my “Status” message(s) can be retrieved:
<foaf:Person>
<eg:statuses>
<eg:Status
rdf:about="http://twitter.com/statuses/user_timeline/14813.atom"/>
</eg:statuses>
</foaf:Person>
So a third-party service that needs to find my current Status simply identifies the relevant resource and then takes that URI and does an HTTP GET on it.
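Here is a minimal sketch of that consumer in Python, using only the standard library. The namespace URI bound to the eg prefix is an assumption (the example vocabulary is not defined anywhere), and status_uris and fetch are illustrative names. The lookup reads the URI from either rdf:about or rdf:resource on the eg:Status element.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
EG = "http://example.org/ns#"  # assumed URI for the made-up "eg" vocabulary


def status_uris(rdf_xml):
    """Return the URIs of every eg:Status resource in a FOAF document."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for status in root.iter(f"{{{EG}}}Status"):
        # Accept the URI from either attribute form.
        uri = status.get(f"{{{RDF}}}about") or status.get(f"{{{RDF}}}resource")
        if uri:
            uris.append(uri)
    return uris


def fetch(uri, timeout=10):
    """Plain HTTP GET; no service-specific client code is involved."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return response.read()


def profile(status_uri):
    """Build a small FOAF document pointing at the given Status resource."""
    return f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:eg="{EG}">
  <foaf:Person>
    <eg:statuses>
      <eg:Status rdf:about="{status_uri}"/>
    </eg:statuses>
  </foaf:Person>
</rdf:RDF>
"""
```

Calling status_uris(profile("http://twitter.com/statuses/user_timeline/14813.atom")) returns a list holding that one URI, and each entry can be passed straight to fetch. Nothing in the code names Twitter or any other service.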
Then let’s say I decide to move from Twitter to some other service. Here’s what happens:
<foaf:Person>
<eg:statuses>
<eg:Status
rdf:about="http://example.com/status/ldodds"/>
</eg:statuses>
</foaf:Person>
See what I did? And guess what that Status aggregator has to do: nothing.
In my opinion this rightly shifts the emphasis away from the details of individual service APIs and encourages standardization on data formats. Surely this has to be the most important aspect of Data Portability? For example, it will encourage sites that produce Status messages to agree on how these will be published onto the web, whether that involves explicit standardization or simple adoption of a standard like Atom.
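If sites did settle on Atom, the consuming side could be as small as the following Python sketch. It assumes each Status is published as an Atom entry whose title carries the message text, which is one plausible convention rather than an agreed one; latest_statuses and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def latest_statuses(atom_xml, limit=5):
    """Return (updated, title) pairs for the first `limit` entries of an Atom feed."""
    feed = ET.fromstring(atom_xml)
    statuses = []
    for entry in feed.findall(f"{{{ATOM}}}entry")[:limit]:
        title = entry.findtext(f"{{{ATOM}}}title", default="")
        updated = entry.findtext(f"{{{ATOM}}}updated", default="")
        statuses.append((updated, title.strip()))
    return statuses


# Invented sample data standing in for the body of an HTTP GET response.
SAMPLE_FEED = f"""
<feed xmlns="{ATOM}">
  <title>ldodds</title>
  <entry>
    <title>Writing about data portability</title>
    <updated>2008-01-01T12:00:00Z</updated>
  </entry>
</feed>
"""
```

The same function works for any site that publishes its statuses this way, which is the sense in which agreeing on the format matters more than agreeing on an API.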
As I’ve written before, RDF does have some nice properties for enabling data integration and for allowing independent evolution of community-specific vocabularies, and these are worth exploring in this context.
I really don’t see the need for intermediary services at all to create this kind of connection, beyond services that allow for maintenance of a FOAF profile. The other nice property of this form of interaction is that I don’t need to use any services. If I decide to manage my own online presence, manage my own OpenID, and publish all my public data as a collection of hand-crafted static data files on my own server, then that’s fine: it’s all just URIs.
If we want true ownership of our own data, and true portability, then the means of integration needs to support this at the most fundamental level.
Pompous ass mode off.