Category Archives: REST

Thoughts on the Netflix API Closure

A year ago Netflix announced that they were shuttering their public API: no new API keys or affiliates and no more support. Earlier this week they announced that the entire public API will be shut down by November 2014.

This is interesting news and it’s been covered in various places already, including this good overview at Programmable Web. I find it interesting because it’s the first time that I can recall a public API being so visibly switched out for a closed, private alternative. Netflix will still offer an API, but only for a limited set of eight existing affiliates and (of course) their own applications. Private APIs have always existed and will continue to do so, but the trend to date has been about these being made public, rather than a move in the opposite direction.

It’s reasonable to consider if this might be the first of a new trend, or whether it’s just an outlier. Netflix have been reasonably forthcoming about their API design decisions, so I expect many others will be reflecting on their decision and whether it would make sense for them.

But does it make sense at all?

If you read this article by Daniel Jacobson (Director of Engineering for the Netflix API) you can get more detail on the decision and some insight into their thought process. By closing the public API and focusing on a few affiliates, Jacobson suggests that they are able to optimise the API to fit the needs of those specific consumers. The article suggests that a fine-grained resource-oriented API is excellent for supporting largely un-mediated use by a wide range of different consumers with a range of different use cases. In contrast, an API that is optimised for fewer use cases and types of query may be able to offer better performance. An API with a smaller surface area will have lower maintenance overheads. Support overheads will also be lower because there are fewer interactions to consider and a smaller user base making them.

That rationale is hard to argue with from either a technical or business perspective. If you have a small number of users driving most of your revenue and a long tail of users generating little or no revenue but with a high support cost, it mostly makes sense to follow the revenue. I don’t buy all of the technical rationale though. It would be possible to support a mixture of resource types in the API, as well as a mixture of support and service level agreements. So I suspect the business drivers are the main rationale here. APIs have generally meant businesses giving up control; if Netflix are able to make this work then I would be surprised if more businesses don’t do the same eventually, as a means to regain that control.

But by withdrawing from any kind of public API Netflix are essentially admitting that they don’t see any further innovation happening around their API: what they’ve seen so far is everything they’re going to see. They’re not expecting a sudden new type of usage to drive revenue and users to the service. Or at least not enough to warrant maintaining a more generic API. If they felt that the community was growing, or building new and interesting applications that benefited their business, they’d keep the API open. By restricting it they’re admitting that closer integration with a small number of applications is a better investment. It’s a standard vertical integration move that gives them greater control over all user experience with their platform. It wouldn’t surprise me if they acquired some of these applications in the future.

However it all feels a bit short-sighted to me as they’re essentially withdrawing from the Web. They’re no longer going to be able to benefit from any of the network effects of having their API be a part of the wider web and remixable (within their Terms of Service) with other services and datasets. Innovation will be limited to just those companies they’re choosing to work with through an “experience” driven API. That feels like a bottleneck in the making.

It’s always possible to optimise a business and an API to support a limited set of interactions, but that type of close coupling inevitably results in less flexibility. Personally I’d be backing the Web.

RDF Data Access Options, or Isn’t HTTP already the API?

This is a follow-up to my blog post from yesterday about RDF and JSON. Ed Summers tweeted to say:

…your blog post suggests that an API for linked data is needed; isn’t http already the API?

I couldn’t answer that in 140 characters, so am writing this post to elaborate a little on the last section of my post in which I suggested that “there’s a big data access gulf between de-referencing URIs and performing SPARQL queries”. What exactly do I mean there? And why do I think that the Linked Data API helps?

Is Your Website Your API?

Most Linked Data presentations that discuss the publishing of data to the web typically run through the Linked Data principles. At point three we reach the recommendation that:


“When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)”

This encourages us to create sites that consist of a mesh of interconnected resources described using RDF. We can “follow our nose” through those relationships to find more information.

This gives us two fundamental data access options:

  • Resource Lookups: by de-referencing URIs we can obtain a (typically) complete description of a resource
  • Graph Traversal: following relationships and recursively de-referencing URIs to retrieve descriptions of related entities; this is (typically, but not necessarily) reconstituted into a graph on the client (a short sketch of both options follows this list)
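Here is a minimal sketch of both options, using Python and rdflib purely for illustration; the starting URI and the choice of foaf:knows as the relationship to follow are placeholders, not part of any specific dataset.

```python
# A minimal sketch of the two access options using rdflib.
# The starting URI is a placeholder; any Linked Data URI would do.
from rdflib import Graph, URIRef
from rdflib.namespace import FOAF

start = URIRef("http://example.org/people/alice")

# Resource Lookup: dereference the URI and collect its description
g = Graph()
g.parse(start)  # rdflib negotiates for an RDF serialisation

# Graph Traversal: follow foaf:knows links, dereference them too,
# and merge each description into the same local graph
for friend in list(g.objects(start, FOAF.knows)):
    try:
        g.parse(friend)  # assumes the object is itself a dereferenceable URI
    except Exception:
        pass  # not every linked URI will resolve to RDF

print(g.serialize(format="turtle"))
```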

However, if we take the “Your Website Is Your API” idea seriously, then we should be able to reflect all of the different points of interaction of that website as RDF, not just resource lookups (viewing a page) and graph traversal (clicking around).

As Tom Coates noted back in 2006 in “Native to a Web of Data“, good data-driven websites will have “list views and batch manipulation interfaces”. So we should be able to provide RDF views of those areas of functionality too. This gives us another kind of access option:

  • Listing: ability to retrieve lists/collections of things; navigation through those lists, e.g. by paging; and list manipulation, e.g. by filtering or sorting.

It’s possible to handle much of that by building some additional structure into your dataset, e.g. creating RDF Lists (or similar) of useful collections of resources. But if you bake this into your data then those views will potentially need to be re-evaluated every time the data changes. And even then there is still no way for a user to manipulate the views, e.g. to page or sort them.

So to achieve the most flexibility you need a more dynamic way of extracting and ordering portions of the underlying data. This is the role that SPARQL often fulfills: it provides some really useful ways to manipulate RDF graphs, and you can achieve far more with it than just extracting and manipulating lists of things.
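As a rough sketch (the endpoint URL is invented), a paged, sorted list of people might be retrieved like this using a SPARQL query sent over the standard SPARQL protocol:

```python
# Sketch: a "Listing" interaction expressed as a SPARQL query and sent
# over the standard SPARQL protocol. The endpoint URL is illustrative.
import requests

ENDPOINT = "http://example.org/sparql"

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
ORDER BY ?name      # sorting
LIMIT 10 OFFSET 20  # paging: the third page of ten
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["name"]["value"])
```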

SPARQL also supports another kind of access option that would otherwise require traversing some or all of the remote graph.

One example would be: “does this graph contain any foaf:name predicates?” or “does anything in this graph relate to http://www.example.org/bob?”. These kinds of existence checks, as well as more complex graph pattern matching, also tend to be the domain of SPARQL queries. It’s more expressive and potentially more efficient to just use a query language for that kind of question. So this gives us a fourth option:

  • Existence Checks: ability to determine whether a particular structure is present in a graph

Interestingly, though, these are not often the kinds of questions that you can “ask” of a website. There’s no real correlation with typical web browsing features, although searching comes close for simple existence check queries.
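For completeness, here is a sketch of what an existence check looks like as a SPARQL ASK query, again sent to an invented endpoint over the standard protocol:

```python
# Sketch: an existence check expressed as a SPARQL ASK query.
import requests

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK { ?s foaf:name ?o }
"""

resp = requests.get(
    "http://example.org/sparql",  # illustrative endpoint
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
print(resp.json()["boolean"])  # True if any foaf:name triple is present
```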

Where the Linked Data API fits in

So there are at least four kinds of data access option. I doubt whether it’s exhaustive, but it’s a useful starting point for discussion.

SPARQL can handle all of these options and more. The graph pattern matching features, and provision of four query types, let us perform any of these kinds of interaction. For example, a common way of implementing Resource Lookups over a triple store is to use a DESCRIBE or a CONSTRUCT query.

However the problem, as I see it, is that when we resort to writing SPARQL graph patterns in order to request, say, a list of people, then we’ve kind of stepped around HTTP. We’re no longer specifying and refining our query by interacting with web resources via parameterised URLs, we’re tunnelling the request for what we want in a SPARQL query sent to an endpoint.

From a hypermedia perspective it would be much better if there were a way to be able to handle the “Listing” access option using something that was better integrated with HTTP. It also happens that this might actually be easier for the majority of web developers to get to grips with, because they no longer have to learn SPARQL.

This is what I meant by a “RESTful API” in yesterday’s blog post. In my mind, “Listing things” sits in between Resource Lookups and Existence Checks or complex pattern matching in terms of access options.

It’s precisely this role that the Linked Data API is intended to fulfil. It defines a way to dynamically generate lists of resources from an underlying RDF graph, along with ways to manipulate those collections of resources, e.g. by sorting and filtering. It’s possible to use it to define a number of useful list views for an RDF dataset that nicely complements the relationships present in the data. It’s actually defined in terms of executing SPARQL queries over that graph, but this isn’t obvious to the end user.
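To make that concrete, here is a sketch of a Linked Data API style list request. The host and path are invented; _page, _pageSize and _sort are the reserved parameter names defined by the Linked Data API specification, and filtering is typically done by supplying a property name as a parameter.

```python
# Sketch: the same "list of people" interaction expressed purely as an
# HTTP GET on a parameterised URL in the Linked Data API style.
# The host, path and the currentProject filter are invented for illustration.
import requests

resp = requests.get(
    "http://example.org/api/people",
    params={
        "_page": 3,          # paging
        "_pageSize": 10,
        "_sort": "name",     # sorting
        "currentProject": "http://example.org/projects/ldapi",  # filtering by property value
    },
    headers={"Accept": "application/json"},
)
print(resp.status_code, resp.json())
```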

These features are complemented by the definition of simple XML and JSON formats, in addition to the RDF serializations that it supports. This is really intended to encourage adoption by making it easier to process the data using non-RDF tools.

So, Isn’t HTTP the API?

Which brings me to the answer to Ed’s question: isn’t HTTP the API we need? The answer is yes, but we need more than just HTTP, we also need well defined media-types.

Mike Amundsen has created a nice categorisation of media types and a description of different types of factors they contain: H Factor.

Section 5.2.1.2 of Fielding’s dissertation explains that:


Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements.

As it stands today neither RDF nor the Linked Data API specification ticks all of the H Factor boxes. What we’ve really done so far is define how to parameterise some requests, e.g. to filter or sort based on a property value, but we’ve not yet defined that in a standard media type; the API configuration captures a lot of the requisite information but isn’t quite there.

That’s a long rambly blog post for a Friday night! Hopefully I’ve clarified what I was referring to yesterday. I absolutely don’t want to see anyone define an API for RDF that steps around HTTP. We need something that is much more closely aligned with the web. And hopefully I’ve also answered Ed’s question.


Explaining REST and Hypertext: Spam-E the Spam Cleaning Robot

I’m going to add to Sam Ruby’s amusement and throw in my attempt to explicate some of Roy Fielding’s recent discussion of what makes an API RESTful. If you’ve not read the post and all the comments then I encourage you to do so: there’s some great tidbits in there that have certainly given me pause for thought.
The following attempts to illustrate my understanding of REST. Perhaps bizarrely, I’ve chosen to focus more on the client than on the design of the server, e.g. what resources it exposes, etc. This is because I don’t think enough focus has been placed on the client, particularly when it comes to the hypermedia constraint. And I think that often, when we focus on how to design an “API”, we’re glossing over some important aspects of the REST architecture, which includes, after all, other types of actors, including both clients and intermediaries.
I’ve also deliberately chosen not to draw much on existing specifications; again, it’s too easy to muddy the waters with irrelevant details.
Anyway, I’m well prepared to stand corrected on any or all of the below. Will be interested to hear if anyone has any comments.
Let’s imagine there are two mime types.
The first is called application/x-wiki-description. It defines a JSON format that describes the basic structure of a wiki website. The format includes a mixture of simple data items, URIs and URI templates that collectively describe:

  • the name of the wiki
  • the email address of the administrator
  • a link to the Recent Changes resource
  • a link to the Main page
  • a link to the license statement
  • a link to the search page (as a URI template, that may include a search term)
  • a link to a parameterized RSS feed (as a URI template that may include a date)

Another mime type is application/x-wiki-page-versions. This is another JSON based format that describes the version history of a wiki page. The format is an ordered collection of links. Each resource in that list is a prior version of the wiki page; the most recent page is first in the list.
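Neither media type exists, of course. Purely to make the idea concrete, here is roughly what representations of the two formats might contain, expressed as Python literals; the field names and URI template syntax are invented.

```python
# Purely illustrative sketches of the two invented media types.

# application/x-wiki-description
wiki_description = {
    "name": "Example Wiki",
    "administrator": "mailto:admin@example.org",
    "recentChanges": "http://wiki.example.org/recent-changes",
    "mainPage": "http://wiki.example.org/Main_Page",
    "license": "http://wiki.example.org/license",
    "search": "http://wiki.example.org/search{?term}",   # URI template
    "feed": "http://wiki.example.org/feed{?since}",      # URI template
}

# application/x-wiki-page-versions: an ordered list, most recent first
page_versions = [
    "http://wiki.example.org/SomePage?version=3",
    "http://wiki.example.org/SomePage?version=2",
    "http://wiki.example.org/SomePage?version=1",
]
```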
Spam-E is a little web robot that has been programmed with the smarts to understand several mime types:

  • application/x-wiki-description
  • application/x-wiki-page-versions
  • RSS and Atom
  • XHTML

Spam-E also understands a profile of XHTML that defines two elements: one that points to a resource capable of serving wiki descriptions, another that points to a resource that can return wiki page version descriptions.
Spam-E has internal logic that has been designed to detect SPAM in XHTML pages. It also has a fully functioning HTTP client. And it also has been programmed with logic appropriate to processing those specific media types.
Initially, when started, Spam-E does nothing. It waits to receive a link, e.g. via a simple user interface. It’s in a steady state, waiting for input.
Spam-E then receives a link. The robot immediately dereferences the link. It does so by submitting a GET request to the URL, and includes an Accept header:

Accept: application/x-wiki-description;q=1.0, application/x-wiki-page-versions;q=0.9, application/xhtml+xml;q=0.8, application/atom+xml;q=0.5, application/rss+xml;q=0.4

This clearly states Spam-E’s preference to receive specific mime-types.
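Here is a sketch of that dereference-and-dispatch step; the mode names are just labels standing in for Spam-E’s internal state changes.

```python
# Sketch of Spam-E's dereference step: one GET with an Accept header,
# then a state change driven by the Content-Type of whatever comes back.
import requests

ACCEPT = ("application/x-wiki-description;q=1.0, "
          "application/x-wiki-page-versions;q=0.9, "
          "application/xhtml+xml;q=0.8, "
          "application/atom+xml;q=0.5, "
          "application/rss+xml;q=0.4")

MODES = {
    "application/x-wiki-description": "discovery",       # read the wiki description
    "application/x-wiki-page-versions": "spam cleaning",  # walk the version history
    "application/xhtml+xml": "searching",                  # scrape links, detect spam
    "application/atom+xml": "spam detection",              # scan recent changes
    "application/rss+xml": "spam detection",
}

def dereference(url):
    resp = requests.get(url, headers={"Accept": ACCEPT})
    media_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
    # The media type of the representation, not the URL, decides what
    # Spam-E does next: hypermedia as the engine of application state.
    return MODES.get(media_type, "searching"), resp
```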
In this instance it receives an XHTML document in return. Not ideal, but Spam-E knows how to handle it. After parsing it, it turns out that this is not a specific profile of XHTML that Spam-E understands, so it simply extracts all the anchor elements from the file and uses them to widen its search for wiki spam. Another way to say this is that Spam-E has changed its status to one of searching. This state transition has been triggered by following a link, receiving and processing a specific mimetype. This is “hypermedia as the engine of application state” in action.
Spam-E performs this dereference-parse-traverse operation several times before finding an XHTML document that conforms to the profile it understands. The document contains a link to a resource that should be capable of serving a wiki description representation.
Spam-E is now in discovery mode. Spam-E uses an Accept header of application/x-wiki-description when following the link and is returned a matching representation. Spam-E parses the JSON and now has additional information at its disposal: it knows how to search the wiki, how to find the RSS feed, how to contact the wiki administrator, etc.
Spam-E now enters Spam Detection mode. It requests, with a suitable Accept header, the recent changes resource, stating a preference for Atom documents. It instead gets an RSS feed, but that’s fine because Spam-E still knows how to process that. For each entry in the feed, Spam-E requests the wiki page, using an Accept header of application/xhtml+xml.
Spam-E now tries to find if there is spam on the page by applying its local spam detection logic. In this instance Spam-E discovers some spam on the page. It checks the XHTML document it was returned and discovers that it conforms to a known profile and that embedded in a link element is a reference to the “versions” resource. Spam-E dereferences this link using an Accept header of application/x-wiki-page-versions.
Spam-E, who is now in Spam Cleaning mode, fetches each version in turn and performs spam detection on it. If spam is found, then Spam-E performs a DELETE request on the URI. This will remove that version of the wiki page from the wiki. Someone browsing the original URI of the page will now see an earlier, spam free version.
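A sketch of that cleaning loop; looks_like_spam() is a stand-in for Spam-E’s local detection logic, and the version URLs come from the application/x-wiki-page-versions representation.

```python
# Sketch of the Spam Cleaning step: walk the version list (most recent
# first) and DELETE any version that looks like spam.
import requests

def looks_like_spam(xhtml):
    # Stand-in for Spam-E's real detection logic.
    return "cheap pharmaceuticals" in xhtml.lower()

def clean(version_urls):
    for url in version_urls:
        resp = requests.get(url, headers={"Accept": "application/xhtml+xml"})
        if looks_like_spam(resp.text):
            requests.delete(url)  # remove the spammed version from the wiki
```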
Once it has finished its cycle of spam detection and cleaning, Spam-E reverts to search mode until it runs out of new URIs.
There are several important points to underline here:
Firstly, at no point did the authors of Spam-E have to have any prior knowledge about the URL structure of any site that the robot might visit. All that Spam-E was programmed with was logic relating to some defined media types (or extension points of a media type in the case of the XHTML profiles) and the basic semantics of HTTP.
Secondly, no one had to publish any service description documents, or define any API end points. No one had to define what operations could be carried out on specific resources, or what response codes would be returned. All information was found by traversing links and by following the semantics of HTTP.
Thirdly, the Spam-E application basically went through a series of state transitions triggered by what media types it received when requesting certain URIs. The application is basically a simple state machine.
Anyway, hopefully that is a useful example. Again, I’m very happy to take feedback. Comments are disabled on this blog, but feel free to drop me a mail (see the Feedback link).

XML Hypertext: Not Dead, Merely Resting?

“The dreams of XML hypertext are dead, or at least thoroughly dormant”

Simon St Laurent’s XML.com article on XQuery is an interesting read. But I think the above statement is worth discussing. Is XML hypertext really dead? Or, if it’s dormant, is it going to remain so?
Firstly what is XML hypertext? I presume from the context of the quote that Simon is referring to client side use of XML on the web. To me this incorporates several use cases including both the use of XML for presentation (XHTML, SVG, etc) and for data publishing (RSS, Atom, XML based web services). There is an obvious need for linking in both of these use cases.
Where I’d agree with St. Laurent is that most of the existing work here is dormant or duplicated. For example, while SVG makes use of XLink, it’s not used in RSS and Atom, and was deemed not flexible enough for use in XHTML due to issues with attribute naming. However, the basic model, labelled links with activation indicators (onLoad, onClick, etc), seems to be shared across vocabularies. But still, XLink has been a Recommendation since 2001 and has yet to set the world on fire.
However where I’d disagree with Simon is that XLink or XML hypertext is thoroughly dormant. Much as I hate to make predictions, I think we’re only just gaining any appreciation of the power of producing good hypertext, because we’re only now seeing the large scale publishing of machine-processable, interrelated data that makes linking worthwhile.
I think growing appreciation of the REST architecture is driving a greater understanding of the benefits of highly linked resources. Sure, we all know it’s good practice to avoid making web pages that are “dead ends”, but not everyone is publishing data to the same guidelines. The principle of “Hypermedia as the engine of application state” is still not widely understood; it’s a piece of REST Zen that benefits from practical implementation.
Hypertext simplifies client-side development as it avoids spreading the requirement that the client must know how to construct URIs: this reduces coupling. It also simplifies client logic, as “the navigation options” (i.e. state transfers) can be presented by the server as the result of previous interactions; the client can simply select from amongst the labelled options. For example, if the client needs to access some specific data, e.g. a list of my recently published photos, it can select the appropriate link to retrieve it (assuming it’s available).
That link may be to an entirely different service.
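A sketch of what that selection looks like from the client’s point of view; the “photos” link relation and the URLs are invented for illustration.

```python
# Sketch: the client never builds a URL for "my recent photos"; it just
# follows whichever link the previous response labelled with that relation.
def find_link(links, rel):
    """links: (rel, href) pairs parsed from the previous response."""
    for link_rel, href in links:
        if link_rel == rel:
            return href
    return None  # that option simply isn't on offer in this state

# Links as the server presented them; the "photos" relation is invented.
current_links = [
    ("self", "http://photos.example.org/people/leigh"),
    ("photos", "http://photos.example.org/people/leigh/recent"),
]
print(find_link(current_links, "photos"))
```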
In an XTech 2005 paper I tried to argue (I suspect not very clearly) that linking offers the best route to integration of data from multiple web services. Linking as a means to easier mashing.
If the current data publishing trends continue then I suspect there’s going to be a growing understanding of the benefits of hypertext, and this will inevitably drive some renewed interest in XLink or a related technology.
What I personally like about RDF in this regard is the “closure” it offers: every resource has a URI, every schema has a URI, every Property and Class has a URI so the data, metadata and schemas can be linked together, and this offers some very powerful capabilities.

Benefits of Refactoring to REST

Edd sent me a pointer to a nice article from Scott Raymond called “Refactoring to REST” in which he outlines how his application code was improved and simplified by adopting a more RESTful design. The application here was built on Rails and used the Simply Restful plugin to nudge Rails into a more RESTful aspect.
I’ve noticed a similar reduction in complexity when moving to RESTful application design. I’ve tended to describe this as reducing the “surface area” of the application: the smaller the surface area, the less code is required. It also follows that the smaller the surface area, the fewer URL types are required. You end up with a few fairly standard URL patterns which identify resources, rather than RPC-style “method” oriented URLs.
This has some nice properties. For the client a given server application becomes more easily substitutable, as there’s less coupling. And on the server side it clarifies the “points of contact” of the application with the web (of data). It also makes it easier to maintain permanent links because as the URLs are simpler and more identifiable they’re easier to rewrite/redirect as an application evolves or changes architecture.
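As a rough sketch of that reduction in surface area (Flask is used purely for illustration; the photo resource is invented):

```python
from flask import Flask

app = Flask(__name__)

# RPC style tends towards one URL per operation:
#   /get_photo?id=1  /update_photo  /delete_photo  /list_photos ...
# A resource-oriented design collapses those into a couple of URL
# patterns, with the HTTP method carrying the verb:

@app.route("/photos", methods=["GET", "POST"])
def photos():
    return "GET lists photos, POST creates one"

@app.route("/photos/<photo_id>", methods=["GET", "PUT", "DELETE"])
def photo(photo_id):
    return f"GET retrieves, PUT replaces, DELETE removes photo {photo_id}"
```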

QOTD: Fielding on Form(s)

Roy Fielding on why HTML4 forms only support GET and POST:

The only reason the HTML4 spec has only two options available
in that field is *because* of the browser bugs. W3C specs have
no spine.

In an earlier message in the thread he urged folk to help fix the browsers.
Never occurred to me to check until now, but XForms supports GET/PUT/POST. But not DELETE, oddly.

Microformats and REST

Just noticed Danny’s posting about the new microformat-rest mailing list. I was going to start analysing this but see that Joe Gregorio has already done a good job.
I don’t think that microformats have much to add to REST as an architectural pattern. It certainly doesn’t merit subsetting its use with HTTP; that definitely is overreaching. Championing microformats, and vocabulary re-use in general, is a good thing though, as I’ve talked about before. I think there’s more mileage to be had in pursuing that angle, as well as hypermedia (the other neglected aspect of REST), than there is in subsetting the pattern.
If that doesn’t seem to be a useful approach, then one plea I’d make to the microformats community is that they come up with a name for this new architectural style, and actually relate it to REST in a more formal way. Just like Rohit Khare did with ARRESTED.
But I don’t think that’s the actual goal. It feels more like trying to define some best practices for deploying RESTful web services. And that’s something I do agree with.

X-DOAP

Danny’s discussion about sending FOAF URLs as HTTP headers reminded me that I’d not yet followed up on some similar proposals I’d made at XTech 2005. In particular, the use of DOAP descriptions instead of “API Keys” for RESTful interfaces.
In my paper after reviewing how services supported authentication and linking of resources, I wrote:

Many of the services support the notion of an “API Key”. These keys are allocated on a per-application basis and are a required parameter in all requests. Typically a form is provided that allows an application developer to quickly obtain a key. Often some context about its intended usage, such as application name, description and a homepage URL must be supplied.

While API keys are not used for authentication they are used as a mechanism to support usage tracking of the API, e.g. to identify active applications and potentially spot abuses. From this perspective they are a useful feature that furnishes service providers with useful usage statistics about client applications…

Later, I critiqued the use of hypermedia to link together different resources exposed via several RESTful interfaces, noting that very few actually used this technique, instead relying on the client to construct additional URLs in order to extract more data. One item that frequently needs to be added is an API Key, which:

…prohibits free publishing of links, as a given URL is only suitable for use by a single application, the one to which the key was assigned.

It is the use of API keys that is the most troublesome. While obviously providing a useful feature, API keys hamper loose ad hoc integration; clients must know how to manipulate a URL to insert an API key. Therefore, while a service may provide unauthenticated use of read-only URLs, these links cannot be openly published without also sharing an API key. This obviously undermines their potential benefits.

An alternative to using API keys in the URL, is to require applications to identify themselves using an existing HTTP feature: the User-Agent header. This header can be used to pass an application name, URL, or other token to a service without requiring modification of the request URL. An API key is actually request metadata, and HTTP headers are the correct place for this metadata to be communicated.

Some APIs already support or encourage use of User-Agent, notably del.icio.us and WebJay. However the technique isn’t that suitable for all environments, e.g. a bookmarklet, where one has no control over the HTTP headers.
User-Agent is also problematic due to its unstructured format: the field is basically free text and browser User-Agents are already hopelessly muddled.
In my presentation I suggested using an alternate HTTP header, X-DOAP, whose value would be the URI of the DOAP description of the client application. This header would supplant the use of API Keys, or at least be encouraged as an alternative mechanism for identifying a client application. To my mind this provides the same level of detail and usage tracking as an API key, but in a more flexible manner.
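A sketch of what a request carrying the proposed header might look like; the request and DOAP URLs are placeholders, and X-DOAP is of course not a registered header.

```python
# Sketch: identifying the client application with an X-DOAP header
# rather than an API key baked into the URL.
import requests

resp = requests.get(
    "http://api.example.org/photos/recent",   # placeholder service URL
    headers={
        "X-DOAP": "http://example.org/myapp/doap.rdf",  # placeholder DOAP URI
        "User-Agent": "myapp/1.0 (+http://example.org/myapp)",
    },
)
print(resp.status_code)
```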
It’s worth noting that Greasemonkey (and other AJAX environments, I assume) allow the addition of custom HTTP headers to outgoing requests. So one can use both X-DOAP and Danny’s (X-)FOAF headers to identify both the client application and the user. As far as I can see it’s only bookmarklets that are limited to not having access to either the User-Agent settings or other outgoing headers. I’m not certain that a lot of API accesses come from those environments anyway; I’d hazard that custom applications and AJAX clients are increasingly the norm.
X-DOAP could be used now, assuming consensus could be reached amongst the various service providers. As Danny has noted an official registration would carry a lot more weight.

Service Description Mailing List

If you’re interested in web service descriptions, and in particular RESTful service descriptions you should get yourself over to public-web-http-desc, a new W3C mailing list dedicated to precisely that topic.
From his introduction, Philippe Le Hegaret described the list as being …dedicated to discussion of Web description languages based on URI/IRI and HTTP, and aligned with the Web and REST Architecture. Unlike WSDL (Web Services Description Language), such languages are not targeted towards description of Web Services.
Le Hegaret’s posting includes some introductory pointers that round up a lot of the recent proposals in this space, including those from Bray, Cowan, Baker, Orchard, etc. This thread contains some other useful background.
The initial topic of discussion is the scope of the problem at hand, specifically: are we discussing a description language for XML services or any web service, regardless of representation formats? My vote is with the more inclusive option.
Definitely a space to watch if you’re interested in REST services.

Connecting Social Content Services using FOAF, RDF and REST

Abstract

A growing number of “social content” applications such as Flickr, del.icio.us, audioscrobbler, and AllConsuming are making open web services part of their core offering to end users. These interfaces allow users to query, share, and manipulate the data managed on their behalf by these social content applications.

Web service interfaces make such sites more attractive to end users by removing the danger of data lock-in, while simultaneously providing the users with tools that allow them to gain the most value from their data. This translates into direct benefits for the service itself as the end users extend the reach and visibility of the application by publishing the content on their own websites and ‘blogs.

Based on a brief review of the common features of a selection of these sites, this paper suggests some best practice guidelines that developers can follow when creating new service interfaces for similar applications.

Recognising that it is the end user that is the pivotal component in the success of such applications, these best practices will be used to propose a simple mechanism for connecting together social content based sites with the aim of providing richer, autonomous data exchange.

This architecture will focus on the use of FOAF descriptions of users as a service intermediary, and RESTful web services, exchanging RDF and RSS data, as the means of data exchange.

Introduction

Web services have undeniably been the hot topic over the past 2-3 years, with many analysts predicting a revolution in the way that businesses will connect with one another over the internet. Yet the Web Services movement has so far generated little more than hype and a great number of specifications: there have been very few, if any, deployments of internet scale applications using Web Services technologies.

Experience so far, as recorded by both Amazon WS-Amazon and Flickr WS-Flickr, suggests that when offered a choice, developers prefer a simpler approach; in particular, services that conform to an architectural style known as REST (Representational State Transfer) Fielding. REST services are more tightly bound to existing web architecture, i.e. HTTP and XML, meaning that developers have much less to learn to become productive with a given service or API, which in turn leads to more rapid development of client applications.

While the early adoption of web services by a number of large scale service providers such as EBay, Amazon and Google, generated a large amount of activity and interest in the developer community, much of the current excitement centres not on traditional business applications but on “social content” services that promote sharing of information between web users. These services allow users to share their interests, tastes, and creative output with other users. The addition of REST services to these sites has increased their success, allowing the hacker community to build additional tools and services around this open content.

The mixing of community generated content with open APIs seems primed to deliver many of the potential benefits of web services. The web service vision itself is a part of a larger goal of creating a machine-processable web.

Successes in this area will be compounded if the developer community regularly takes stock of which technologies and techniques have proved beneficial, to capitalise on hard-won experience. This paper aims to help foster this sharing of experience in several ways.

Firstly, by highlighting the benefits of releasing open web services (see ), the author aims to encourage other service developers to extend the range of their applications, and particularly social content services.

Secondly, by defining what constitutes a social content service (see ) and selecting a number of sites that fit this definition, the author highlights the common practices as well as implementation differences in the deployment of REST services across these applications (see ).

Thirdly, by summarising these features, the author aims to suggest a number of best practices that may help developers as they design their own services. See .

Lastly, in an attempt to foster additional discussion, the paper concludes by introducing a simple architecture, based on REST and RDF, that facilitates simpler integration of social content services with the goal of creating features and applications that span service boundaries. See .

Benefits of Web Services

This paper uses a deliberately simple definition of a web service:

A machine interface onto services and data provided by a web application

This definition is technology agnostic, and clearly distinguishes between the interfaces provided to human and machine users of a service.

While any web application arguably offers a machine interface, the trade-offs required to create a good information architecture suitable for human consumption mean that these interfaces are not sufficiently fine-grained, in terms of both the services they offer and the data they expose, to make them suitable for subsequent processing by other applications. While many hackers have produced useful tools that rely on screen scraping techniques ScreenScraping, these are too fragile to maintain for any length of time.

But why should a service provider consider offering open access to their content and services and, more crucially, spend engineering budgets on developing, maintaining and enhancing these additional service entry-points? The following sections attempt to answer this question.

Second Order Effects

Sites such as Amazon and EBay have already clearly demonstrated the benefits of providing a web service interface alongside one designed specifically for humans. They’ve shown that web services can be successful, and further that such services need not be directly charged for as discrete products. Instead each of these sites has benefited by taking advantage of the second order effects (increased traffic, sales, etc) that follow from having third-parties build applications against a shared data set.

The reach of these applications has been extended far beyond that which can be achieved by a simple HTML application; beyond the reach of a single organisation which operates within the bounds of limited operating costs; and pushed into directions far beyond those their initial developers may have envisaged. Essentially open web services allow the hacker community to become part of the service’s engineering team, in much the same way as open sourcing an application; an important consideration for many small start-up companies.

Savvy Web Users

Web users are becoming increasingly savvy, and they have begun to recognise that their ongoing contributions in whatever form, e.g. product reviews, are an important success factor for these businesses. In short, users are coming to recognise that they are providing data for free, so why should it be locked into that site alone?

Users are perhaps most concerned about the potential loss of their investment. The controversy that arose after the change in licencing of the CDDB database by Gracenote (CDDB), locking down all the freely contributed user submissions under a commercial licence, is one oft-cited example of this happening in practice.

In other regards, user concerns are largely pragmatic: there have been a spate of launches of services that offer very similar or overlapping functionality: e.g. managing a social network for fun or profit (Orkut Orkut, Friendster Friendster, Ecademy Ecademy, etc). For users there is a large initial cost involved in experimenting with each site (entering and building up a social network) and a large switching cost should they wish to move to another service (re-entering all their data, re-invite friends, etc).

Both of these concerns can be mitigated by offering users the ability to freely import and export their data from a service.

After Open Source, Open Data

The growing desire for web users to maintain ownership of their data, plus the increasing willingness for businesses to share their data to benefit from the network effects that web services can engender, can be seen as a natural second wave of “open-ness”. First open source; now open data.

This second wave has the potential to be a much bigger and profound movement as it is of immediate relevance to all frequent web users, rather than just software engineers, hackers, businesses, etc.

The pressure for information to be free is translating into pressure for sites to expose open services for users to interact with. Similar pressures are leading to standardisation of data formats; web services alone aren’t enough, the data must be portable and easily exchangeable. XML and RDF are both core technologies that facilitate this exchange, with specific vocabularies, such as FOAF (description of users) and RSS (syndication of content), addressing particular needs.

Introducing Social Content Services

This section of the paper introduces a number of social content services and their service interfaces. The aim is to help establish a community of practice among developers through contrasting the design approaches of each site. In the following sections each web site is briefly introduced (see ), then compared to others based on a number of criteria (see ).

Identifying applications to include in the review requires a definition of what constitutes a “social content” service. From there we can select the relevant criteria for comparing RESTful APIs.

Social Content Services Defined

Many of the interesting experiments with open web services are happening on web sites that offer the ability for users to share their experiences, thoughts, interests and creative output. These sites are the next generation of online communities and their growth has been fostered by the rise in popularity of weblogs and the attendant improvements to the ease of publishing and sharing data on the web.

The majority of such sites are generally classed under the banner of “social networking” or even groupware: they connect people together into communities of interest, fostering “networks of shared experience” Udell. However, sites such as Flickr and MusicBrainz are not focussed on connecting people, but on sharing data. In some cases the community aspects are secondary and, in the case of Flickr, emergent characteristics of the service.

This paper therefore attempts to more closely define a “social content” application as:

A service whose primary goal is to allow users to contribute and freely share data; a secondary goal of such services is to enable users to connect themselves to communities of interest either by direct participation, or indirectly as a side effect of data sharing.

Several high-profile web sites and services, including Ebay, Amazon, Google and Yahoo, are thereby excluded from this comparison review, even though in some cases (e.g. Amazon reviews and rating, EBay auctions) there are significant user contributed components. Weblogs, as typically single-user web sites, are similarly excluded. While a wiki, such as the Wikipedia, arguably meets these criteria — users freely publish articles, the wiki itself is a community of interest — few wikis support web services beyond basic RSS feeds of recently changed content; there is however interest in enhancing wiki publishing tools through use of the Atom API AtomWiki.

Existing Social Content Applications

There are a number of existing web sites that conform to the above definition; this section introduces a selection of these sites.

43Things

43Things 43Things provides users with the facility of managing a list of personal goals. The aim is to help users achieve clarity on their objectives and prioritise them accordingly. Users can attach up to 43 (hence the name) individual goals to their profile and provide status updates on their progress towards achieving each objective. Once a user has completed an objective they can rate it, to say whether it was worth doing after all.

Naturally the site also allows a user to discover others who are pursuing the same or similar goals. Users may also find people from the same town and view the goals common amongst people in their geographical area. The aim here is to help people achieve their objectives by connecting them to both local and virtual communities.

Goals can also be tagged with user provided keywords to classify them into higher level groupings. Browsing based on tags provides the means for users to find similar goals, or new objectives they’d potentially like to achieve.

The 43Things API provides access to all the features that are currently available through the web interface.

Audioscrobbler and last.FM

Audioscrobbler Audioscrobbler and its sister site last.FM Last.FM aim to build up a profile of a user’s musical taste either manually, by selecting individual artists and songs, or automatically through the use of media player plugins that track every song that the user plays. This record of listening behaviour, along with metadata about each track (e.g. title, recording artist, etc) is transmitted to the web site via a web service interface.

For most users, interaction with the site is therefore a background process that steadily accumulates a list of favourite artists, tracks, etc based on their listening habits over a period of time.

Aggregating this data across many users allows the audioscrobbler service to match people based on similarity of their music tastes; the last.FM service is then able to produce a custom “radio station” Last.FMTutorial, consisting of media player playlists that contain tracks recommended by other users with similar tastes.

The web services allowing submission of track data are open for public use, encouraging the creation of new media player plugins. The “now playing” data, i.e. the last few tracks that a user has listened to, can be syndicated and shared in several ways. Other APIs provide control over the radio stations available from last.FM allowing the user to automatically tune into particular “stations”.

WebJay

Not surprisingly there are a number of other music related social content sites. Most closely related to last.FM is WebJay WebJay. This service allows users to create and share playlists containing links to freely available music. Users can manually create playlists, or automatically generate them using a bookmarklet and service WebJayAPI that is capable of dynamically generating a playlist by scanning a web page for links to MP3 files.

WebJay also allows users to browse and search playlists. Aggregation of usage data provides a simple ranking of popular playlists. Links to similar playlists are also provided based on overlaps in their contents. The WebJay API also allows users to write tools to automate the creation and maintenance of playlists.

MusicBrainz

The final music content site included in this review is MusicBrainz MusicBrainz, a community maintained music database. MusicBrainz allows users to submit metadata about artists, albums and individual tracks to the database, building up a dataset that includes basic track metadata including title, recording artist, running times, related artists, web links, etc. As all metadata is user contributed, a moderation system is used to promote quality.

Users can freely browse the MusicBrainz data on the website, but can also use the web services MusicBrainzClient to programmatically access the database. The main use of this service is via the MusicBrainz Tagger MusicBrainzTagger, a desktop tool that is capable of generating the audio fingerprints that MusicBrainz uses to uniquely identify tracks. The Tagger therefore allows users to properly annotate their own music collection against the community generated database. Additional metadata, e.g. for unknown tracks, can also be submitted via the Tagger.

MusicBrainz is therefore very similar to CDDB, except that all data is explicitly placed into the public domain using a Creative Commons licence CC.

del.icio.us

del.icio.us del.icio.us is a community bookmark manager that allows users to freely publish and share web links, annotating them with comments, and classifying them with keywords so that they can be easily organized. Keywords, or “tags” can also be further organized into categories.

While these features provide useful functionality to individual users, especially those desiring portability of bookmarks between browsers and machines, the most interesting aspect of del.icio.us is the ability to browse using the tags alone. This is resulting in an emergent classification, or “folksonomy” for the web, enabling users to discover related web pages. The ability to perform a “reverse lookup” on an individual URL, i.e. discover which tags have been applied, completes the picture.

Almost all of the del.icio.us functionality and data is exposed using web services, ranging from RSS feeds through to an API delAPI that allows submission, maintenance, and retrieval of individual bookmarks.

Flickr

Spurred by the success of the del.icio.us tagging system, many other sites have begun adding similar tagging features. One such is Flickr Flickr, a hugely popular social content application specialising in the publication and sharing of digital photographs.

Flickr provides a suite of tools FlickrTools for making the publishing, manipulation and classification of photos as easy as possible. Images can be submitted in many ways including desktop tools, email submission, and browser bookmarklets; the flickr server takes care of resizing of images, automatically producing several scaled images that users can link to (e.g. from a blog), or share with others. EXIF EXIF metadata (e.g. camera model, timestamps, etc) embedded in digital photographs is also automatically extracted and presented to users. Combined with user submitted metadata in the form of titles, tags, and comments this constitutes a very rich environment within which users organize their photo collection.

Users may also build up a social network, indicating that other flickr users are “contacts” or “friends”. Photos can then be shared publicly, or published only for viewing by friends. Users may also explicitly tag their photos using Creative Commons licences to encourage republication. Users can also join groups of common interest; this feature has resulted in the creation of some interesting community art projects such as the “squared circle” collection SquaredCircle.

The flickr web service FlickrAPI exposes all of the flickr web site and photo uploading functionality as a REST service, encouraging the creation of new tools and applications.

Upcoming

The final social content site to be included in the review is Upcoming Upcoming, the first in an emerging wave of sites that facilitate sharing of data concerning events, e.g. gigs, film screenings, plays, etc. Users can sign up to the site, and supplement their profile by submitting lists of events that they are planning to attend. By browsing based on location and assigned tags, users can find events of interest, add them to their calendar, and find other users who are also attending.

The site has recently been extended to include an API UpcomingAPI that exposes this basic functionality.

The Review Criteria

Before conducting the review we must first define the criteria against which the services will be compared. As this paper is primarily concerned with REST services, the definition of the REST architectural style will be used as an initial starting point. The REST style is formally defined in Fielding, Section 5.1.5:

REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.

A less theoretical and more practically oriented introduction to designing a REST protocol is available in RestfulWeb, which suggests that defining a REST service involves answering the following questions:

  • What are the URIs?
  • What’s the format?
  • What methods are supported at each URI?
  • What status codes could be returned?

Within the context of a social content service other issues should also be considered: security, privacy, and the licencing of both the web service and contributed data.

This list of definitions and issues leads to the following series of criteria that can be used to compare the services under review:

  • How does the API use URLs to identify resources?
  • What HTTP methods does the API support?
  • What status codes are returned?
  • How are users and applications authenticated to the API?
  • Are resources linked to one another (use of hypermedia)?
  • What data formats does the API support?
  • How much user data is exposed?
  • What are the licencing agreements for the API and the data?

A REST Service Review

The following sections summarise the results of contrasting the REST services exposed by the applications introduced earlier, based on the criteria defined in the previous section.

Use of URLs

How do the services use URLs to identify resources?

  • 43Things: key resources such as person, goal and city are available via “get resource by id” URLs; these identify individual resources using identifiers in the query string of the URL, e.g. /service/get_person?id=ldodds. Where supported, updates to resources are handled via separate methods, e.g. “add goal”, “update goal”.
  • AudioScrobbler/Last.FM: several different URLs are used to expose “recent playing” data, derived from user identifiers. The upload protocol uses a single controller URL to which all data is posted; users are identified via query string parameters. The last.FM radio control protocol uses separate URLs to retrieve data, skip tracks, retune the station, etc.
  • WebJay: each user and playlist has a unique URL; playlist URLs are subordinate to the user’s URL, e.g. /user/playlist-name. Updates to playlists are dispatched to the playlist URL.
  • MusicBrainz: each artist, album and track has a unique URL.
  • del.icio.us: no unique URLs for resources; each method is a function call, e.g. “recent bookmarks”, that may be filtered by tag or ranged by date.
  • flickr: key resources such as person, photographs and groups are available via “get info” methods, e.g. /services/rest/?method=flickr.people.getInfo&user=ldodds; these identify individual resources using query string identifiers. Where supported, updates to resources are handled via the same controller URL, but using different method parameters.
  • Upcoming: similar to flickr, each key resource such as event, person, etc is available via “get info” methods.

Use of HTTP Methods

What HTTP Methods does the service support?

  • 43Things: GET, POST
  • AudioScrobbler/Last.FM: GET, POST
  • WebJay: DELETE, GET, POST
  • MusicBrainz: GET, POST
  • del.icio.us: GET, POST
  • flickr: GET, POST
  • Upcoming: GET, POST

Use of Status Codes

What HTTP status codes are documented as being returned by the service? (see StatusCodes for definitions)

  • 43Things: 200, 500
  • AudioScrobbler/Last.FM: 200, 503; no others documented
  • WebJay: 200, 201, 4xx, 5xx
  • MusicBrainz: 200, 404, 500
  • del.icio.us: 200, 403, 503
  • flickr: 200
  • Upcoming: 200, 404, 403, 409, 500

Use of Authentication

How does the API authenticate users, and must clients identify themselves via an API Key?

  • 43Things: username/password as plain text in parameters, or Atom X-WSSE AtomAuth. Read-only methods open: yes. API key required: yes.
  • AudioScrobbler/Last.FM: username and md5-hashed password. Read-only methods open: yes. API key required: yes, a “client id” for submissions.
  • WebJay: HTTP Basic Auth. Read-only methods open: yes. API key required: no, but documentation encourages use of appropriate User-Agent headers.
  • MusicBrainz: username to acquire a session for submitting data. Read-only methods open: yes. API key required: no.
  • del.icio.us: HTTP Basic Auth. Read-only methods open: no. API key required: no, but documentation encourages use of appropriate User-Agent headers.
  • flickr: username/password as plain text in parameters. Read-only methods open: mostly. API key required: yes.
  • Upcoming: username/password as plain text in parameters. Read-only methods open: yes. API key required: yes.

Use of Hypermedia

Are resources exposed by the service linked to one another in API responses?

  • 43Things: generally no, although entries are linked to the appropriate Atom protocol URLs
  • AudioScrobbler/Last.FM: no
  • WebJay: no
  • MusicBrainz: yes
  • del.icio.us: no
  • flickr: no
  • Upcoming: no

Data Formats

What data formats does the API support?

  • 43Things: custom XML format, some use of XML namespaces; RSS
  • AudioScrobbler/Last.FM: plain text, FOAF, RSS
  • WebJay: plain text, XSPF XSPF
  • MusicBrainz: custom RDF vocabulary
  • del.icio.us: attribute oriented custom XML format, no use of XML namespaces; RSS
  • flickr: attribute oriented custom XML format, no use of XML namespaces; RSS
  • Upcoming: attribute oriented custom XML format; RSS

Exposure of Personal Data

  • 43Things: basic description including name, username, home page/blog URL, profile image of person
  • AudioScrobbler/Last.FM: basic description including name, home page/blog URL, profile image of person, and list of friends (in FOAF output)
  • WebJay: none, although the service captures name, email, homepage URL, profile image
  • MusicBrainz: none, although the service captures name, email, homepage URL, and a brief biography
  • del.icio.us: none currently exposed, although the service captures name and email address
  • flickr: basic description including name, username, home page/blog URL, profile image of person, and list of contacts and group membership
  • Upcoming: none currently exposed, although the service captures name, email address, homepage/blog URL, photo and list of friends

Licensing

Licencing of Service and Data

  • 43Things: service licence none specified, although commercial users are asked to announce their intentions; data licence none specified
  • AudioScrobbler/Last.FM: service licence none specified; data under a Creative Commons Non-commercial, Share-alike, Attribution licence
  • WebJay: service free for open source software, non-commercial otherwise; data licence none specified
  • MusicBrainz: service and data both under the OpenContent licence
  • del.icio.us: none specified for either service or data
  • flickr: service non-commercial; data licence none specified, although users can set an individual preference that associates a Creative Commons licence with their photos
  • Upcoming: service non-commercial; data licence none specified

Some Best Practice Recommendations

Summarising the findings presented above, the following sections outline a number of best practice recommendations for use as a reference when designing new REST services.

Use of URIs

The first item to notice from the review is that few of these services actually support a true REST interface. Many of them are “RESTful” in that they make correct use of HTTP methods, error codes, etc. but the majority fall short when it comes to correctly identifying resources using unique URLs.

In most cases the exposed services are actually closer to the Remote Procedure Call architectural style than REST. E.g. use of a single controller URL for all service methods, with parameters identifying the action to take and the resource to act upon. Each API method is then essentially a function call that is processed by the application. In addition many of the applications expose alternate URLs for different query methods by identifier, name, email address, etc. rather than having a generic query resource that acts differently depending on the parameters provided.

In a REST service each resource would have a unique URL. Query methods are supported by adding appropriate query strings to these URLs. This design principle results in a smaller and more elegant API; multiple RPC style methods can be collapsed into a single URL that responds appropriately depending on the HTTP method through which it is requested, plus the request parameters.

Clearly assigning identifiers to significant resources exposed by the API also promotes easier linking between resources and services.

Use of HTTP methods

The majority of the reviewed services respond appropriately to the different HTTP methods, using GET to support reads and POST to perform updates. These two methods are the absolute minimum that a RESTful service should support. Use of the DELETE and PUT methods is often deferred as these are not well-supported within browsers. In the reviewed applications only the WebJay API supports the DELETE method.

Applications should not allow GET requests to make modifications to data. GET requests must be “safe”, i.e. free of side-effects; this is a basic axiom of the web [Axioms]. One of the dangers of RPC-based API design is that developers focus on the request URL (e.g. update_person) rather than on the combination of URL and HTTP method. Loopholes such as allowing updates via GET may then be propagated into live services. For example both the Flickr and 43Things APIs have update methods that operate on both POST and GET requests.
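
A minimal server-side sketch, using the Flask framework purely for illustration (the resource path and the lookup_title/save_title helpers are hypothetical), shows how restricting updates to POST closes this loophole:

from flask import Flask, request

app = Flask(__name__)

@app.route("/photos/<photo_id>/title", methods=["GET", "POST"])
def photo_title(photo_id):
    if request.method == "POST":
        # updates are only ever accepted via POST; a GET can never modify state
        save_title(photo_id, request.form["title"])   # save_title() is a hypothetical helper
        return "", 204
    return lookup_title(photo_id)                      # lookup_title() is a hypothetical helper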

Use of Status Codes

The services vary greatly in their use of HTTP status codes; in some cases the status codes are used incorrectly. For example Flickr appears to respond only with a “200 OK” status code, even when the request is in error; instead the body of the response contains an error code, e.g. for authentication errors that should be signalled with a “401 Unauthorized” response.

The services also vary in how they indicate failed requests, such as missing data, missing required parameters, or invalid data submissions. Where these are indicated they are often flagged with a generic “500 Server Error” code. Missing data should always return a “404 Not Found”. The Upcoming API correctly uses “400 Bad Request” to identify missing parameters.

HTTP status codes [StatusCodes] have been defined for all significant issues that a distributed application is likely to encounter. Correct use of status codes enables clear labelling of errors and efficient dispatching to client-side error handlers without the need to process the response body. Intermediaries, e.g. caching proxies, and clients such as web crawlers, can also use this information to handle a response generically without having to deal with the specifics of a given API.
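
A small client-side sketch (the endpoint is invented) shows how meaningful status codes allow errors to be dispatched without inspecting the response body:

import requests

resp = requests.get("https://api.example.com/users/ldodds")

if resp.status_code == 200:
    body = resp.content               # success: parse the response body
elif resp.status_code == 404:
    body = None                       # the resource genuinely does not exist
elif resp.status_code == 401:
    raise RuntimeError("credentials required")   # prompt the user to authenticate
else:
    resp.raise_for_status()           # surface any other error as an exception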

Use of Authentication

Authentication support among the different APIs is varied. The majority allow unauthenticated use of their read-only methods but require a client to be authenticated to use methods that update server-side state. del.icio.us is the exception in that it requires authentication for any use of the API.

In all of the services, authorisation is granted by the user registering with the service, which furnishes them with the required credentials, generally just a username and password.

The supported authentication methods range from clear-text transmission of credentials in query strings, through HTTP Basic Authentication, to Atom X-WSSE [AtomAuth], which uses HTTP extensions to plug in a new authentication model.

The basic model of open read-only operations with security constraints on the remainder of the API should be encouraged as it enables free use of data.
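
From the client’s perspective this model might look like the following sketch (the endpoints are invented, and HTTP Basic Authentication stands in for whichever scheme a service actually uses):

import requests

# read-only data is open to everyone, no credentials required
playlists = requests.get("https://api.example.com/users/ldodds/playlists")

# state-changing requests carry credentials, here via HTTP Basic Authentication
requests.post("https://api.example.com/users/ldodds/playlists",
              data={"title": "Morning mix"},
              auth=("ldodds", "secret"))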

Many of the services support the notion of an “API Key”. These keys are allocated on a per-application basis and are a required parameter in all requests. Typically a form is provided that allows an application developer to quickly obtain a key. Often some context about its intended usage, such as application name, description and a homepage URL must be supplied.

While API keys are not used for authentication, they are used as a mechanism to support usage tracking of the API, e.g. to identify active applications and potentially spot abuses. From this perspective they are a useful feature that furnishes service providers with usage statistics about client applications. However, as will be demonstrated in the next section, the use of API keys severely restricts ad hoc integration of services, and care should be taken over how they are deployed.

Use of Hypermedia

Through the use of URLs as public identifiers, REST services can be linked together to create a hypertext of data. This promotes integration, as in many circumstances the link is all that is required to locate additional data. This theme is expanded on in the final section of this paper.

Responses from REST services should contain URLs that identify all resources referenced in the response. For example the description of an album may contain URLs that refer to the artist’s description, human-readable documentation, descriptions of the users who reviewed the album, and so on.

Embedding URLs into the response also facilitates integration as clients can simply extract required URLs for subsequent processing (e.g. to GET additional data). This can be achieved without explicit knowledge of the URL structure of the remote application. Service providers therefore also benefit from greater freedom in restructuring their site and services without the need to co-ordinate code changes between client and server.
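
A brief sketch (the endpoint and the XML shape are invented) shows a client following an embedded link rather than constructing the URL itself:

import requests
import xml.etree.ElementTree as ET

album = ET.fromstring(requests.get("https://api.example.com/albums/1234").content)

# the response itself says where the artist description lives...
artist_url = album.find("artist").attrib["href"]

# ...so no knowledge of the server's URL structure is needed to fetch it
artist = ET.fromstring(requests.get(artist_url).content)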

Few of the reviewed APIs support this feature; clients are expected to make further requests by constructing URLs based on identifiers included in service responses. MusicBrainz uses resource linking as a side-effect of its use of RDF; 43Things only provides links to HTML documentation for some resources, but does provide the URL for manipulating user postings via the Atom API.

Lack of linking can be attributed to three factors. Firstly, discussion of the REST style has, to date, largely centred on correct use of HTTP rather than on the additional benefits that accrue from the use of hypermedia. Secondly, the RPC style that the majority of the services follow promotes a view of the API as a series of method calls, rather than as endpoints within a hypertext of data. Thirdly, the use of API keys prohibits free publishing of links, as a given URL is only suitable for use by a single application: the one to which the key was assigned.

It is the use of API keys that is the most troublesome. While they obviously provide a useful feature, API keys hamper loose ad hoc integration: clients must know how to manipulate a URL to insert an API key. Therefore, while a service may provide unauthenticated use of read-only URLs, these links cannot be openly published without also sharing an API key, which undermines their potential benefits.

An alternative to using API keys in the URL is to require applications to identify themselves using an existing HTTP feature: the User-Agent header. This header can be used to pass an application name, URL, or other token to a service without requiring modification of the request URL. An API key is really request metadata, and HTTP headers are the correct place for such metadata to be communicated.

The API documentation for both WebJay and del.icio.us encourages the proper use of User-Agent headers, suggesting the inclusion of a URL, e.g. to a project homepage, as a means of providing useful documentation on the client application.
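
For example, a client might identify itself as follows (the application name and URLs are invented):

import requests

# identify the client application in a request header rather than in the URL
headers = {"User-Agent": "PlaylistMixer/0.1 (+http://example.org/playlist-mixer)"}

resp = requests.get("https://api.example.com/feeds/recent", headers=headers)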

The one environment where this technique isn’t suitable is JavaScript bookmarklets and other scripting environments where the script has no control over the request headers.

Data Formats

Not surprisingly, the majority of services use XML as the lingua franca of data exchange. Several, such as WebJay and AudioScrobbler, also use simple text protocols, but these should be avoided or, at the very least, designed to support Unicode to avoid internationalisation issues. XML parsing overhead is unlikely to become a significant bottleneck in request processing, and responses can be generated using a simple templating engine.

MusicBrainz is unique in its use of RDF [RDFPrimer], a technology that is well-suited for use within a REST service: as RDF uses URIs as the means of identifying resources, the API URL structure and the response format can be closely related.

RDF is most suited for use when aggregating and manipulating data from multiple sources; its abstract model makes it easy to combine data sets from different locations. Inferencing also helps the integration of multiple data sources by allowing developers to “late bind” data to a given application schema. Designers of social content services should consider RDF as at least an export format, if not the basis of the service’s underlying data model. Use of a templating system means that adding additional output formats, e.g. to support both an RDF and an XML view, is trivial. With some care XML vocabularies can be designed so that they can also be correctly interpreted as RDF, allowing a single format to support all possible uses.
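
The ease of aggregation can be sketched with the rdflib library (the document URLs are invented): statements parsed from several sources simply accumulate in a single graph that can then be queried as one data set.

from rdflib import Graph

g = Graph()
# statements from each document are merged into the same graph
g.parse("https://example.com/reviews/ldodds.rdf", format="xml")
g.parse("https://example.org/listening/ldodds.rdf", format="xml")

# the combined graph can now be queried as a single model
for subject, predicate, obj in g:
    print(subject, predicate, obj)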

None of the reviewed services uses standard vocabularies in its XML; each typically has its own custom XML vocabulary. There are many opportunities here to standardise on vocabularies for common resources such as people, groups, and places. This would encourage increased sharing of data between many different applications: not only social content services, but also geographical catalogues, social networking sites, etc. Avoiding proprietary formats makes integration simpler and allows standard tools (component models, stylesheets, etc.) to be created and shared for re-use across sites.

The last section of this paper expands on this opportunity, and in particular the use of RDF vocabularies, to suggest an architecture for loose integration between social content applications.

Exposure of Personal Data

Of the social content sites that do expose a public view of a user’s profile, none shares unsecured information such as email addresses. In most cases the data consists of name, homepage or blog URL, and often location (e.g. city, state, country). This is enough to provide some basic context about the user and to relate their profile to their submissions to the site.

Several of the sites also support a “groups” feature, allowing users to aggregate themselves into communities of interest. Group listings, including the name of the group and its members, are often added to the API.

As noted above, this kind of basic personal description data could benefit from some standardisation across sites.

Licensing

There are few points of agreement on the licensing of services and data across the reviewed applications. However the majority do indicate that the API is free for non-commercial usage; anything else would hinder the network effects expected from exposing services.

However not all of the services explicitly licence their data. MusicBrainz uses an OpenContent licence for its database, while Flickr allows users to associate a Creative Commons licence with images in their collection.

Service designers are strongly encouraged to clarify the licensing of all of their data, and to use an appropriate Creative Commons licence, ideally one that places some or all of the data into the public domain. Recognising that individual users may want alternative licensing terms associated with their data, either more or less restrictive than those of the service itself, designers should consider allowing users to select an appropriate Creative Commons licence that can be stored and referenced from their data.

As more web services appear and begin to be composed to create other applications, clear licensing of both services and data will become increasingly important.

Connecting Social Content Services

This final section of the paper introduces some simple techniques that, by exploiting the power of the REST architecture, the RDF data model, and common RDF vocabularies, can be used as the basis for loose integration and data sharing between social content services.

This architecture is based on several principles that build upon one another:

  • A common RDF vocabulary for describing people, their activities and interests, i.e. FOAF
  • Support for importing FOAF data into social content applications
  • The use of users’ self-descriptions as a form of “service connector”, enabling ad hoc integration of social content applications

The following sections describe each of these principles in more detail.

A Vocabulary for Personal Descriptions

As the review notes demonstrate, the majority of services share some common properties in their user profile data. Similarly, many of the sites provide access to descriptions of sub-communities, i.e. “groups”. The advantages of moving to a standard vocabulary for this data were introduced above.

The FOAF (“Friend of a Friend”) project has a very similar aim in mind: the definition of an RDF vocabulary for expressing metadata about people, their interests, relationships and activities [FOAFIntro]. While the name betrays its origin as a format for describing social networks, its real utility is as a general framework for connecting together more specialised vocabularies.

For example FOAF provides basic facilities for describing an Image, its creator and even the people depicted in the photograph, but it makes no attempt to completely model all of the relevant information that might be attached to an Image. It is expected that other specialised vocabularies will be used to supplement the core FOAF properties. The basic framework of the FOAF vocabulary covers people, groups, organisations, and images. This more than meets the requirements of the social content applications reviewed here, while still granting each application the ability to extend and supplement the vocabulary as necessary.

For a fuller introduction to FOAF see [FOAFIntro] and the FOAF specification [FOAF].

Of the reviewed applications only one, Audioscrobbler, currently supports a FOAF export of its user profile data.

Importing and Harvesting FOAF and RDF

As the number of people creating and hosting their own FOAF profile continues to increase, and as more applications start to support the exporting of FOAF profiles based on user registration data, the natural next step is to consider importing this data into applications.

The first and most immediate benefit will be to the user registration process. Much of the work required in filling out a registration form can be automated: simply allow a user to point to their current FOAF profile (whether hand-crafted or exported from another service) and populate the form accordingly.
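
A minimal sketch with rdflib (the profile URL is invented; in practice it would be supplied by the registering user) shows how a registration form might be pre-filled from a FOAF document:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.parse("http://example.org/~ldodds/foaf.rdf", format="xml")

for person in g.subjects(RDF.type, FOAF.Person):
    name = g.value(person, FOAF.name)          # pre-fill the "name" field
    homepage = g.value(person, FOAF.homepage)  # pre-fill the "homepage" field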

However, as well as reading basic user metadata, applications may also mine the provided FOAF profile for interests, e.g. to allow recommendation of relevant groups or features, and to import additional details such as relationship data (friends, colleagues, etc.). Not only does initial registration become easier, but the service can become indispensable to the user much more quickly by tailoring itself to the data contained in (or harvestable via) a user’s self-description.

In fact a service may go so far as to avoid creating and managing user data locally, instead storing only the URL of the user’s FOAF profile, and simply reading and caching data as it is required. This gives users much more control over their data, making it easier to update multiple services when information changes.

There are obvious security issues to consider, but it is interesting how the standardisation of something as simple as a user description can begin to open up alternative application architectures.

Self-Description as Service Connectors

Once applications are routinely importing FOAF data provided by end users, a number of additional opportunities then present themselves.

For example, FOAF provides terms that allow a user to state that they have an account on a particular website, in effect tying their self-description to a specific user identifier on that service. The following RDF fragment illustrates this by stating that the author has an account on del.icio.us:

<foaf:Person>
	<foaf:holdsAccount>
		<foaf:OnlineAccount>
			<foaf:accountName>ldodds</foaf:accountName>
			<foaf:accountServiceHomepage rdf:resource="http://del.icio.us"/>
		</foaf:OnlineAccount>
	</foaf:holdsAccount>
</foaf:Person>

Consumers of this data could use that identifier to initiate calls to the del.icio.us web service, e.g. to discover the author’s recent bookmarks.
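
Extracting the account name from a FOAF profile is straightforward; the following rdflib sketch (the profile URL is invented) locates accounts held on del.icio.us:

from rdflib import Graph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.parse("http://example.org/~ldodds/foaf.rdf", format="xml")

# find accounts whose service homepage is del.icio.us and read off the username
for account in g.subjects(FOAF.accountServiceHomepage, URIRef("http://del.icio.us")):
    username = g.value(account, FOAF.accountName)
    # the username can now be used to address requests to the del.icio.us API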

A direct example will illustrate the benefits: WebJay might process FOAF data to find an AudioScrobbler profile owned by an individual user and, from there, via the AudioScrobbler API, discover the user’s top ten artists. Recommendations of suitable playlists or related artists, based on data already held in the WebJay system, can then be used to fine-tune suggestions for that user without significant development effort or lengthy customisation by the end user.

However this style of integration is limited in one important regard: the need for application developers to write specific code to discover the data of interest. To continue the example, the WebJay developers need to write code to interface with the AudioScrobbler web service. If similar websites are subsequently launched, then new code must be written for each new integration. This is a costly solution and one that doesn’t scale well as the number of points of integration rises.

A more flexible means of integration would be to exploit RDF hyperlinking [Hyperlinking], i.e. the rdfs:seeAlso property, which relates one resource to another that provides a further description of it. If AudioScrobbler and similar services allowed direct linking to an RDF resource that described a user’s top ten artists, then a user could introduce the following into their FOAF document:

<foaf:Person>
	<eg:favouriteArtists>
		<eg:TopTenArtists rdf:about="http://example.com/music/ldodds/top-end"/>
	</eg:favouriteArtists>
</foaf:Person>

The primary data set contained in the destination resource is indicated by assigning an rdf:type to the resource, in this case eg:TopTenArtists. This avoids the need for an application to traverse every rdfs:seeAlso link discovered in a FOAF document, by narrowing the crawling of data down to just those documents that contain data of interest.

This simple shift to the use of RDF immediately addresses the above integration problem. No application-specific code has to be written to knit the services together: a simple HTTP GET is sufficient to read in the data from any application that exposes its data through a REST service API returning RDF metadata. Integration becomes simple traversal of links, and the development focus can then move to vocabulary creation and the exploration of the available data.
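
To sketch the traversal in rdflib (the profile URL and the eg: vocabulary URI are hypothetical), an application fetches only those linked documents whose rdf:type marks them as containing data of interest:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EG = Namespace("http://example.com/vocab#")   # stands in for the hypothetical eg: vocabulary above

g = Graph()
g.parse("http://example.org/~ldodds/foaf.rdf", format="xml")

# fetch only the linked documents typed as eg:TopTenArtists
for artists_doc in g.subjects(RDF.type, EG.TopTenArtists):
    g.parse(artists_doc, format="xml")   # a plain HTTP GET pulls the linked data into the graph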

Of course not all integration issues are immediately dealt with by this approach. The technique offers little scope for fine-tuning of queries between applications, and neither does it allow one service to more easily update another (e.g. via a POST). But it does address the most common case of data exchange, and in particular data exchanges that centre on people and related data.

Summary

The REST architectural style is being embraced by social content application developers as a means to quickly and simply share data with the hacker community. The network effects of data sharing, and the freedom for users to repurpose content, benefit both the service provider and their end users.

Reviewing some deployed service interfaces shows that the majority fall short of implementing the complete REST architecture. The paper compared these interfaces using several criteria, with the aim of recommending some best practices in service design.

The benefits of using FOAF, RDF and REST in concert were also discussed, highlighting the ease with which web service interfaces can be composed to implement new application features. These benefits are subject to similar network effects: as more applications expose FOAF and RDF data sets, the more opportunities there are for creating innovative new features by combining data from multiple sources.

Bibliography

[43Things] 43Things.
[43ThingsAPI] 43Things Web Service API.
[AtomWiki] An Atom-Powered Wiki, Joe Gregorio, 14th April 2004
[AtomAuth] Atom Authentication, Mark Pilgrim, 17th December 2003
[Audioscrobbler] Audioscrobbler
[Axioms] Universal Resource Identifiers — Axioms of Web Architecture, Identity, State and GET, Tim Berners-Lee, 19th December 1996
[CC] Creative Commons.
[CDDB] CDDB, Wikipedia Article
[del.icio.us] del.icio.us
[delAPI] del.icio.us API
[Ecademy] Ecademy
[EXIF] Exchangeable image file format, Wikipedia Article
[Fielding] Architectural Styles and the Design of Network-based Software Architectures, Roy Fielding, 2000
[Flickr] Flickr.
[FlickrAPI] Flickr API Documentation
[FlickrTools] Flickr Uploading Tools
[FOAF] FOAF Vocabulary Specification, Dan Brickley and Libby Miller, 4th April 2005
[FOAFIntro] An Introduction to FOAF, Leigh Dodds, 4th February 2004
[Friendster] Friendster
[Hyperlinking] RDF Hyper-linking, Dan Brickley, 2003.
[Last.FM] Last.FM
[Last.FMTutorial] Last.FM Tutorial.
[MusicBrainz] MusicBrainz
[MusicBrainzClient] MusicBrainz Client Library HOWTO
[MusicBrainzTagger] About the MusicBrainz Tagger
[Orkut] Orkut
[RDFPrimer] RDF Primer, W3C Recommendation 10 February 2004
[RestfulWeb] How to Create a REST Protocol, Joe Gregorio, 1st December 2004
[ScreenScraping] Screen Scraping, Wikipedia Article
[SquaredCircle] Squared Circle Flickr Group
[StatusCodes] HTTP/1.1: Status Codes, Fielding et al, June 1999
[Udell] Networks of Shared Experience, John Udell, 14th April 2004
[Upcoming] Upcoming.org.
[UpcomingAPI] Upcoming.org API Documentation – Version 1.0
[WebJay] WebJay.
[WebJayAPI] Webjay API, Lucas Gonze, v0.0 (Draft June 2, 2004)
[WS-Amazon] Making Web Services Work at Amazon, Edd Dumbill, 9th December 2003
[WS-Flickr] Stewart Butterfield on Flickr, Richard Koman, 2nd April 2005
[XSPF] XML Shareable Playlist Format