Connecting Social Content Services using FOAF, RDF and REST

Abstract

A growing number of “social content” applications such as Flickr, del.icio.us, audioscrobbler, and AllConsuming are making open web services part of their core offering to end users. These interfaces allow users to query, share, and manipulate the data managed on their behalf by these social content applications.

Web service interfaces make such sites more attractive to end users by removing the danger of data lock-in, while simultaneously providing the users with tools that allow them to gain the most value from their data. This translates into direct benefits for the service itself as the end users extend the reach and visibility of the application by publishing the content on their own websites and ‘blogs.

Based on a brief review of the common features of a selection of these sites, this paper suggests some best practice guidelines that developers can follow when creating new service interfaces for similar applications.

Recognising that it is the end user that is the pivotal component in the success of such applications, these best practices will be used to propose a simple mechanism for connecting together social content based sites with the aim of providing richer, autonomous data exchange.

This architecture will focus on the use of FOAF descriptions of users as a service intermediary, and RESTful web services, exchanging RDF and RSS data, as the means of data exchange.

Introduction

Web services have undeniably been the hot topic over the past 2-3 years, with many analysts predicting a revolution in the way that businesses will connect with one another over the internet. Yet the Web Services movement has so far generated little more than hype and a great number of specifications: there have been very few, if any, deployments of internet scale applications using Web Services technologies.

Experience so far, as recorded by both Amazon WS-Amazon and Flickr WS-Flickr, suggests that when offered a choice, developers prefer a simpler approach, in particular services that conform to an architectural style known as REST (Representational State Transfer) Fielding. REST services are more tightly bound to existing web architecture, i.e. HTTP and XML, meaning that developers have much less to learn to become productive with a given service or API, which in turn leads to more rapid development of client applications.

While the early adoption of web services by a number of large scale service providers such as EBay, Amazon and Google, generated a large amount of activity and interest in the developer community, much of the current excitement centres not on traditional business applications but on “social content” services that promote sharing of information between web users. These services allow users to share their interests, tastes, and creative output with other users. The addition of REST services to these sites has increased their success, allowing the hacker community to build additional tools and services around this open content.

The mixing of community generated content with open APIs seems primed to deliver many of the potential benefits of web services. The web service vision itself is a part of a larger goal of creating a machine-processable web.

Successes in this area will be compounded if the developer community regularly takes stock of which technologies and techniques have proved beneficial, to capitalise on hard-won experience. This paper aims to help foster this sharing of experience in several ways.

Firstly, by highlighting the benefits of releasing open web services (see ), the author aims to encourage other service developers to extend the range of their applications, and particularly social content services.

Secondly, by defining what constitutes a social content service (see ) and selecting a number of sites that fit this definition, the author highlights the common practices as well as implementation differences in the deployment of REST services across these applications (see ).

Thirdly, by summarising these features, the author aims to suggest a number of best practices that may help developers as they design their own services. See .

Lastly, in an attempt to foster additional discussion, the paper concludes by introducing a simple architecture, based on REST and RDF, that facilitates simpler integration of social content services with the goal of creating features and applications that span service boundaries. See .

Benefits of Web Services

This paper uses a deliberately simple definition of a web service:

A machine interface onto services and data provided by a web application

This definition is technology agnostic, and clearly distinguishes between the interfaces provided to human and machine users of a service.

While any web application arguably offers a machine interface, the trade-offs required to create a good information architecture suitable for human consumption mean that these interfaces are not sufficiently fine-grained, both in terms of the services they offer and the data they expose, to make them suitable for subsequent processing by other applications. While many hackers have produced useful tools that rely on screen scraping techniques ScreenScraping, these are too fragile to maintain for any length of time.

But why should a service provider consider offering open access to their content and services and, more crucially, spend engineering budgets on developing, maintaining and enhancing these additional service entry-points? The following sections attempt to answer this question.

Second Order Effects

Sites such as Amazon and EBay have already clearly demonstrated the benefits of providing a web service interface alongside one designed specifically for humans. They’ve shown that web services can be successful, and further that such services need not be directly charged for as discrete products. Instead each of these sites has benefited by taking advantage of the second order effects (increased traffic, sales, etc) that follow from having third-parties build applications against a shared data set.

The reach of these applications has been extended far beyond that which can be achieved by a simple HTML application; beyond the reach of a single organisation which operates within the bounds of limited operating costs; and pushed into directions far beyond those their initial developers may have envisaged. Essentially, open web services allow the hacker community to become part of the service’s engineering team, in much the same way as open sourcing an application; an important consideration for many small start-up companies.

Savvy Web Users

Web users are becoming increasingly savvy, and they have begun to recognise that their ongoing contributions in whatever form, e.g. product reviews, are an important success factor for these businesses. In short, users are coming to recognise that they are providing data for free, so why should it be locked into that site alone?

Users are perhaps most concerned about the potential loss of their investment. The controversy that arose after the change in licensing of the CDDB database by Gracenote (CDDB), locking down all the freely contributed user submissions under a commercial licence, is one oft-cited example of this happening in practice.

In other regards, user concerns are largely pragmatic: there has been a spate of launches of services that offer very similar or overlapping functionality, e.g. managing a social network for fun or profit (Orkut Orkut, Friendster Friendster, Ecademy Ecademy, etc). For users there is a large initial cost involved in experimenting with each site (entering and building up a social network) and a large switching cost should they wish to move to another service (re-entering all their data, re-inviting friends, etc).

Both of these concerns can be mitigated by offering users the ability to freely import and export their data from a service.

After Open Source, Open Data

The growing desire for web users to maintain ownership of their data, plus the increasing willingness for businesses to share their data to benefit from the network effects that web services can engender, can be seen as a natural second wave of “open-ness”. First open source; now open data.

This second wave has the potential to be a much bigger and more profound movement as it is of immediate relevance to all frequent web users, rather than just software engineers, hackers, businesses, etc.

The pressure for information to be free is translating into pressure for sites to expose open services for users to interact with. Similar pressures are leading to standardisation of data formats; web services alone aren’t enough: the data must be portable and easily exchangeable. XML and RDF are both core technologies that facilitate this exchange, with specific vocabularies, such as FOAF (description of users) and RSS (syndication of content), addressing particular needs.

Introducing Social Content Services

This section of the paper introduces a number of social content services and their service interfaces. The aim is to help establish a community of practice among developers through contrasting the design approaches of each site. In the following sections each web site is briefly introduced (see ), then compared to others based on a number of criteria (see ).

Identifying applications to include in the review requires a definition of what constitutes a “social content” service. From there we can select the relevant criteria for comparing RESTful APIs.

Social Content Services Defined

Many of the interesting experiments with open web services are happening on web sites that offer the ability for users to share their experiences, thoughts, interests and creative output. These sites are the next generation of online communities and their growth has been fostered by the rise in popularity of weblogs and the attendant improvements to the ease of publishing and sharing data on the web.

The majority of such sites are generally classed under the banner of “social networking” or even groupware: they connect people together into communities of interest, fostering “networks of shared experience” Udell. However, sites such as Flickr, and MusicBrainz are not focussed on connecting people, but on sharing data. In some cases the community aspects are secondary and, in the case of Flickr, emergent characteristics of the service.

This paper therefore attempts to define a “social content” application more closely as:

A service whose primary goal is to allow users to contribute and freely share data. A secondary goal of such services is to enable users to connect themselves to communities of interest, either by direct participation or indirectly as a side effect of data sharing.

Several high-profile web sites and services, including EBay, Amazon, Google and Yahoo, are thereby excluded from this comparison review, even though in some cases (e.g. Amazon reviews and ratings, EBay auctions) there are significant user contributed components. Weblogs, as typically single user web sites, are similarly excluded. While a wiki such as the Wikipedia arguably meets these criteria (users freely publish articles, and the wiki itself is a community of interest), few wikis support web services beyond basic RSS feeds of recently changed content; there is however interest in enhancing wiki publishing tools through use of the Atom API AtomWiki.

Existing Social Content Applications

There are a number of existing web sites that conform to the above definition; this section introduces a selection of these sites.

43Things

43Things 43Things provides users with the facility of managing a list of personal goals. The aim is to help users achieve clarity on their objectives and prioritise them accordingly. Users can attach up to 43 (hence the name) individual goals to their profile and provide status updates on their progress towards achieving each objective. Once a user has completed an objective they can rate it, to say whether it was worth doing after all.

Naturally the site also allows a user to discover others who are pursuing the same or similar goals. Users may also find people from the same town and view the goals common amongst people in their geographical area. The aim here is to help people achieve their objectives by connecting them to both local and virtual communities.

Goals can also be tagged with user provided keywords to classify them into higher level groupings. Browsing based on tags provides the means for users to find similar goals, or new objectives they’d potentially like to achieve.

The 43Things API provides access to all the features that are currently available through the web interface.

Audioscrobbler and last.FM

Audioscrobbler Audioscrobbler and its sister site last.FM Last.FM aim to build up a profile of a user’s musical taste either manually, by selecting individual artists and songs, or automatically through the use of media player plugins that track every song that the user plays. This record of listening behaviour, along with metadata about each track (e.g. title, recording artist, etc) is transmitted to the web site via a web service interface.

For most users, interaction with the site is therefore a background process that steadily accumulates a list of favourite artists, tracks, etc based on their listening habits over a period of time.

Aggregating this data across many users allows the audioscrobbler service to match people based on similarity of their music tastes; the last.FM service is then able to produce a custom “radio station” Last.FMTutorial, consisting of media player playlists that contain tracks recommended by other users with similar tastes.

The web services allowing submission of track data are open for public use, encouraging the creation of new media player plugins. The “now playing” data, i.e. the last few tracks that a user has listened to, can be syndicated and shared in several ways. Other APIs provide control over the radio stations available from last.FM allowing the user to automatically tune into particular “stations”.

WebJay

Not surprisingly there are a number of other music related social content sites. Most closely related to last.FM is WebJay WebJay. This service allows users to create and share playlists containing links to freely available music. Users can manually create playlists, or automatically generate them based on a bookmarklet and service WebJayAPI that is capable of dynamically generating a playlist by scanning a web page for links to MP3 files.
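The link-scanning step can be illustrated with a minimal sketch. This is not WebJay’s actual implementation, which is not described in this paper; the class name, function name and sample URLs are assumptions made purely for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class Mp3LinkCollector(HTMLParser):
    """Collects absolute URLs of <a href="..."> links that end in .mp3."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".mp3"):
            self.links.append(urljoin(self.base_url, href))


def mp3_links(html, base_url):
    """Return the MP3 links found in an HTML page, in document order."""
    collector = Mp3LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A real service would also need to fetch the page, cope with malformed markup, and decide how to treat links whose media type is not evident from the file extension.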

WebJay also allows users to browse and search playlists. Aggregation of usage data provides a simple ranking of popular playlists. Links to similar playlists are also provided based on overlaps in their contents. The WebJay API also allows users to write tools to automate the creation and maintenance of playlists.

MusicBrainz

The final music content site included in this review is MusicBrainz MusicBrainz, a community maintained music database. MusicBrainz allows users to submit metadata about artists, albums and individual tracks to the database, building up a dataset that includes basic track metadata including title, recording artist, running times, related artists, web links, etc. As all metadata is user contributed, a moderation system is used to promote quality.

Users can freely browse the MusicBrainz data on the website, but can also use the web services MusicBrainzClient to programmatically access the database. The main use of these services is via the MusicBrainz Tagger MusicBrainzTagger, a desktop tool that is capable of generating the audio fingerprints that MusicBrainz uses to uniquely identify tracks. The Tagger therefore allows users to properly annotate their own music collection against the community generated database. Additional metadata, e.g. for unknown tracks, can also be submitted via the Tagger.

MusicBrainz is therefore very similar to CDDB, except that all data is explicitly placed into the public domain using a Creative Commons licence CC.

del.icio.us

del.icio.us del.icio.us is a community bookmark manager that allows users to freely publish and share web links, annotating them with comments, and classifying them with keywords so that they can be easily organized. Keywords, or “tags” can also be further organized into categories.

While these features provide useful functionality to individual users, especially those desiring portability of bookmarks between browsers and machines, the most interesting aspect of del.icio.us is the ability to browse using the tags alone. This is resulting in an emergent classification, or “folksonomy” for the web, enabling users to discover related web pages. The ability to perform a “reverse lookup” on an individual URL, i.e. discover which tags have been applied, completes the picture.

Almost all of the del.icio.us functionality and data is exposed using web services, ranging from RSS feeds through to an API delAPI that allows submission, maintenance, and retrieval of individual bookmarks.

Flickr

Spurred by the success of the del.icio.us tagging system, many other sites have begun adding similar tagging features. One such is Flickr Flickr, a hugely popular social content application specialising in the publication and sharing of digital photographs.

Flickr provides a suite of tools FlickrTools for making the publishing, manipulation and classification of photos as easy as possible. Images can be submitted in many ways including desktop tools, email submission, and browser bookmarklets; the flickr server takes care of resizing of images, automatically producing several scaled images that users can link to (e.g. from a blog), or share with others. EXIF EXIF metadata (e.g. camera model, timestamps, etc) embedded in digital photographs is also automatically extracted and presented to users. Combined with user submitted metadata in the form of titles, tags, and comments this constitutes a very rich environment within which users organize their photo collection.

Users may also build up a social network, indicating that other flickr users are “contacts” or “friends”. Photos can then be shared publicly, or published only for viewing by friends. Users may also explicitly tag their photos using Creative Commons licences to encourage republication. Users can also join groups of common interest; this feature has resulted in the creation of some interesting community art projects such as the “squared circle” collection SquaredCircle.

The flickr web service FlickrAPI exposes all of the flickr web site and photo uploading functionality as a REST service, encouraging the creation of new tools and applications.

Upcoming

The final social content site to be included in the review is Upcoming Upcoming, the first in an emerging wave of sites that facilitate sharing of data concerning events, e.g. gigs, film screenings, plays, etc. Users can sign up to the site, and supplement their profile by submitting lists of events that they are planning to attend. By browsing based on location and assigned tags, users can find events of interest, add them to their calendar, and find other users who are also attending.

The site has recently been extended to include an API UpcomingAPI that exposes this basic functionality.

The Review Criteria

Before conducting the review we must first define the criteria against which the services will be compared. As this paper is primarily concerned with REST services, the definition of the REST architectural style will be used as an initial starting point. The REST style is formally defined in Fielding, Section 5.1.5:

REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.

A less theoretical and more practically oriented introduction to designing a REST protocol is available in RestfulWeb, which suggests that defining a REST service involves answering the following questions:

  • What are the URIs?
  • What’s the format?
  • What methods are supported at each URI?
  • What status codes could be returned?

Within the context of a social content service other issues should also be considered: security, privacy, and the licensing of both the web service and contributed data.

This list of definitions and issues leads to the following series of criteria that can be used to compare the services under review:

  • How does the API use URLs to identify resources?
  • What HTTP methods does the API support?
  • What status codes are returned?
  • How are users and applications authenticated to the API?
  • Are resources linked to one another (use of hypermedia)?
  • What data formats does the API support?
  • How much user data is exposed?
  • What are the licensing agreements for the API and the data?

A REST Service Review

The following sections summarise the results of contrasting the REST services exposed by the applications introduced earlier, based on the criteria defined in the previous section (see ).

Use of URLs

How do the services use URLs to identify resources?

Service | Notes
43Things | Key resources such as person, goal and city are available via “get resource by id” URLs; these identify individual resources using identifiers in the query string of the URL, e.g. /service/get_person?id=ldodds. Where supported, updates to resources are handled via separate methods, e.g. “add goal”, “update goal”.
AudioScrobbler/Last.FM | Several different URLs, derived from user identifiers, are used to expose “recent playing” data. The upload protocol uses a single controller URL to which all data is posted; users are identified via query string parameters. The last.fm radio control protocol uses separate URLs to retrieve data, skip tracks, retune the station, etc.
WebJay | Each user and playlist has a unique URL; playlist URLs are subordinate to the user’s URL, e.g. /user/playlist-name. Updates to playlists are dispatched to the playlist URL.
MusicBrainz | Each artist, album and track has a unique URL.
del.icio.us | No unique URLs for resources; each method is a function call, e.g. “recent bookmarks”, that may be filtered by tag or ranged by date.
flickr | Key resources such as person, photographs and groups are available via “get info” methods, e.g. /services/rest/?method=flickr.people.getInfo&user=ldodds; these identify individual resources using query string identifiers. Where supported, updates to resources are handled via the same controller URL, but using different method parameters.
Upcoming | Similar to flickr, each key resource such as event, person, etc is available via “get info” methods.

Use of HTTP Methods

What HTTP Methods does the service support?

Service | Supported Methods
43Things | GET, POST
AudioScrobbler/Last.FM | GET, POST
WebJay | DELETE, GET, POST
MusicBrainz | GET, POST
del.icio.us | GET, POST
flickr | GET, POST
Upcoming | GET, POST

Use of Status Codes

What HTTP status codes are documented as being returned by the service? (see StatusCodes for definitions)

Service | Codes
43Things | 200, 500
AudioScrobbler/Last.FM | 200, 503, no others documented
WebJay | 200, 201, 4xx, 5xx
MusicBrainz | 200, 404, 500
del.icio.us | 200, 403, 503
flickr | 200
Upcoming | 200, 404, 403, 409, 500

Use of Authentication

How does the API authenticate users, and must clients identify themselves via an API Key?

Service | Authentication Methods | Read-only methods open? | API Key Required?
43Things | Username/password as plain text in parameters, or Atom X-WSSE AtomAuth | Yes | Yes
AudioScrobbler/Last.FM | Username and md5-hashed password | Yes | Yes, “client id” for submissions
WebJay | HTTP Basic Auth | Yes | No, but documentation encourages use of appropriate User-Agent headers
MusicBrainz | Username to acquire session for submitting data | Yes | No
del.icio.us | HTTP Basic Auth | No | No, but documentation encourages use of appropriate User-Agent headers
flickr | Username/password as plain text in parameters | Mostly | Yes
Upcoming | Username/password as plain text in parameters | Yes | Yes

Use of Hypermedia

Are resources exposed by the service linked to one another in API responses?

Service | Notes
43Things | Generally no, although entries are linked to the appropriate Atom protocol URLs
AudioScrobbler/Last.FM | No
WebJay | No
MusicBrainz | Yes
del.icio.us | No
flickr | No
Upcoming | No

Data Formats

What data formats does the API support?

Service | Notes
43Things | Custom XML format, some use of XML namespaces; RSS
AudioScrobbler/Last.FM | Plain text, FOAF, RSS
WebJay | Plain text, XSPF XSPF
MusicBrainz | Custom RDF vocabulary
del.icio.us | Attribute oriented custom XML format, no use of XML namespaces; RSS
flickr | Attribute oriented custom XML format, no use of XML namespaces; RSS
Upcoming | Attribute oriented custom XML format; RSS

Exposure of Personal Data

Service | Notes
43Things | Basic description including name, username, home page/blog URL, profile image of person
AudioScrobbler/Last.FM | Basic description including name, home page/blog URL, profile image of person, and list of friends (in FOAF output)
WebJay | None, although the service captures name, email, homepage URL, profile image
MusicBrainz | None, although the service captures name, email, homepage URL, and a brief biography
del.icio.us | None currently exposed, although the service captures name and email address
flickr | Basic description including name, username, home page/blog URL, profile image of person, and list of contacts and group membership
Upcoming | None currently exposed, although the service captures name, email address, homepage/blog URL, photo and list of friends

Licensing

Licensing of Service and Data

Service | Service Licence | Data Licence
43Things | None specified, although commercial users are asked to announce their intentions | None specified
AudioScrobbler/Last.FM | None specified | Creative Commons, Non-commercial, Share-alike, Attribution Licence
WebJay | Free for open source software, non-commercial otherwise | None specified
MusicBrainz | OpenContent licence | OpenContent licence
del.icio.us | None specified | None specified
flickr | Non-commercial | None specified, although users can set an individual preference that allows the association of a Creative Commons licence with their photos
Upcoming | Non-commercial | None specified

Some Best Practice Recommendations

Summarising the findings presented in the review above, the following sections outline a number of best practice recommendations for use as a reference when designing new REST services.

Use of URIs

The first item to notice from the review is that few of these services actually support a true REST interface. Many of them are “RESTful” in that they make correct use of HTTP methods, error codes, etc. but the majority fall short when it comes to correctly identifying resources using unique URLs.

In most cases the exposed services are actually closer to the Remote Procedure Call architectural style than REST. A typical example is the use of a single controller URL for all service methods, with parameters identifying the action to take and the resource to act upon. Each API method is then essentially a function call that is processed by the application. In addition many of the applications expose alternate URLs for different query methods (by identifier, name, email address, etc.) rather than having a generic query resource that acts differently depending on the parameters provided.

In a REST service each resource would have a unique URL. Query methods are supported by adding appropriate query strings to these URLs. This design principle results in a smaller and more elegant API; multiple RPC style methods can be collapsed into a single URL that responds appropriately depending on the HTTP method through which it is requested, plus the request parameters.
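The contrast can be sketched in a few lines. This is a minimal illustration and is not taken from any of the reviewed APIs: the /people/<id> URL layout, the in-memory PEOPLE store and the handle function are all assumptions made for the example.

```python
# Hypothetical in-memory store standing in for a service's database.
PEOPLE = {"ldodds": {"name": "ldodds"}}


def handle(method, path, body=None):
    """Dispatch on HTTP method + resource URL, returning (status, payload).

    A single URL per person replaces separate RPC-style calls such as
    get_person, update_person and delete_person.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "people":
        return 404, None
    person_id = parts[1]

    if method == "GET":
        if person_id not in PEOPLE:
            return 404, None
        return 200, PEOPLE[person_id]
    if method == "PUT":
        created = person_id not in PEOPLE
        PEOPLE[person_id] = body
        return (201 if created else 200), body
    if method == "DELETE":
        if person_id not in PEOPLE:
            return 404, None
        del PEOPLE[person_id]
        return 204, None
    return 405, None  # method not allowed on this resource
```

Three RPC-style entry points collapse into one URL here; the HTTP method alone selects the behaviour.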

Clearly assigning identifiers to significant resources exposed by the API also promotes easier linking between resources and services.

Use of HTTP methods

The majority of the reviewed services respond appropriately to the different HTTP methods, using GET to support reads and POST to perform updates. These two methods are the absolute minimum that a RESTful service should support. Use of the DELETE and PUT methods is often deferred as these are not well-supported within browsers. In the reviewed applications only the WebJay API supports the DELETE method.

Applications should not allow GET requests to make modifications to data. GET requests must be “safe”, i.e. be without side-effects (and therefore also idempotent); this is a basic axiom of the web Axioms. One of the dangers of RPC based API design is that developers focus instead on the request URL (e.g. update_person) and not the combination of URL + HTTP method. Loopholes such as allowing updates via GET may then get propagated into live services. For example both the Flickr and 43Things APIs have update methods that operate on both POST and GET requests.
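The rule that a GET must never modify data can be enforced at the point where an update is applied. The following is a minimal sketch under stated assumptions: the update_goal function and the dictionary of goals are hypothetical and do not reflect how any of the reviewed services is implemented.

```python
def update_goal(method, goals, goal_id, new_text):
    """Apply an update only when the request method permits side-effects.

    Returns (status, headers). A GET aimed at the update URL is refused
    with 405 and an Allow header instead of silently modifying data.
    """
    if method != "POST":
        return 405, {"Allow": "POST"}
    if goal_id not in goals:
        return 404, {}
    goals[goal_id] = new_text
    return 200, {}
```

Checking the method alongside the URL closes the loophole described above, since the URL alone no longer determines whether state changes.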

Use of Status Codes

The services vary greatly in their use of HTTP status codes; in some cases the status codes are wrongly used. For example Flickr appears to respond only with a “200 OK” status code even when the request is in error. Instead the body of the response contains an error code, even for conditions such as authentication errors that should be signified using a “401 Unauthorized” response (or “403 Forbidden” where valid credentials lack permission).

The services vary in how they indicate failed requests, such as missing data, missing required parameters, invalid data submissions, etc. Where these are indicated they are often flagged with a generic “500 Internal Server Error” code. Missing data should always return a “404 Not Found”. The Upcoming API correctly uses “400 Bad Request” to identify missing parameters.

HTTP status codes StatusCodes have been defined for all significant issues that a distributed application is likely to encounter. Correct use of status codes enables clear labelling of errors and efficient dispatching to client side error handlers without the need to process the response body. Intermediaries, e.g. caching proxies, and clients such as web crawlers, can also use this information to generically handle a response without having to deal with the specifics of a given API.
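Dispatching on the status code alone might look like the sketch below. The classify function and its action labels are assumptions chosen for illustration; a real client would attach its own handlers to each outcome.

```python
from http import HTTPStatus


def classify(status):
    """Map an HTTP status code to a coarse client-side action.

    Only the numeric code is inspected; the response body is never read.
    """
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "follow-redirect"
    if status in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN):
        return "reauthenticate"
    if status == HTTPStatus.NOT_FOUND:
        return "missing"
    if 400 <= status < 500:
        return "fix-request"
    if status == HTTPStatus.SERVICE_UNAVAILABLE:
        return "retry-later"
    if 500 <= status < 600:
        return "server-error"
    return "unknown"
```

A service that answers every request with 200 and reports failures in the body defeats this kind of generic handling, because every response is classified as a success.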

Use of Authentication

Authentication support among the different APIs is varied. The majority allow unauthenticated use of their read-only methods but require a client to be authenticated to use methods that update server-side state. del.icio.us is the exception in that it requires authentication for any use of the API.

In all of the services, authorisation is granted by the user registering with the service, which furnishes them with the required credentials, generally just a username and password.

The supported authentication methods range from clear text transmission of credentials in query strings, through to use of HTTP Basic Authentication and Atom X-WSSE AtomAuth, which uses HTTP extensions to plug in a new authentication model.

The basic model of open read-only operations with security constraints on the remainder of the API should be encouraged as it enables free use of data.
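The model of open reads and protected writes, combined with HTTP Basic Authentication, can be sketched as follows. The function names and the way the expected credential is passed in are assumptions made for this illustration only.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def is_authorised(method, headers, expected_header):
    """Allow read-only methods freely; require credentials for the rest."""
    if method in ("GET", "HEAD"):
        return True
    return headers.get("Authorization") == expected_header
```

Basic Authentication merely encodes the credentials rather than encrypting them, so in practice it offers little more protection than plain text parameters unless the connection itself is encrypted.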

Many of the services support the notion of an “API Key”. These keys are allocated on a per-application basis and are a required parameter in all requests. Typically a form is provided that allows an application developer to quickly obtain a key. Often some context about its intended usage, such as application name, description and a homepage URL must be supplied.

While API keys are not used for authentication, they do support usage tracking of the API, e.g. to identify active applications and potentially spot abuses. From this perspective they are a useful feature that furnishes service providers with usage statistics about client applications. However, as will be demonstrated in the next section, the use of API keys severely restricts ad hoc integration of services, and care should be taken over how they are deployed.

Use of Hypermedia

Through the use of URLs as public identifiers, REST services can be linked together to create a hypertext of data. This promotes integration as in many circumstances the link is all that is required to locate additional data. This theme is expanded on in the final section of this paper (see ).

Responses from REST services should contain URLs that identify all resources that are referenced in the response. For example, a description of an album may contain URLs that refer to the artist’s description, human-readable documentation, descriptions of which users reviewed the album, etc.

Embedding URLs into the response also facilitates integration as clients can simply extract required URLs for subsequent processing (e.g. to GET additional data). This can be achieved without explicit knowledge of the URL structure of the remote application. Service providers therefore also benefit from greater freedom in restructuring their site and services without the need to co-ordinate code changes between client and server.
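
A minimal Python sketch of this idea follows. The album description, its element names and its `href` attributes are invented for illustration and are not taken from any of the reviewed APIs. The client reads the URLs it needs from the response instead of assembling them from identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; element and attribute names are illustrative only.
ALBUM_XML = """
<album href="http://example.com/albums/42">
  <title>Example Album</title>
  <artist href="http://example.com/artists/7">Example Artist</artist>
  <reviews href="http://example.com/albums/42/reviews"/>
</album>
"""

def extract_links(xml_text):
    """Map each element name to the URL carried in its href attribute."""
    root = ET.fromstring(xml_text)
    return {el.tag: el.get("href") for el in root.iter() if el.get("href")}

links = extract_links(ALBUM_XML)
# A client would now GET links["artist"] directly; it never needs to know
# how the server lays out its URL space.
```

If the server later moves artist descriptions to a different path, this client keeps working unchanged, because it only follows the links it is given.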

Few of the reviewed APIs support this feature; clients are expected to make further requests by constructing URLs based on identifiers included in service responses. MusicBrainz uses resource linking as a side-effect of its use of RDF; 43Things only provides links to HTML documentation for some resources, but does provide the URL for manipulating user postings via the Atom API.

Lack of linking can be attributed to three factors. Firstly, discussions of the REST style have, to date, largely centred on correct use of HTTP rather than the additional benefits that accrue from use of hypermedia. Secondly, the RPC style that the majority of the services follow promotes a view of the API as a series of method calls, rather than endpoints within a hypertext of data. Thirdly, the use of API keys prohibits free publishing of links, as a given URL is only suitable for use by a single application: the one to which the key was assigned.

It is the use of API keys that is the most troublesome. While obviously providing a useful feature, API keys hamper loose ad hoc integration; clients must know how to manipulate a URL to insert an API key. Therefore, while a service may provide unauthenticated use of read-only URLs, these links cannot be openly published without also sharing an API key. This obviously undermines their potential benefits.

An alternative to using API keys in the URL is to require applications to identify themselves using an existing HTTP feature: the User-Agent header. This header can be used to pass an application name, URL, or other token to a service without requiring modification of the request URL. An API key is actually request metadata, and HTTP headers are the correct place for this metadata to be communicated.

Both the WebJay and del.icio.us API documentation encourage the proper use of User-Agent headers, suggesting the inclusion of a URL, e.g. to a project homepage, as a means to provide useful documentation on the client application.
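
A short Python sketch of a client identifying itself this way is shown below. The resource URL, client name and homepage are hypothetical, and the `name (+url)` layout is only one common convention, not something the reviewed APIs mandate.

```python
import urllib.request

def build_request(url, app_name, app_url):
    """Build a GET request that identifies the client in the User-Agent header."""
    user_agent = f"{app_name} (+{app_url})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request(
    "http://example.com/users/ldodds/bookmarks",  # hypothetical resource URL
    "ExampleAggregator/0.1",                      # hypothetical client name
    "http://example.org/aggregator",              # hypothetical project homepage
)
# The URL carries no application-specific token, so it can be published as-is.
```

The service can still gather per-application statistics from its request logs, while the link itself remains usable by any client.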

The one environment where this technique isn’t suitable is JavaScript bookmarklets and other scripting environments where the script has no control over the request headers.

Data Formats

Not surprisingly, the majority of services use XML as the lingua franca of data exchange. Several, such as WebJay and Audioscrobbler, also use simple text protocols, but these should be avoided or, at the very least, designed to support Unicode to avoid internationalisation issues. XML parsing overhead is unlikely to become a significant bottleneck in request processing, and responses can be generated using a simple templating engine.

MusicBrainz is unique in its use of RDF [RDFPrimer], a technology that is well-suited for use within a REST service; as RDF uses URIs as the means of identifying resources, the API URL structure and the response format can be closely related.

RDF is most suited for use when aggregating and manipulating data from multiple sources; its abstract model makes it easy to combine data sets from different locations. Inferencing also helps integration of multiple data sources by allowing developers to “late bind” data to a given application schema. Designers of social content services should consider RDF as at least an export format, if not the basis of the service’s implementation data model. Use of templating systems means that adding further output formats, e.g. to support both an RDF and an XML view, is trivial. With some care, XML vocabularies can be designed so that they can be correctly interpreted as RDF, allowing a single format to support all possible uses.
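
The ease of combining data sets can be sketched in a few lines of Python by treating each RDF statement as a (subject, predicate, object) tuple. This is a deliberate simplification: it ignores blank nodes and datatypes, which a real RDF toolkit would handle. The person URI, the `eg` vocabulary and the sample values are all hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EG = "http://example.com/ns#"                 # hypothetical vocabulary
PERSON = "http://example.org/people/example"  # hypothetical resource URI

# Statements about the same person, published by two different services.
profile_site = {
    (PERSON, FOAF + "name", "Example User"),
    (PERSON, FOAF + "homepage", "http://example.org/"),
}
music_site = {
    (PERSON, FOAF + "name", "Example User"),
    (PERSON, EG + "favouriteArtist", "http://example.com/artists/7"),
}

def merge(*graphs):
    """Merging graphs is set union; repeated statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def properties_of(graph, subject):
    """Collect every (predicate, object) pair stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

combined = merge(profile_site, music_site)
```

Because both services identify the person with the same URI, no mapping code is needed: the combined graph simply holds everything either service said about that resource.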

None of the reviewed services uses standard vocabularies in its XML; each typically has its own custom XML vocabulary. There are many opportunities here to standardise on vocabularies for common resources such as people, groups, and places. This should encourage increased sharing of data between many different applications: not only social content services, but also geographical catalogues, social networking sites, etc. Avoidance of proprietary formats makes integration simpler and allows standard tools (component models, stylesheets, etc.) to be created and shared for re-use across sites.

The last section of this paper expands on this opportunity, and in particular the use of RDF vocabularies, to suggest an architecture for loose integration between social content applications.

Exposure of Personal Data

Of the social content sites that do expose a public view of a user’s profile, none share any unsecured information such as email addresses. In most cases the data consists of name, homepage or blog URL, and often location (e.g. city, state, country). This is enough to provide some basic context about the user and relate their profile to submissions to the site.

Several of the sites also support a “groups” feature allowing users to aggregate themselves into communities of interest. Group listings, including the name of the group and its members, are often exposed through the API.

As noted above, this kind of basic personal description data could benefit from some standardisation across sites (see Data Formats).

Licensing

There are few common points of agreement between licensing of services and data across the reviewed applications. However the majority do indicate that the API is free for non-commercial usage. Anything else would hinder the network effects expected from exposing services.

However, not all of the services explicitly licence their data. MusicBrainz uses an OpenContent licence for its database, while Flickr allows users to associate a Creative Commons licence with images in their collection.

Service designers are strongly encouraged to clarify the licensing of all of their data, and to use an appropriate Creative Commons licence, ideally one that places some or all of the data into the public domain. Recognising that individual users may want alternative licensing details associated with their data, either more or less restrictive than that of the service itself, designers should consider allowing users to select an appropriate Creative Commons licence that can be stored with, or referenced from, their data.

As more web services appear and begin to be composed to create other applications, clear licensing of both services and data will become increasingly important.

Connecting Social Content Services

This final section of the paper introduces some simple techniques that, exploiting the power of the REST architecture, the RDF data model, and common RDF vocabularies, can be used as the basis for loose integration and data sharing between social content services.

This architecture is based on several principles that build upon one another:

  • A common RDF vocabulary for describing people, their activities and interests, i.e. FOAF
  • Support for importing FOAF data into social content applications
  • Use of users’ self-descriptions as a form of “service connector” enabling ad hoc integration of social content applications

The following sections describe each of these principles in more detail.

A Vocabulary for Personal Descriptions

As the review notes demonstrate, the majority of services share some common properties in their user profile data. Similarly, many of the sites provide access to descriptions of sub-communities, i.e. “groups”. The advantages of moving to a standard vocabulary for this data were introduced in the Data Formats section.

The FOAF (“Friend of a Friend”) project has a very similar aim in mind: the definition of an RDF vocabulary for expressing metadata about people, their interests, relationships and activities [FOAFIntro]. While the name betrays its origin as a format for describing social networks, its real utility is as a general framework for connecting together more specialised vocabularies.

For example, FOAF provides basic facilities for describing an Image, its creator and even the people depicted in the photograph. But FOAF makes no attempt to completely model all of the relevant information that might be attached to an Image. It is expected that other specialised vocabularies will be used to supplement the core FOAF properties. The core of the FOAF vocabulary covers people, groups, organizations, and images. This more than meets the requirements of the social content applications reviewed here, while still granting each application the ability to extend and supplement the vocabulary as necessary.

For a fuller introduction to FOAF see [FOAFIntro] and the FOAF specification [FOAF].

Of the reviewed applications only one, Audioscrobbler, currently supports a FOAF export of its user profile data.

Importing and Harvesting FOAF and RDF

As the number of people creating and hosting their own FOAF profile continues to increase, and as more applications start to support the exporting of FOAF profiles based on user registration data, the natural next step is to consider importing this data into applications.

The first and most immediate benefit will be to the user registration process. Much of the work required in filling out a registration form can be automated: simply allow a user to point to their current FOAF profile (whether hand-crafted or exported from another service) and populate the form accordingly.
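
The form-filling step might look something like the following Python sketch. It assumes the profile is RDF/XML with a top-level `foaf:Person` element and reads it with a plain XML parser; a production importer would use a proper RDF parser, since the same data can be serialised in many different ways. The sample profile and the form field names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Hypothetical profile; in practice this would be fetched from a user-supplied URL.
PROFILE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Example User</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/"/>
    <foaf:weblog rdf:resource="http://example.org/blog/"/>
  </foaf:Person>
</rdf:RDF>
"""

def prefill_form(rdf_xml):
    """Return registration form defaults taken from the first foaf:Person."""
    root = ET.fromstring(rdf_xml)
    person = root.find(f"{{{FOAF_NS}}}Person")
    if person is None:
        return {}

    def literal(prop):
        # Property whose value is element text, e.g. foaf:name.
        el = person.find(f"{{{FOAF_NS}}}{prop}")
        return el.text.strip() if el is not None and el.text else None

    def resource(prop):
        # Property whose value is a URL in rdf:resource, e.g. foaf:homepage.
        el = person.find(f"{{{FOAF_NS}}}{prop}")
        return el.get(f"{{{RDF_NS}}}resource") if el is not None else None

    return {
        "name": literal("name"),
        "homepage": resource("homepage"),
        "weblog": resource("weblog"),
    }

form_defaults = prefill_form(PROFILE)
```

The user then only has to confirm or correct the pre-filled values rather than typing them in again for each new service.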

However, as well as reading basic user metadata, applications may also mine the provided FOAF profile for interests, e.g. to allow recommendation of relevant groups or features, and also to import additional details such as relationship data (friends, colleagues, etc.). Not only does initial registration become easier, but the service can become indispensable to the user much more quickly by tailoring itself to the data contained in (or harvestable via) a user’s self-description.

In fact a service may go so far as to avoid creating and managing user data locally, instead storing only the URL of the user’s FOAF profile, and simply reading and caching data as it is required. This provides users with much more control over their data, making it easier to update multiple services when information changes.

There are obvious security issues to consider, but it is interesting how the standardisation of something as simple as a user description can begin to open up alternative application architectures.

Self-Description as Service Connectors

Once applications are routinely importing FOAF data provided by end users, a number of additional opportunities then present themselves.

For example, FOAF provides terms that allow a user to state that they have an account on a particular website, in effect tying their self-description to a specific user identifier on that service. The following RDF fragment illustrates this by stating that the author has an account on del.icio.us:

<foaf:Person>
	<foaf:holdsAccount>
		<foaf:OnlineAccount>
			<foaf:accountName>ldodds</foaf:accountName>
			<foaf:accountServiceHomepage rdf:resource="http://del.icio.us"/>
		</foaf:OnlineAccount>
	</foaf:holdsAccount>
</foaf:Person>

Users of this data could use this identifier to begin initiating calls to the del.icio.us web service, e.g. to discover the author’s recent bookmarks.
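
The following Python sketch shows how a consumer might pull account identifiers out of such a fragment. It wraps the fragment above in an `rdf:RDF` element with namespace declarations and, as a simplification, reads it with a plain XML parser rather than a full RDF parser. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# The fragment from the text, wrapped so that it parses as a complete document.
FRAGMENT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:holdsAccount>
      <foaf:OnlineAccount>
        <foaf:accountName>ldodds</foaf:accountName>
        <foaf:accountServiceHomepage rdf:resource="http://del.icio.us"/>
      </foaf:OnlineAccount>
    </foaf:holdsAccount>
  </foaf:Person>
</rdf:RDF>
"""

def find_accounts(rdf_xml):
    """Return (service homepage, account name) pairs found in a FOAF document."""
    root = ET.fromstring(rdf_xml)
    accounts = []
    for account in root.iterfind(".//foaf:OnlineAccount", NS):
        name = account.findtext("foaf:accountName", namespaces=NS)
        homepage = account.find("foaf:accountServiceHomepage", NS)
        if name and homepage is not None:
            accounts.append((homepage.get(RDF_RESOURCE), name))
    return accounts

accounts = find_accounts(FRAGMENT)
```

An application that recognises the service homepage can then use the account name as the user identifier in calls to that service’s API.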

A direct example will illustrate the benefits: WebJay might process FOAF data to find an Audioscrobbler profile owned by an individual user and from there, via the Audioscrobbler web service, discover the user’s top-ten artists. WebJay can then combine this with the data it already holds to fine-tune its recommendations of suitable playlists or related artists, without significant development effort or lengthy customisation by the end user.

However this style of integration mechanism is limited in one important regard: the need for application developers to develop specific code to discover data of interest. To continue the example, the WebJay developers need to write code to interface with the Audioscrobbler web service. If similar websites are subsequently launched, then new code must be written for each new integration. This is a cost-intensive solution and one that doesn’t scale well as the number of points of integration rises.

A more flexible means of integration would be to exploit RDF hyperlinking [Hyperlinking], i.e. the rdfs:seeAlso property, which relates one resource to another that provides a further description. If Audioscrobbler and similar services allowed direct linking to an RDF resource that described a user’s top-ten artists, then a user could introduce the following into their FOAF document:

<foaf:Person>
	<eg:favouriteArtists>
		<eg:TopTenArtists rdf:about="http://example.com/music/ldodds/top-end"/>
	</eg:favouriteArtists>
</foaf:Person>

The primary data set contained in the destination resource is indicated by assigning an rdf:type to the resource, in this case eg:TopTenArtists. This avoids the need for an application to traverse all rdfs:seeAlso links discovered in a FOAF document, by narrowing down the crawling of data to just those documents that contain data of interest.

This simple shift to the use of RDF immediately addresses the above integration problem. No specific application code has to be written to knit together the services: a simple HTTP GET is sufficient to read in the data from any application that exposes its data using a REST service API returning RDF metadata. Integration becomes simple traversal of links, and the development focus can then move to vocabulary creation and the exploration of the available data.
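
A rough Python sketch of this link traversal is given below. The namespace bound to the `eg` prefix, the second link type and the sample document are hypothetical. The function accepts the link URL from either an `rdf:about` or an `rdf:resource` attribute, and again a plain XML parser stands in for a real RDF toolkit.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EG_NS = "http://example.com/ns#"  # hypothetical namespace for the eg: prefix

# Hypothetical FOAF document linking to two differently typed resources.
FOAF_DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:eg="http://example.com/ns#">
  <foaf:Person>
    <eg:favouriteArtists>
      <eg:TopTenArtists rdf:about="http://example.com/music/ldodds/top-end"/>
    </eg:favouriteArtists>
    <eg:bookmarks>
      <eg:RecentBookmarks rdf:about="http://example.com/links/ldodds/recent"/>
    </eg:bookmarks>
  </foaf:Person>
</rdf:RDF>
"""

def typed_links(rdf_xml, type_ns, type_name):
    """Return the URLs of linked resources that carry the requested type."""
    root = ET.fromstring(rdf_xml)
    urls = []
    for node in root.iter(f"{{{type_ns}}}{type_name}"):
        url = node.get(f"{{{RDF_NS}}}about") or node.get(f"{{{RDF_NS}}}resource")
        if url:
            urls.append(url)
    return urls

def fetch(url):
    """Plain HTTP GET asking for RDF/XML; no service-specific code is involved."""
    request = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(request) as response:
        return response.read()

top_ten_urls = typed_links(FOAF_DOC, EG_NS, "TopTenArtists")
```

Only the links whose type matches are returned, so a consumer interested in music data calls `fetch` on those URLs alone and ignores the rest of the document.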

Of course not all integration issues are immediately dealt with by this approach. The technique offers little scope for fine-tuning of queries between applications, and neither does it allow one service to more easily update another (e.g. via a POST). But it does address the most common case of data exchange, and in particular data exchanges that centre on people and related data.

Summary

The REST architectural style is being embraced by social content application developers as a means to quickly and simply share data with the hacker community. The network effects of data sharing, and the freedom for users to repurpose content, benefit both the service provider and their end users.

Reviewing some deployed service interfaces shows that the majority fall short of implementing the complete REST architecture. The paper compared these interfaces using several criteria with the aim of recommending some best practices in service design.

The benefits of using FOAF, RDF and REST in concert were also discussed, highlighting the ease with which web service interfaces can be composed to implement new application features. These benefits are subject to similar network effects: as more applications expose FOAF and RDF data sets, the more opportunities there are for creating innovative new features by combining data from multiple sources.

Bibliography

[43Things] 43Things
[43ThingsAPI] 43Things Web Service API
[AtomWiki] An Atom-Powered Wiki, Joe Gregorio, 14th April 2004
[AtomAuth] Atom Authentication, Mark Pilgrim, 17th December 2003
[Audioscrobbler] Audioscrobbler
[Axioms] Universal Resource Identifiers — Axioms of Web Architecture, Identity, State and GET, Tim Berners-Lee, 19th December 1996
[CC] Creative Commons
[CDDB] CDDB, Wikipedia Article
[del.icio.us] del.icio.us
[delAPI] del.icio.us API
[Ecademy] Ecademy
[EXIF] Exchangeable image file format, Wikipedia Article
[Fielding] Architectural Styles and the Design of Network-based Software Architectures, Roy Fielding, 2000
[Flickr] Flickr
[FlickrAPI] Flickr API Documentation
[FlickrTools] Flickr Uploading Tools
[FOAF] FOAF Vocabulary Specification, Dan Brickley and Libby Miller, 4th April 2005
[FOAFIntro] An Introduction to FOAF, Leigh Dodds, 4th February 2004
[Friendster] Friendster
[Hyperlinking] RDF Hyper-linking, Dan Brickley, 2003
[Last.FM] Last.FM
[Last.FMTutorial] Last.FM Tutorial
[MusicBrainz] MusicBrainz
[MusicBrainzClient] MusicBrainz Client Library HOWTO
[MusicBrainzTagger] About the MusicBrainz Tagger
[Orkut] Orkut
[RDFPrimer] RDF Primer, W3C Recommendation 10 February 2004
[RestfulWeb] How to Create a REST Protocol, Joe Gregorio, 1st December 2004
[ScreenScraping] Screen Scraping, Wikipedia Article
[SquaredCircle] Squared Circle Flickr Group
[StatusCodes] HTTP/1.1: Status Codes, Fielding et al, June 1999
[Udell] Networks of Shared Experience, Jon Udell, 14th April 2004
[Upcoming] Upcoming.org
[UpcomingAPI] Upcoming.org API Documentation - Version 1.0
[WebJay] WebJay
[WebJayAPI] Webjay API, Lucas Gonze, v0.0 (Draft June 2, 2004)
[WS-Amazon] Making Web Services Work at Amazon, Edd Dumbill, 9th December 2003
[WS-Flickr] Stewart Butterfield on Flickr, Richard Koman, 2nd April 2005
[XSPF] XML Shareable Playlist Format

One thought on “Connecting Social Content Services using FOAF, RDF and REST”

  1. [...] uploaded the slides (Powerpoint) from my XTech 2005 talk: Connecting Social Content Services using FOAF, RDF and REST. In the presentation I basically gave an overview of the paper, touching on some areas where I [...]