Thoughts on Linked Data Business Models

Scott Brinker recently published a great blog post covering 7 business models for Linked Data. The post is well worth a read and reviews the potential for both direct and indirect revenue generation from a range of different business models. I’ve been thinking about these same issues myself recently so I’m pleased to see that others are doing similar analysis. Scott’s conclusion that, currently, Linked Data is more likely to drive indirect revenue is sound, and reflects where we are with the deployment of the technology.

The time is ripe though for organizations to begin exploring direct revenue generation models and it’s there that I wanted to add some thoughts and commentary to Scott’s posting.


The traffic model, with its indirect revenue generation by driving traffic to existing content and services, is well understood. The same model has been used to encourage organizations to open up Web APIs, so it’s natural to consider it for Linked Data too.

Because it is tried and tested it’s currently one of the strongest arguments for driving adoption of Linked Data, so I’d put this right at the top of the list. The feedback loop that is in place now with search engines makes that traffic generation a reality.


Scott mentions adverts as a possible revenue stream and raises the possibility of “data-layer ads”, by which I understand him to mean advertising included in the Linked Data itself. While I agree that an advertising model is a potential revenue stream, I don’t see that “data-layer ads” are really viable or actually useful in practice.

Adverts incorporated into raw data will be too easily stripped out or ignored by applications; by definition the adverts will be easily identifiable. RSS advertising doesn’t seem to have really taken off (I certainly never see any) and I think this is for similar reasons: if the adverts are easily identifiable, then they can be stripped. And if they’re included in content or data values, then this causes problems for further machine-processing of the data and annoyances for end users.

Of course a business could use its terms and conditions to require that users of its Linked Data display ads, e.g. requiring data-layer ads to be displayed in some form to users of an application. In practice this can get problematic, especially if there’s not an obvious way to surface the ads to end users. But I think it’s also problematic because, unlike a Web API where I sign up to gain access, an arbitrary Linked Data site requires no prior agreement. My crawler or browser might fetch data without any knowledge of what those terms and conditions might be.

Embedding adverts in data is not a useful way to deliver them to end users. In an environment where adverts are increasingly profiled by a range of geographic, demographic or behavioural factors, incorporating blanket ads into data feeds loses all of that targeting capability. It also potentially loses the feedback, e.g. on click-throughs or impressions, that is useful for gauging the success of a campaign.

In my view advertising as a model to support Linked Data publishing is more likely to echo that used by the Guardian as part of its Open Platform terms and conditions (see Section 8, Advertising and Commercial Usage). The terms require users of the content to display ads from the Guardian’s advertising network on their websites. This avoids the need to include adverts in the data layer and supports a conventional model for delivering ads, making it play well with current advertising platforms and targeting options.


As Brinker notes, subscription models for data, content and services have been around for some time. The interesting thing is to see how these models have been evolving of late due to pressures in various industries, and how these intersect with the open data movement. For Linked Data to be most useful some of it needs to be free: you need to make at least a bare minimum of data freely available, e.g. to identify objects of interest, to enable annotation and linking, etc. In my opinion a freemium model is the core of any subscription model for Linked Data.

Having previously worked in the academic publishing industry, which is very heavily driven by subscription revenues, I’ve noticed a number of models that have come to the fore there, most recently driven by the Open Access movement. I think many of these are transferable to other contexts. So while the particulars will vary in different industries, the means of slicing up data into subscription packages are likely to be repeatable.

All of the following assume that some basic element of the Linked Data is free, but that one is paying for:

  • Full Access — Pay for access to detailed, denser data. The value-added data might include richer links to other datasets, more content, etc.
  • Timely Access — Pay for access to the most recent, or more current, version of the data. This leaves the bulk of the data open but delivers a commercial advantage to subscribers. As data gets older, it automatically becomes free.
  • Archival Access — Putting archives of content, or large archival datasets, on-line can be expensive in terms of data conversion, digitization, and service provision. So deep archives of data might only be available to subscribers. Commercial advantage derives from having more data to analyse and explore.
  • Block Access — Pay for access to a dataset based on time, e.g. “for the next 24 hours”; or based on the number or frequency of accesses; or the number of concurrent accesses.
  • Convenient Access — Pay for access to the data through a specific mechanism. This might seem at odds with Linked Data, but it’s reasonable to assume that some organizations might want data feeds or dumps rather than on-line only access. This might come at a premium.

These variants can be combined and might also be separated out into personal (non-commercial) and commercial subscription packages.
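To make the idea of combining variants concrete, here is a minimal Python sketch of how Timely Access and Archival Access might be combined into a single freemium check. Everything in it is an assumption made for illustration: the `Record` type, the `can_access` function, and the 90-day and 10-year thresholds are hypothetical and not taken from any real service.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values, chosen only for illustration.
EMBARGO = timedelta(days=90)        # Timely Access: newer records need a subscription
ARCHIVE_AGE = timedelta(days=3650)  # Archival Access: older records need a subscription


@dataclass
class Record:
    uri: str          # identifier of the resource being requested
    published: date   # date the data was first published


def is_free(record: Record, today: date) -> bool:
    """Return True if the record sits in the freely available middle band."""
    age = today - record.published
    return EMBARGO <= age < ARCHIVE_AGE


def can_access(record: Record, today: date, subscriber: bool) -> bool:
    """Subscribers see everything; everyone else sees only the free band."""
    return subscriber or is_free(record, today)
```

With these assumed thresholds, a record published a month ago and one published twenty years ago would both require a subscription, while a year-old record would be open to everyone. A real deployment would also need to keep the identifying data free in every case, as described above.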

It’s interesting to see how some of these (Timely Access, Convenient Access) are already in use in projects like MusicBrainz that blend Open Data with commercial models.


One model that Scott Brinker doesn’t mention in his posting is Sponsorship. An organization might be funded to publish Linked Data, e.g. for the public good. The organization itself might be a charity and funded by donations.

It’s arguable that this might be more about cost recovery for service provision than a true business model, but I think it’s worth considering. Some of the open government data publishing efforts, and possibly even the Linked Data from the BBC, could be seen as falling into this category.

It’s probably most viable for public sector, cultural heritage and similar organizations.

Closing Thoughts

What needs to happen to explore these different models? Is it just a matter of individual organizations experimenting to see what works and what doesn’t?

I think that is largely the case, and we’ll definitely be seeing that process begin to happen in earnest in 2010; a process that we’ll be supporting and enabling with the Talis Platform.

From a technical perspective I’m interested to see how well protocols like OAuth and FOAF+SSL can be deployed to mediate access to licensed Linked Data.

Rights Statements on the Web of Data

This is a write-up of my contribution to the Legal and Social Frameworks for Sharing Data on the Web workshop at ISWC 2009 in October 2009. It was later published in Nodalities magazine Issue 9.

Why do we publish open data? It’s to allow other people to reuse it; to take it and do creative and innovative things with it. We open data because we want it to be used outside the confines of our own projects, applications and organisations. To achieve that aim we need to do lots of ground work particularly around adopting open formats, and also in ensuring that data is both discoverable and wrapped in services that make it useful. But we also need to clearly communicate our basic intention: that the data is available for reuse. And we need to be clear on what forms of reuse we expect or want to support.

Within the open data movement in general, and the Linked Open Data movement specifically, we’re building a commons: an open environment that contains data from a wide variety of different sources that can be meshed together and re-used in a number of powerful ways. We’re building the foundations for the next wave of innovative web applications. Ensuring the stability of those foundations involves addressing any rights or legal issues that are going to impact the community of users in the future, and impede innovation and progress.

In the open source and open standards worlds a lot of attention is rightly paid to legal issues, e.g. around patent rights, that may affect the spread of open standards and software. The equivalent area for open data is understanding the rights that are associated with a dataset. Clear, explicit rights statements are a means to achieve that. Very often though, this is a difficult thing to achieve. In many cases the people involved in the process of opening up data do not have a legal background or framework that can be used to understand the issues that may impact their efforts. At ISWC 2009, Jordan Hatcher, Kaitlin Thaney, Tom Heath and I ran a workshop on Legal and Social issues facing data sharing on the web. The goal of that workshop was to help increase understanding in the community about the importance of open data licensing. In this issue of Nodalities each of us has contributed an article on one aspect of that discussion.

It is important to understand that there will never be a single off-the-shelf solution to open licensing. There will be a range of different approaches that are tailored to the needs of a particular community or type of publisher. In some communities, putting all data into the public domain is likely to become the norm, if it’s not already. In others very little data may be opened up, and even then its use may be restricted in different ways. It’s therefore important to understand which legal tools are suitable in which context. There are a growing number of these tools available, and in his article Jordan Hatcher introduces several of them.

Social norms, e.g. around attribution and citation, are another area in which communities will differ. In her role at the Science Commons project, Kaitlin Thaney has been working closely with the scientific community on issues relating to open data publishing. The norms for strong attribution and sharing used within that scholarly community set a high standard to which those of us working on Linked Open Data should aspire.

We also need to be publishing rights statements in both human and machine-readable formats. To achieve that we need to understand the range of different ways in which data is published and accessed on the Linked Data web, and ensure that data delivered through each mechanism can be properly linked to, or annotated with, a rights statement. In his contribution to this themed issue, Tom Heath reviews some of the existing tools available for annotating datasets with machine-readable rights statements.
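As a rough illustration of what a machine-readable rights statement can look like, the Python sketch below builds a small Turtle document that links a dataset to a license. The choice of the Dublin Core `dcterms:license` and Creative Commons `cc:attributionName` properties is my assumption about one reasonable vocabulary, not a recommendation drawn from the articles mentioned here. The `rights_statement` function and the example.org dataset URI are hypothetical.

```python
def rights_statement(dataset_uri: str, license_uri: str, attribution_name: str) -> str:
    """Build a small Turtle document linking a dataset to its license.

    Assumes the URIs are already valid absolute URIs; only the literal
    attribution name is escaped for use in a Turtle string.
    """
    escaped = attribution_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "@prefix cc: <http://creativecommons.org/ns#> .\n"
        "\n"
        f"<{dataset_uri}>\n"
        f"    dcterms:license <{license_uri}> ;\n"
        f'    cc:attributionName "{escaped}" .\n'
    )


if __name__ == "__main__":
    # Hypothetical dataset URI; the license URI is the CC Attribution 3.0 license.
    print(rights_statement(
        "http://example.org/dataset",
        "http://creativecommons.org/licenses/by/3.0/",
        "Example Publisher",
    ))
```

A statement like this could be served alongside the data itself, so that a crawler or browser fetching a URI has at least a chance of discovering the terms that apply.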

Linked Data also offers its own challenges. Data is typically exposed through a Web 2.0 API only after an explicit agreement has taken place between the producer and the consumer: to access the API you will need to have signed up to gain an API key and agreed to terms and conditions. There is nothing like this in use in Linked Data today; anyone can access any data at any URI. And a Linked Data browser may be simultaneously showing and aggregating data from a number of different sources. We need to think carefully about how usage terms apply in these different contexts, and how those terms can be made explicit to the end user of an application.

As part of the background research for our ISWC tutorial I took a look at the state of licensing across the Linked Data cloud. For every dataset in the March 2009 version of the Linked Open Data cloud diagram I tried to discover two things: firstly, whether any rights statements about reuse had been made available and, secondly, what kind of license was being used (if any). You can see a summary of the results in figure 1. The diagram is a colour-coded version of the original, with each colour representing a different kind of license.

There are a number of conclusions that can be drawn from this analysis:

  • The majority of datasets do not have clear licensing terms associated with them. It is therefore unclear as to how reusable the data really is. How do we improve this?
  • Creative Commons licenses are well represented. These licenses are certainly easy to understand, and are increasingly common. But are they the best tool for the job?
  • Attribution is a strong theme across all licensing schemes. Clearly attribution is an important part of data publishing. How do we best carry out attribution across an increasingly interwoven and interdependent series of datasets?

Consider these questions as you read the rest of the articles in this issue of Nodalities, and as you return to your own open data practices think about how you can start putting this advice to good use in building a strong foundation for the future of the open web.