This is a write-up of my contribution to the Legal and Social Frameworks for Sharing Data on the Web workshop at ISWC 2009 in October 2009. It was later published in Nodalities magazine, Issue 9.
Why do we publish open data? It's to allow other people to reuse it; to take it and do creative and innovative things with it. We open data because we want it to be used outside the confines of our own projects, applications and organisations. To achieve that aim we need to do a lot of groundwork, particularly around adopting open formats and ensuring that data is both discoverable and wrapped in services that make it useful. But we also need to clearly communicate our basic intention: that the data is available for reuse. And we need to be clear about which forms of reuse we expect or want to support.
Within the open data movement in general, and the Linked Open Data movement specifically, we're building a commons: an open environment containing data from a wide variety of sources that can be meshed together and reused in a number of powerful ways. We're building the foundations for the next wave of innovative web applications. Ensuring the stability of those foundations involves addressing any rights or legal issues that could affect the community of users in the future and impede innovation and progress.
In the open source and open standards worlds a lot of attention is rightly paid to legal issues, e.g. around patent rights, that may affect the spread of open standards and software. The equivalent area for open data is understanding the rights that are associated with a dataset. Clear, explicit rights statements are a means to achieve that. Very often, though, this is difficult to achieve.
In many cases the people involved in the process of opening up data do not have a legal background or a framework they can use to understand the issues that may affect their efforts. At ISWC 2009, Jordan Hatcher, Kaitlin Thaney, Tom Heath and I ran a workshop on the Legal and Social issues facing data sharing on the web. The goal of that workshop was to increase understanding in the community about the importance of open data licensing. In this issue of Nodalities each of us has contributed an article on one aspect of that discussion.
It is important to understand that there will never be a single off-the-shelf solution to open licensing. There will be a range of different approaches, each tailored to the needs of a particular community or type of publisher. In some communities, putting all data into the public domain is likely to become the norm, if it isn't already. In others very little data may be opened up, and even then its use may be restricted in different ways. It's therefore important to understand which legal tools are suitable in which context. A growing number of these tools are available, and in his article Jordan Hatcher introduces several of them.
Social norms, e.g. around attribution and citation, are another area in which communities will differ. In her role at the Science Commons project, Kaitlin Thaney has been working closely with the scientific community on issues relating to open data publishing. The norms for strong attribution and sharing used within that scholarly community set a high standard to which those of us working on Linked Open Data should aspire.
We also need to publish rights statements in both human- and machine-readable formats. To achieve that we need to understand the range of different ways in which data is published and accessed on the Linked Data web, and ensure that data delivered through each mechanism can be properly linked to, or annotated with, a rights statement. In his contribution to this themed issue, Tom Heath reviews some of the existing tools available for annotating datasets with machine-readable rights statements.
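As a rough illustration of what a machine-readable rights statement might look like, the sketch below emits a few N-Triples that describe a dataset using the VoID vocabulary, link it to a license with the Dublin Core `dcterms:license` property, and add a human-readable `dcterms:rights` note. It uses only the Python standard library. The dataset URI, the choice of CC BY 3.0 and the rights wording are hypothetical placeholders, not a recommendation.

```python
# Minimal sketch: a machine-readable rights statement for a dataset, as N-Triples.
# Hypothetical: the dataset URI, the chosen license and the rights wording.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    """Format a URI as an N-Triples IRI term."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rights_statement(dataset: str, license_uri: str, rights_text: str) -> str:
    """Describe a void:Dataset with a dcterms:license link and a dcterms:rights note."""
    triples = [
        (iri(dataset), iri(RDF_TYPE), iri(VOID + "Dataset")),
        (iri(dataset), iri(DCTERMS + "license"), iri(license_uri)),
        (iri(dataset), iri(DCTERMS + "rights"), literal(rights_text)),
    ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    print(
        rights_statement(
            "http://example.org/dataset",  # hypothetical dataset URI
            "http://creativecommons.org/licenses/by/3.0/",  # example license choice
            "Reuse permitted with attribution to Example Org.",  # hypothetical wording
        ),
        end="",
    )
```

Pairing a license URI (for machines) with a short literal (for people) is one way to serve both audiences from the same description.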
Linked Data also poses its own challenges. Access to data exposed through a Web 2.0 API typically follows an explicit agreement between the producer and the consumer: to use the API you need to sign up for an API key and agree to its terms and conditions. Nothing like this is in use in Linked Data today; anyone can access any data at any URI. And a Linked Data browser may be showing and aggregating data from a number of different sources simultaneously. We need to think carefully about how usage terms apply in these different contexts, and how those terms can be made explicit to the end user of an application.
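One way an aggregating application could make usage terms explicit is to remember which source each statement came from, then report the license (or the absence of one) for each source alongside the merged view. The sketch below assumes a hypothetical in-memory list of (subject, predicate, object, source) quads and a hypothetical mapping from source to license URI; how those licenses are discovered in the first place is left open.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (subject, predicate, object, source): the source records where a statement was fetched from.
Quad = Tuple[str, str, str, str]


def terms_by_source(quads: List[Quad], licenses: Dict[str, Optional[str]]) -> Dict[str, dict]:
    """Count aggregated statements per source and attach that source's license, if known."""
    counts: Dict[str, int] = defaultdict(int)
    for _subject, _predicate, _object, source in quads:
        counts[source] += 1
    return {
        source: {"statements": n, "license": licenses.get(source)}
        for source, n in sorted(counts.items())
    }


def usage_notice(summary: Dict[str, dict]) -> str:
    """Build a human-readable notice an application could display next to the merged view."""
    lines = []
    for source, info in summary.items():
        terms = info["license"] or "no license statement found"
        lines.append(f"{info['statements']} statement(s) from {source}: {terms}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical aggregated data drawn from two sources.
    quads = [
        ("http://a.example/thing/1", "http://xmlns.com/foaf/0.1/name", '"One"', "http://a.example/"),
        ("http://a.example/thing/1", "http://www.w3.org/2002/07/owl#sameAs", "http://b.example/id/1", "http://a.example/"),
        ("http://b.example/id/1", "http://xmlns.com/foaf/0.1/name", '"Uno"', "http://b.example/"),
    ]
    licenses = {"http://a.example/": "http://creativecommons.org/licenses/by/3.0/"}
    print(usage_notice(terms_by_source(quads, licenses)))
```

Keeping provenance at the level of the source, rather than discarding it during the merge, is what makes a per-source notice possible at all.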
As part of the background research for our ISWC tutorial I looked at the state of licensing across the Linked Data cloud. For every dataset in the March 2009 version of the Linked Open Data cloud diagram I tried to discover two things: firstly, whether any rights statements about reuse had been made available and, secondly, what kind of license was being used, if any. You can see a summary of the results in figure 1. The diagram is a colour-coded version of the original, with each colour representing a different kind of license.
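The classification step of a survey like this can be sketched as a simple tally over whatever license URI, if any, was found for each dataset. The category names and URI patterns below are illustrative assumptions and are not necessarily the categories used in figure 1; the example input is hypothetical and does not reflect the survey's actual results.

```python
from collections import Counter
from typing import Dict, Optional


def classify(license_uri: Optional[str]) -> str:
    """Map a discovered license URI (or None) onto a coarse, illustrative category."""
    if not license_uri:
        return "no license found"
    # Check public-domain dedications first: CC0 is also hosted on creativecommons.org.
    if "creativecommons.org/publicdomain/" in license_uri or "opendatacommons.org/licenses/pddl/" in license_uri:
        return "public domain"
    if "creativecommons.org/licenses/" in license_uri:
        return "Creative Commons"
    return "other"


def tally(survey: Dict[str, Optional[str]]) -> Counter:
    """Count how many surveyed datasets fall into each category."""
    return Counter(classify(uri) for uri in survey.values())


if __name__ == "__main__":
    # Hypothetical survey input: dataset name -> license URI discovered (None if none).
    survey = {
        "dataset-a": None,
        "dataset-b": "http://creativecommons.org/licenses/by-sa/3.0/",
        "dataset-c": "http://creativecommons.org/publicdomain/zero/1.0/",
        "dataset-d": "http://example.org/terms-of-use",
    }
    for category, count in tally(survey).most_common():
        print(f"{category}: {count}")
```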
There are a number of conclusions that can be drawn from this analysis:
- The majority of datasets do not have clear licensing terms associated with them. It is therefore unclear how reusable the data really is. How do we improve this?
- Creative Commons licenses are well represented. These licenses are certainly easy to understand, and are increasingly common. But are they the best tool for the job?
- Attribution is a strong theme across all licensing schemes. Clearly attribution is an important part of data publishing. How do we best carry out attribution across an increasingly interwoven and interdependent series of datasets?
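For the last of these questions, one possible starting point, assuming for illustration that a dataset built on other datasets inherits their attribution requirements, is to walk the graph of "derived from" relationships and collect every upstream dataset that asks for attribution. The dataset names, the `derived_from` mapping and the `requires_attribution` set are hypothetical; real licenses differ in whether and how such obligations carry over.

```python
from typing import Dict, List, Set


def attribution_set(dataset: str, derived_from: Dict[str, List[str]], requires_attribution: Set[str]) -> List[str]:
    """Collect every upstream dataset (direct or indirect) that asks for attribution.

    Uses an iterative depth-first walk; the `seen` set stops cycles from looping forever.
    """
    seen: Set[str] = set()
    stack = list(derived_from.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(derived_from.get(current, []))
    return sorted(d for d in seen if d in requires_attribution)


if __name__ == "__main__":
    # Hypothetical dependency graph: each dataset lists the datasets it draws on.
    derived_from = {
        "mashup": ["places", "people"],
        "people": ["registry"],
        "registry": ["places"],
    }
    print(attribution_set("mashup", derived_from, {"places", "registry"}))  # -> ['places', 'registry']
```

Tracking derivation explicitly is what lets attribution survive as datasets become more interwoven; without it, indirect sources are easily lost.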
Consider these questions as you read the rest of the articles in this issue of Nodalities, and as you return to your own open data practices, think about how you can start putting this advice to good use in building a strong foundation for the future of the open web.