Let’s talk about open data licensing. Again.
Last year I wrote a post, “The State of Open Licensing”, in which I gave a summary of the landscape as I saw it. A few recent developments mean it’s worth posting an update.
But Leigh, I hear you cry, do people really care about licensing? Are you just fretting over needless details? We’re living in a post-open source world after all!
To which I would respond: if licensing doesn’t have real impacts, then why did the open source community recently go into meltdown about Facebook’s open source licences? And why did Facebook then recant? There’s a difference between throwaway, unmaintained code and data, and resources that could and should be infrastructure.
The key points from my original post still stand: I think we still need to encourage convergence around licensing to reduce friction. But I’m concerned that we’re not moving in the right direction. Open Knowledge are doing some research around licensing and have also highlighted their concerns about current trends.
So what follows are a few observations on trends in several different areas of open data practice.
Licensing of open government data
I don’t think much has changed with regard to open licences for government data. The UK Open Government Licence (UK-OGL) still seems to be the starting point for creating bespoke national licences.
Looking through the Open Definition forum archives, the last government licence to be formally approved as Open Definition compliant was the Taiwan licence. Like the UK-OGL Version 3, the licence clearly states that it is compatible with the Creative Commons Attribution (CC-BY) 4.0 licence. The open data licence for Mexico makes a similar statement.
In short, you can take any data from the UK, Taiwan and Mexico and re-distribute it under a CC-BY 4.0 licence. Minimal friction.
I’d hoped that we could discourage governments from creating new licences. After all, if they’re compatible with CC-BY, then why go to the trouble?
But, chatting briefly about this with Ania Calderon this week, I’ve come to realise that the process of developing these licences is valuable, even if the end products are very similar. It encourages useful reflection on the relevant national laws and regulations, whilst also ensuring there is sufficient support and momentum behind adoption of the Open Data Charter. The licences are as much a statement of shared intent as a legal document.
The important thing is that national licences should always state compatibility with an existing licence. Ideally CC-BY 4.0. This removes all doubt when combining data collected from different national sources. This will be increasingly important as we strengthen our global data infrastructure.
Licensing of data from commercial publishers
Looking at how data is being published by commercial organisations, things are very mixed.
Within the OpenActive project we now have more than 20 commercial organisations publishing open data under a CC-BY 4.0 licence. Thomson Reuters are using CC-BY 4.0 as the core licence for their PermID product. And Syngenta are publishing their open data under a CC-BY-SA 4.0 licence. This is excellent. 10/10 would reuse again.
But in contrast, the UK Open Banking initiative has adopted a custom licence with a number of limitations, which I’ve written about extensively. Despite feedback, they’ve chosen to ignore the concerns raised by the community.
Elsewhere the default is for publishers and platforms to use custom terms and conditions that create complexity for reusers. Or for lists of “open data” to have no clear licensing.
Licensing in the open data commons
It’s a similar situation in the broader open data commons.
In the research community, CC0 has been recommended for some time and is the default on a number of research data archives. Promisingly, the FigShare State of Open Data 2017 report (PDF) shows a growing awareness of open data amongst researchers, and a reduction in uncertainty around licensing. But there’s still lots of work to do. Julie McMurry of the (Re)usable Data Project notes that fewer than half of the databases they’ve indexed have a clear, findable licence.
While the CC-BY and CC-BY-SA 4.0 licences are seen as the best-practice defaults, a number of databases still rely on the Open Database Licence (ODbL), OpenStreetMap being the obvious example.
The OSM Licence Working Group has recently concluded that, pending a more detailed analysis, the Creative Commons licences are incompatible with the ODbL. They now recommend asking for specific permission and the completion of a waiver form before importing CC-licensed open data into OSM. This is, of course, exactly the situation that open licensing is intended to avoid.
Obtaining 1:1 agreements is the opposite of frictionless data sharing.
And it’s not clear whose job it is to sort it out. I’m concerned that there’s no clear custodian for the ODbL or investment in its maintenance. Resolving issues of compatibility with the CC licences is clearly becoming more urgent. I think it needs an organisation or a consortium of interested parties to take this forward. It will need some legal advice and investment to resolve any issues. Taking no action doesn’t seem like a viable option to me.
Based on the summaries I’ve seen of earlier discussions, some basic disagreements about approaches to data licensing seem to have held up progress. Creative Commons could take a lead on this, but so far they’ve not certified any third-party licences as compatible with their suite. All such compatibility statements have been made the other way around.
Despite its use by big projects like OSM, it’s really unclear to me what role the ODbL has in the longer term. Getting to a clear definition of compatibility would provide a potential way for existing users of the licence to transition at a future date.
Just to add to the fun, the Linux Foundation have thrown two new licences into the mix. There has been some discussion about this in the community, and some feedback in these two articles in The Register. The second includes some legal analysis: “I wouldn’t want to sign it”.
Adding more licences isn’t helpful. It would have been more helpful to explore compatibility issues amongst existing licences and invest in resolving them. But as their FAQ highlights, the Foundation explicitly chose to create new licences rather than evaluate the current landscape.
I hope that the Linux Foundation can work with Creative Commons to develop a statement of compatibility, otherwise we’re in an even worse situation.
Some steps to encourage convergence
So how do we move forward?
My suggestions are:
- No new licences! If you’re a government, you get a pass to create a national licence so long as you include a statement of compatibility with a Creative Commons licence
- If your organisation has issues with the Creative Commons licences, then document and share them with the community. Then engage with Creative Commons to explore creating revisions. Spend what you would have given your lawyers on helping Creative Commons improve their licences. It’s a good test of how much you really do want to work in the open
- If you’re developing a platform, require people to choose a licence or set a default. Choosing a licence can include “All Rights Reserved”. Let’s get some clarity
- We need to invest further in developing guidance around data licensing:
  - There’s still so much that’s unclear around database rights and derived data.
  - We’ve written loads of guidance on licensing at the ODI, but maybe it needs to be more accessible?
  - The OSM community has produced some draft community guidance on using their data, but it’s not clear (to me) whether these are social norms for OSM or guidance that could apply to all usage of ODbL-licensed data.
- Let’s sort out compatibility between the CC and ODbL licence suites
- Let’s encourage the Linux Foundation to do the same, and also ask them to submit their licences to the licence approval process. This should be an obvious step for them, as they’ve repeatedly highlighted the lessons to be learned from open source licensing, where licences go through a similar process.
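The platform suggestion in the list above is straightforward to enforce at publication time. As a minimal sketch, assuming a hypothetical `publish` function and a hypothetical list of allowed licence identifiers (SPDX-style IDs, plus an explicit “All Rights Reserved” option), a platform could refuse to publish a dataset whose licence is unclear, unless a platform-wide default has been configured:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of licences a platform might offer. SPDX-style IDs are
# used for illustration; "All Rights Reserved" is included because an
# explicit closed choice is still clearer than no licence at all.
ALLOWED_LICENCES = {
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "CC0-1.0",
    "ODbL-1.0",
    "All Rights Reserved",
}

# Hypothetical platform-wide default, applied when a publisher makes no choice.
DEFAULT_LICENCE = "CC-BY-4.0"


@dataclass
class Dataset:
    title: str
    licence: str


def publish(title: str,
            licence: Optional[str] = None,
            default: Optional[str] = DEFAULT_LICENCE) -> Dataset:
    """Publish a dataset, never leaving its licence unclear."""
    if licence is None:
        if default is None:
            # Strict mode: no default configured, so force an explicit choice.
            raise ValueError(f"{title!r}: choose a licence before publishing")
        licence = default
    if licence not in ALLOWED_LICENCES:
        raise ValueError(f"{title!r}: unsupported licence {licence!r}")
    return Dataset(title=title, licence=licence)


if __name__ == "__main__":
    print(publish("Bus stops"))                          # falls back to the default
    print(publish("Sales figures", "All Rights Reserved"))
```

Whether a platform sets a default or insists on an explicit choice is a policy decision; in this sketch, passing `default=None` switches to the stricter behaviour. Either way, every published dataset ends up with a clear licence.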
I think these are all useful steps forward. What would you add to the list? What organisations can help drive this forward?
Note that I’m glossing over a set of more nuanced issues which deserve future discussion. For example, whether licensing is always the right protection, or when “situated openness” may be the best approach towards building trust with communities. Or whether the two completely different licensing schemes for Wikidata and OSM will be a source of friction in the longer term or are simply necessary to ensure their sustainability.
For now though, I think I’ll stick with the following as my licensing recommendations: