The state of open licensing

I spend a lot of time reading through licences and terms & conditions. Much more so than I thought I would when I first started getting involved with open data. After all, I largely just like making things with data.

But there’s still so much data that is public but not open. Or datasets that are nearly open but which have been wrapped in awkward additional terms. And still plenty of confusion about what open data actually is, as Andy Dickinson highlighted yesterday.

And yet the open data licensing choices really aren’t that hard: you can fit the essential choices in a tweet.

Resolving this is just going to take more time, education and patient explanation of the benefits and disadvantages of different licensing models.

But I’ve been wondering about what direction we’re moving in with regard to licensing.

Reducing friction

Since the release of the 4.0 series of Creative Commons licences we’ve had a standard, globally applicable set of terms that allows us to openly licence all forms of creative works and datasets. I really don’t see any reason to continue to use the Open Database Licence, and I would love the maintainers to either clarify the continued role it plays or acknowledge that it’s deprecated and discourage its use.

The UK Open Government Licence (OGL) has spawned a variety of national licences. But, now that it is interchangeable with CC-BY 4.0, its continued existence also seems largely unnecessary. Governments currently without a standard national licence would be better off adopting CC-BY 4.0 than creating another fork of the OGL.

There may be good reasons for retaining the OGL and I’d be interested in hearing them if anyone has opinions. But it feels like we might continue to simplify the licensing landscape by planning for it to become obsolete.

I sometimes worry that I’m becoming an open data pedant. (And maybe I am!) But I feel these are issues worth paying attention to, if only to follow evolving best practice.

That said, I’m convinced that any friction around licensing can hamper the reuse of open data. So I think it’s something to recognise and remove wherever possible. The more the commons is used, the more value will be unlocked. And this will help it grow, not just by increasing contribution, but also through increasing investment, so that we can have a proper open data infrastructure for society as a whole.

And friction not only hampers reuse, it also slows the publication of new data. I know from experience that confusion around appropriate licences is a common source of uncertainty for publishers. Especially commercial publishers, who are concerned about the risks of adopting open licences rather than using custom terms that sit within their comfort zone.

As specific licensing frameworks and model terms and conditions become embedded, they will be harder to remove later. It’s important not to overlook the impact of bespoke terms.

Evolving practice

It’s interesting to see, for example, how the OpenOpp terms borrow heavily from those of OpenCorporates. OpenCorporates is a successful open data business, so it’s not surprising that it’s being used as an exemplar.

But, in my opinion, the OpenCorporates terms have some niggling issues. There is a specific requirement around how attribution must be presented (font sizes, not just text and a link), coupled with a requirement that anyone re-publishing the data must ensure that downstream users also conform to those requirements. That’s really not dissimilar to the custom attribution requirements that were present in the Ordnance Survey’s original fork of the OGL.

The open data community has campaigned at length to convince governments that they should, at most, require simple attribution statements from re-users of their data. I don’t think it’s a positive move for that same data to begin accumulating new terms and licences within its first few steps into the ecosystem.

That said, the more concerning way in which practice may evolve is by stepping away from open licensing entirely. That goes hand-in-hand with the increasing interest in, and references to, “data markets” that I’ve encountered from many city-based initiatives. I’ve already written at length about my thoughts on the Copenhagen marketplace, and I’m hoping London isn’t going in the same direction.

Elsewhere though, I see promising progress. The scientific research community has long been converging on CC0 (public domain) for its data and CC-BY for its content. CC0 avoids problems with attribution stacking, and that community has long had social norms that encourage recognition of sources without requiring it through a licensing regime.

But that practice isn’t yet so commonplace elsewhere, even though it’s part and parcel of being a good re-user. The visible impact of open data and content is a tide that raises all boats. If you call yourself an open data start-up you should be able to proudly point to where your data sources are listed on your website.

I also read that the US may be adopting legislation that will ensure that its open government data remains in the public domain. This is fantastic. That change would also clarify that the data is in the public domain internationally: it’s currently unclear whether “public domain” actually means “public domain within the US”. It may be crystal clear to IP and copyright lawyers, but not necessarily to non-experts like myself, which is my general point.

I wonder whether the general trajectory will be, as the EFF recommends, for more open data to be placed into the public domain. That would require a big step forward for many governments, as well as for established projects like OpenStreetMap. Large scale licensing changes of that form are tricky to co-ordinate. Realistically, I don’t see it happening unless there are major changes to the social norms around data reuse, or until we start bumping into compatibility issues between data from different communities.

That’s not entirely unlikely, however. For example, the prevalence of CC-BY and CC-BY-SA style licensing in the commercial and public sectors is at odds with research norms that require raw and derived data to be placed into the public domain under a CC0 waiver. You can’t draw from one well and then add to the other. But there are bigger issues to address first, as the recent OKCupid data release highlighted.