Comparing the 5-star scheme with Open Data Certificates

Originally published on the Open Data Institute blog.

I’ve been asked several times recently about the differences between the 5-star scheme for open data and the Open Data Certificates. How do the two ratings relate to one another, if at all? In this blog post I aim to answer that question.

The 5-star scheme

The 5-star deployment scheme was originally proposed by (our President) Tim Berners-Lee in his linked data design principles. The scheme is neatly summarised in this reference, which also identifies the costs and benefits associated with each stage.

Essentially, the scheme measures how well data is integrated into the web. “1-star” data is published in proprietary formats that users must download and process. “5-star” data can be accessed online, uses URIs to identify the resources in the data, and contains links to other sources.
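To make the cumulative nature of the stars concrete, here is a minimal sketch in Python. It assumes the commonly cited summary of the scheme (online under an open licence, structured, non-proprietary format, URIs, links to other data). The `Dataset` class and its field names are hypothetical, chosen only for illustration, and the real assessment involves more judgement than five yes/no answers.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields only)."""
    online_with_open_licence: bool  # available on the web under an open licence
    structured: bool                # machine-readable, e.g. a spreadsheet not a scan
    non_proprietary_format: bool    # e.g. CSV rather than Excel
    uses_uris: bool                 # resources in the data are identified by URIs
    links_to_other_data: bool       # contains links to other sources


def star_rating(ds: Dataset) -> int:
    """Return 0-5. Each star is only awarded if all earlier ones are met."""
    criteria = [
        ds.online_with_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # the scheme is cumulative, so stop at the first unmet criterion
        stars += 1
    return stars


# An openly licensed CSV file with no URIs or outbound links.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))  # 3
```

Note that nothing in this sketch captures documentation, maintenance or availability guarantees, which is the gap the certificates described below are meant to fill.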

The scheme is primarily focused on how data is published: the formats and technologies being used. Working out whether a dataset is published at 2, 3 or 4 stars requires some insight into how the data has been published, which can be difficult for a non-technical person to judge.

The scheme is therefore arguably best used as a technical roadmap and a short-hand assessment of the technical aspects of data publishing.

Open Data Certificates

The Open Data Certificates process takes an alternative but complementary view. A certificate measures how effectively someone is sharing a dataset for ease of reuse. The scope covers more than just technical issues: it also includes rights and licensing, documentation, and guarantees about availability. A certificate therefore offers a more rounded assessment of the quality of publication of a dataset.

For data publishers the process of assessing a dataset provides insight into how they might improve their publishing process. The assessment process is therefore valuable in itself, but the certificate that is produced is also of value to reusers.

An Open Data Certificate acts as a reference sheet containing information of interest to reusers of a dataset. This saves reusers the time and effort of digging through a publisher's website to find out whether a dataset can meet their needs. The ability to search and browse for certified datasets may eventually make it easier to find useful data.

Despite these differences, the certificates and the 5-star scheme are in broad alignment. Both aim to improve the quality and accessibility of published data. And both require that data is published under open licences using standard formats. We would expect a dataset published to Expert level on the certificates to be well-integrated into the web, for example.

However, it doesn’t necessarily follow that all “5-star” data would automatically gain an Expert rating: a dataset may be well integrated into the web but still be poorly maintained or documented.

In our view the Open Data Certificates provide clearer guidance for data publishers to consider when planning and improving their publishing efforts. They help publishers look at the bigger picture of data-user needs, many of which are not about the data format or whether the data contains URIs. This bigger picture can help inform data publishing roadmaps, procurement of data publishing services and policy development.

The certificates also provide a clear quality mark for reusers looking for assurances around how well data is published.

The 5-star scheme has been very effective at moving publishers away from Excel and closed licences and towards CSV and open licences. But for sustained and sustainable open data, reusers need the publishers of open data to consider more than licences and data formats. The Open Data Certificates help publishers do that.