How to open your data in six easy steps

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/how-to-open-your-data-in-six-easy-steps

1. Scope out the job at hand

Before taking the plunge and jumping straight into publishing, there are a few things to think through first. Take time to consider what data you’re going to release, what it contains and what the business case is for releasing it in the first place.

Consider what licence you’re going to put on the data for others to use. There’s a selection to choose from, depending on how you want others to use it; see our guidance here.

Here are some other key things to consider at this stage:

  • Where will it be published?
  • Will I need documentation around it?
  • What level of support is needed?
  • How frequently will I release the data?

2. Get prepared

Your data is only really useful to others if it’s well structured and has clear metadata (or a data description) to give it context and explain what it’s about and where it comes from.

Start your prep with a technical review using sample data, and identify suitable formats for release and the level of detail and metadata required. Also consider whether it’ll be most useful to the user as an API or a download. Data can be more useful when linked to other datasets, so keep an eye out for opportunities.

Consider your capabilities in-house and whether you need any training in order to release the data, whether technical or around certification. Some ODI courses can help with this.

Finally, think about what metadata you’re going to add to your data to describe what it is or how to use it.
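
As a rough illustration of what a metadata check might look like, here is a minimal Python sketch. The field names (`title`, `description`, `licence`, `publisher`, `source`, `update_frequency`) and the example record are assumptions made for this example, not a required standard; adapt them to whatever metadata schema your portal or community uses.

```python
# Hypothetical list of fields we want every dataset description to carry.
REQUIRED_FIELDS = (
    "title",
    "description",
    "licence",
    "publisher",
    "source",
    "update_frequency",
)


def missing_metadata(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a metadata record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# An invented example record, for illustration only.
example = {
    "title": "Example air quality readings",
    "description": "Hourly readings from example monitoring stations.",
    "licence": "Open Government Licence",
    "publisher": "Example Council",
    "source": "Collected from roadside sensors",
    "update_frequency": "monthly",
}

print(missing_metadata(example))             # nothing missing
print(missing_metadata({"title": "Draft"}))  # everything except the title
```

Running a check like this against sample data during the technical review is one way to spot gaps before release.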

3. Test your data

Before you release your data, you might want to think about doing a preview with some of your potential users to get some detailed feedback. This isn’t necessarily required for smaller datasets, but for larger releases this user-testing can be really useful.

Don’t forget to get an Open Data Certificate to verify that your data is being published properly.

4. Release your data

Now for the exciting bit: releasing your data, the metadata and the documentation to go with it.

The key thing here is to release your data where your users will be. Otherwise, what’s the point? Where you should release it depends on who you are, but in general you should publish it on your own website, ensuring it’s also listed on relevant portals. For example, public sector organisations should add their data to data.gov.uk. Some sectors have their own portals – in science it’s the norm to publish in an institutional repository or a scientific data repository.

Basically, do your research into how your community shares data, and make sure it’s located in a place you have control over or where you’re confident the data can be consistently available.

When you apply for an Open Data Certificate, we’ll ask for evidence that the dataset is listed in one or more portals to ensure it’s accessible.

5. Get engagement and promotion

It’s easy to relax after spending so much time and effort in preparing and releasing your dataset, but don’t just ‘fire and forget’. Make sure you have follow-up activities to let people know the data exists and be responsive to questions they might have. You can engage people in multiple ways (depending on your target audience), for example through blogs or social media. Encourage users to tell you how they’re using the data, so you can promote success stories around it too.

6. Reflect and improve

Now your dataset is out there in the big wide world, take some time to reflect on it. Listen to feedback, and decide what changes you could make or what you’d do differently next time.

If you want to measure your improvement, consider taking a maturity assessment using our Open Data Pathway tool.

101100

Today I am 101100.

That’s XLIV in Roman.

44 is also the square root of 1936. 1936 was a leap year starting on a Wednesday.

The Year 44 was also a leap year starting on a Wednesday.

It was also known as the Year of the Consulship of Crispus and Taurus. Which is another coincidence because I like crisps and I’m also a Taurus.

And while we’re on Wikipedia, we can use the API to find out that page id 101100 is Sydney Harbour National Park which opened when I was 3.

Wolfram Alpha reminds me that 44 is the ASCII code for a comma.

Whichever way you look at it #101100 is a disappointing colour.
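
Most of the arithmetic so far is easy to check. Here is a minimal Python sketch that does so; the `to_roman` helper is an illustrative function written for this example, not something from the standard library.

```python
import calendar
from datetime import date


def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral (illustrative helper)."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


age = int("101100", 2)            # read the digits as binary
roman = to_roman(age)             # Roman numeral for the same number
square = age ** 2                 # so age is the square root of this
comma = chr(age)                  # the character with that ASCII code

# Monday is 0 in date.weekday(), so Wednesday is 2.
leap_1936 = calendar.isleap(1936)
weekday_1936 = date(1936, 1, 1).weekday()

# Read "101100" as a hex colour: two hex digits each for red, green, blue.
rgb = tuple(int("101100"[i:i + 2], 16) for i in (0, 2, 4))

print(age, roman, square, repr(comma), leap_1936, weekday_1936, rgb)
```

The colour works out at almost no red, almost no green and no blue, which is why it looks so gloomy.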

But if we use the random art generator then we can make a more colourful image from the number. But actually the image with that identifier is more interesting. Glitchy!

The binary number is also a car multimedia entertainment system. But £200 feels a bit steep, even if it is my birthday.

A 12 year old boy once bid £101,100 for a flooded Seat Toledo on eBay. Because reasons.

101100, or tubulin tyrosine ligase-like family, member 3 to its friends, also seems to do important things for mice.

I didn’t really enjoy Jamendo album 101100, the Jamez Anthony story.

Care of Cell Block 101100 was a bit better in my opinion. But only a bit.

Discogs release 101100 is The Sun’s Running Out by Perfume Tree. Of which the most notable thing is that track six includes a sample from a Dr Who episode.

I’m not really sure what the tag 101100 on flickr means.

IMDB entry 101100 is “Flesh ‘n’ Blood”.

The Board Game Geek identifier 101100 is for an XBox 360 version of 1 vs 100. That’s not even a board game!

Whereas Drive Thru RPG catalogue product 101100 is Battlemage. Which sounds much more interesting.

If I search for “101100 coordinates” on google, then it tells me that it’s somewhere in China. I should probably know why.

There are 26 results for 101100 on data.gov.uk. But none on data.gov. Which explains why the UK is #1 in the world for open data.

But HD 101100 is also a star.

And a minor planet discovered on 14th September 1998.

CAS 101-10-0 is 2-(3-Chlorophenoxy)propionic acid. I think it’s a herbicide. Anyway, this is what it looks like.

It’s also a marine worm.

And an insect.

In the database of useful biological numbers, we discover that entry 101100 is the maximal emission wavelength for Venus fluorophore. Which is, of course, 528 nm.

I think the main thing I’ve learnt in my 44 years is that the web is an amazing place.

On accessibility of data

My third open data “parable”. You can read the first and second ones here. With apologies to Borges.

. . . In that Empire, the Art of Information attained such Perfection that the data of a single City occupied the entirety of a Spreadsheet, and the datasets of the Empire, the entirety of a Portal. In time, those Unconscionable Datasets no longer satisfied, and the Governance Guilds struck a Register of the Empire whose coverage was that of the Empire, and which coincided identifier for identifier with it. The following Governments, who were not so fond of the Openness of Data as their Forebears had been, saw that that vast register was Valuable, and not without some Pitilessness was it, that they delivered it up to the Voraciousness of Privatisation and Monopolies. In the Repositories of the Net, still today, there are Stale Copies of that Data, crowd-sourced by Startups and Citizens; in all the Commons there is no other Relic of the Disciplines of Transparency.

Sharon More, The data roads less travelled. London, 2058.

 

Caution: data, use responsibly

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/caution-data-use-responsibly

In December 2015, Ben Goldacre and Anna Powell-Smith launched the beta of Open Prescribing. The site, which was swiftly celebrated in the open data community and beyond, provides insight into the prescribing practices of GPs around the UK. Its visualisations and reports give an entirely new perspective on some of the bulk open datasets available from the NHS.

Open Prescribing is a fantastic demonstration of how openly publishing data can unlock new, creative uses.

There is a particular feature of the site which piqued my interest: a page entitled, ‘Caution: how to use the data responsibly’. Goldacre and Powell-Smith have included some clear guidance that helps users to properly interpret their findings, including:

  • guidance on how to interpret high and low values for the measurements, encouraging thought into what patterns they may or may not demonstrate – because of differences in population around a practice, for example
  • notes on how the individual measures were decided upon
  • insight into the importance of specific drugs and measures for a non-specialist audience
  • links to useful background information from the original data publishers

The ‘About’ page for the site also attributes all of the datasets that were used as input to the analysis.

Clear attribution, provenance reporting and guidance on limits to the analysis might be expected from authors with a background in evidence-based medicine. It’s not yet normal practice within the open data community. But it should be.

As a society, we are making an increasing number of decisions based on data, about our health, economy and businesses. So it’s becoming more and more important that we know the limits of what that data can reliably tell us. Data enables informed decisions. Knowing the limits of data also makes us more informed.

In my opinion all data analysis should have an equivalent of the Open Prescribing “/caution” URL.

To achieve this, data users need to know more about how data is collected and processed before it is published. This is why the higher levels of the Open Data Certificate require publishers to:

  • document any known quality issues or limitations with the data
  • publish details of their quality control processes, including how to report errors
  • describe the provenance of the data, e.g. how it was collected and analysed

That information provides the necessary foundation for re-users to properly interpret and apply data. This information can then be cited, as it is on Open Prescribing, to help downstream users understand the impacts on any analysis.

Documenting the datasets used in an analysis is another norm that’s common in the medical and scientific communities. Linking to source datasets is the basis for citation analysis in academic research. These links power many types of discovery tools, and help improve reproducibility and transparency in research.

Use of machine-readable attributions could do the same for more general uses of data online. In the early days of the web, developers would “view source” to view the markup behind a webpage to learn how it was put together. The ability to “view sources” to discover the data underlying an application or data analysis would be a useful feature for the data web.
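
To make the “view sources” idea a little more concrete, here is a minimal Python sketch of how an application might expose its data sources in machine-readable form. Everything in it is an assumption for illustration: the `SourceAttribution` fields, the `view_sources` function and the example source are hypothetical, and this is not a format used by Open Prescribing or defined by any standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SourceAttribution:
    """One dataset used as input to an analysis (hypothetical structure)."""
    title: str
    publisher: str
    url: str
    licence: str


def view_sources(sources: list[SourceAttribution]) -> str:
    """Serialise the attributions as JSON so tools can discover them."""
    return json.dumps({"sources": [asdict(s) for s in sources]}, indent=2)


# An invented source, for illustration only.
sources = [
    SourceAttribution(
        title="Example prescribing dataset",
        publisher="Example Health Body",
        url="https://example.org/data/prescribing",
        licence="Open Government Licence",
    ),
]

print(view_sources(sources))
```

An application could serve output like this at a well-known address alongside its human-readable ‘About’ page, so that both people and software can trace an analysis back to its inputs.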

So, if you’re doing some data analysis, follow the best practices embodied by Open Prescribing and help users and other developers to understand how you’ve achieved your results.

Take your first steps with Open Data Pathway

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/take-your-first-steps-with-open-data-pathway

We’re launching a new tool today called Open Data Pathway. It’s a self-assessment tool that will help you assess how well your organisation publishes and consumes open data, and identify actions for improvement.

The tool is based on the Open Data Maturity Model we have been developing in partnership with the Department for Environment, Food & Rural Affairs.

The maturity model is based around five themes and maturity levels. Each theme represents a broad area of operations within an organisation, and is broken down into areas of activity which can then be used to assess progress.

We’ve previously published the maturity model as a public draft. We would like to thank everyone from across central and local government, agencies and other organisations who have given feedback on the draft documents. Your contributions and ideas were extremely valuable. We’re pleased to announce that the final, first edition of the model is now available.

Open Data Pathway supports open data practitioners in carrying out a maturity assessment. Completing an assessment will create a report that scores your organisation against each activity. The report also includes practical recommendations that suggest how scores can be improved for each activity. Combined with the ability to set targets for improvement, Open Data Pathway provides a complete self-assessment tool to enable practitioners to successfully apply the maturity model to their organisation.

Open Data Pathway offers a useful complement to the Open Data Certificates. The certificates measure how effectively someone is sharing a dataset for ease of reuse. Open Data Pathway helps organisations assess how well they publish and consume open data, helping build a roadmap for their open data journey.

We are initially launching the tool as an alpha release to help us gain valuable user feedback. The beta version will launch at the end of April, 2015, and will have the functionality to support results sharing and organisation benchmarking.

Please sign up and explore the tool and let us know what you think.

5 ways to be a better open data reuser

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/5-ways-better-open-data-reuse

Open data is still in its infancy. The focus so far has been on encouraging and supporting owners of data to publish it openly. A lot has been written about why opening up data is valuable, how to build business cases for open data sharing, and how to publish data in order to make it easy for people to reuse.

But, while it’s great there is so much advice for data publishers, we don’t often talk about how to be a good reuser of data. One of the few resources that give users advice is the Open Data Commons Attribution-Sharealike Community Norms.

I want to build on those points and offer some more tips and insights on how to use open data better.

1. Take time to understand the data

It almost goes without saying that in order to use data you need to understand it first. But effective reuse involves more than just understanding the structure and format of some data. We are asking publishers to be clear about how their data was collected, processed and licensed. So it’s important for reusers to use this valuable information and make informed decisions about using data.

It may mean that data is not fit for the purpose you intend, or perhaps you just need to be aware of caveats that impact its interpretation. These caveats should be shared when you are presenting your own analysis or conclusions, based on the data.

2. Be open about your sources

Attribution is a requirement of many open licences and reusers should be sure they are correctly attributing their sources. But citation of sources should be a community norm, not just a provision in a licence. Within research communities the norm is to publish data under a CC0 licence, because attribution and citation of data is already well-embedded as a best-practice: every scientific paper has a list of references.

The same principles should apply to the wider open data community. Acknowledging sources not only helps credit the work of data publishers, it also helps to identify widely-used, high-quality datasets.

Consider adding a page to your application that lists both the open source software and open data sources that you’ve used in developing it. The Lanyrd colophon page provides one example of how this might look.

3. Engage with the publisher

If you’re using someone’s data, tell them! Every open data publisher is keen to understand who is using their data and how. It’s by identifying the value that comes from reuse of their data that publishers can justify continual (and additional) investment in open data publishing.

Engage with publishers when they ask for examples of how their data is being reused. Provide constructive feedback on the data itself and identify quality issues if you find them. Point to improvements in how the data is published that might help you and others consume it more easily.

If it was hard for you to get in touch with the publisher, encourage them to provide clearer contact details on their website. Getting them to complete an Open Data Certificate will help make this point: you can’t get a Pilot rating unless you provide this information.

If open data is a benefit to your business, then share your story. Evidence of open data benefits provides a positive feedback loop that can help people to unlock more data.

4. Share what you know

In some cases it’s not easy or possible to provide feedback directly to publishers, so share what you learn about working with open data with the wider community.

Do you have some tips about how to consume a dataset? Consider writing a blog to share them. Maybe you can even share some open source code to help work with the data.

Have you identified some issues with a dataset? Those issues may well affect others, so share your observations with the wider community, not just the data publisher.

5. Help build the commons

The open data commons consists of all of the openly licensed and inter-connected datasets that are published to the web. The commons can grow and become more stable if we all contribute to it. There are various ways to achieve this beyond attribution and knowledge-sharing.

For example, if you’ve made improvements to a dataset, perhaps to enrich it against other sources, consider sharing that new dataset under an open licence. This might be the start of a more collaborative relationship with the original publisher or open up new business opportunities.

Some datasets are built and maintained collaboratively. Consider contributing some resources to help maintain the dataset, contributing your fixes or improvements. The more people do this, the more valuable the whole dataset becomes.

Direct financial contributions might also be an option, especially if you’re a commercial organisation making large-scale use of an open dataset. This is a direct way to support open data as a public good.

What do you think?

A mature open data commons will consist of a network of datasets published and reused by a variety of organisations. All organisations will be both publishers and consumers of open data. As we move forward with developing open data culture we need to think about how to encourage and support good practice in both roles.

The suggestions in this blog should prompt further discussion. We’d like to develop this further into some guidance for open data practitioners.

Comparing the 5-star scheme with Open Data Certificates

Originally published on the Open Data Institute blog.

I’ve been asked several times recently about the differences between the 5-star scheme for open data and the Open Data Certificates. How do the two ratings relate to one another, if at all? In this blog post I aim to answer that question.

The 5-star scheme

The 5-star deployment scheme was originally proposed by (our President) Tim Berners-Lee in his linked data design principles. The scheme is neatly summarised in this reference, which also identifies the costs and benefits associated with each stage.

Essentially, the scheme measures how well data is integrated into the web. “1-star” data is published in proprietary formats that users must download and process. “5-star” data can be accessed online, uses URIs to identify the resources in the data, and contains links to other sources.
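
Because each star builds on the ones before it, the rating can be sketched as a simple cumulative check. The Python below is a minimal illustration based on the commonly cited summary of the scheme (open licence, structured data, non-proprietary format, URIs, links to other data); the function name and parameters are assumptions made for this example, not an official assessment tool.

```python
def star_rating(
    open_licence: bool,
    structured: bool,
    open_format: bool,
    uses_uris: bool,
    linked: bool,
) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [open_licence, structured, open_format, uses_uris, linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file under an open licence, with no URIs or links.
print(star_rating(True, True, True, False, False))
```

Note that later criteria do not count unless the earlier ones are satisfied: data that links to other sources but does not use URIs still stops at three stars in this sketch.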

The scheme is primarily focused on how data is published: the formats and technologies being used. Assessing whether a dataset is published at 2, 3 or 4 stars requires some insight into how the data has been published, which can be difficult for a non-technical person to assess.

The scheme is therefore arguably best used as a technical roadmap and a short-hand assessment of the technical aspects of data publishing.

Open Data Certificates

The Open Data Certificates process takes an alternative but complementary view. A certificate measures how effectively someone is sharing a dataset for ease of reuse. The scope covers more than just technical issues, including rights and licensing, documentation, and guarantees about availability. A certificate therefore offers a more rounded assessment of the quality of publication of a dataset.

For data publishers the process of assessing a dataset provides insight into how they might improve their publishing process. The assessment process is therefore valuable in itself, but the certificate that is produced is also of value to reusers.

An Open Data Certificate acts as a reference sheet containing information of interest to reusers of a dataset. This saves time and effort digging through a publisher’s website to find out whether a dataset can meet their needs. The ability to search and browse for certified datasets may eventually make it easier to find useful data.

Despite these differences, the certificates and the 5-star scheme are in broad alignment. Both aim to improve the quality and accessibility of published data. And both require that data is published under open licences using standard formats. We would expect a dataset published to Expert level on the certificates to be well-integrated into the web, for example.

However, it doesn’t necessarily follow that all “5-star” data would automatically gain an Expert rating: a dataset may be well integrated into the web but still be poorly maintained or documented.

In our view the Open Data Certificates provide clearer guidance for data publishers to consider when planning and improving their publishing efforts. They help publishers look at the bigger picture of data-user needs, many of which are not about the data format or whether the data contains URIs. This bigger picture can help inform data publishing roadmaps, procurement of data publishing services and policy development.

The certificates also provide a clear quality mark for reusers looking for assurances around how well data is published.

The 5-star scheme has been very effective at moving publishers away from Excel and closed licences and towards CSV and open licences. But for sustained and sustainable open data, reusers need the publishers of open data to consider more than licences and data formats. The Open Data Certificates help publishers do that.

Simplifying the UK open data licensing landscape

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/simplifying-the-uk-open-data-licensing-landscape

The Ordnance Survey has adopted the Open Government Licence (OGL) as the default licence for all of its open data products. This is great news for the open data community as it simplifies licensing around many important UK open datasets. It’s also an opportunity for other data publishers to reflect on their own approach to data licensing.

The original “OS Open Data licence” was based on a customised version of the first version of the OGL. Unfortunately these changes left the open data community in some doubt about how the new clauses were to be interpreted. For example, the OpenStreetMap community decided that the terms were incompatible with the Open Database Licence, requiring them to seek explicit permission to use the open data. These are exactly the problems that standard open licences are meant to avoid.

By switching licence the Ordnance Survey has not only resolved outstanding confusion but has also ensured that its data can be freely and easily mixed with other UK Government sources. The knock-on effects will also simplify the licensing of local government data released under the Public Sector Mapping Agreement. The result is a much clearer and simpler open data landscape in the UK.

At the ODI we’ve previously highlighted our concerns around the proliferation of open government licences. Many of these licences have taken a similar approach to the OS Open Data licence and are derived from earlier versions of the OGL.

We think this is a good time for all data publishers to consider their licensing choices:

  • If your custom licence is derived from the OGL then consider adopting the original version unchanged.
  • If you’re using a bespoke licence then consider how adopting a standard licence such as the OGL or the Creative Commons Attribution licence could benefit potential reusers.

For more information you can browse our guidance on open data licensing and our draft guidance on problematic licensing terms.

Ultimately, the simplification of the open data licensing landscape benefits everyone and we ask other publishers to follow the Ordnance Survey’s lead.

Public draft of the open data maturity model

In partnership with the Department for Environment, Food & Rural Affairs (Defra), the ODI has been developing a maturity model to help assess how effective organisations are at publishing and consuming open data.

We are pleased to launch a public draft of the model and invite feedback on it from the wider community.

Last year we announced the start of a project to develop an open data maturity model. Funded through the Release of Data Fund, the project aims to support organisations in mapping out their open data journey and comparing their progress with others. The model will be of immediate value to Defra in implementing its open data strategy, but the aim has always been to develop a model that can be applied by a wide range of organisations.

Since November we’ve run a series of requirements workshops to explore this idea in more detail with representatives from 10 different organisations, including members of the Defra network and the wider open data community.

The results have been used to create a maturity model that will help organisations assess their maturity as both publishers and reusers of open data in several areas:

  • Data management processes
  • Knowledge and skills
  • Customer support and engagement
  • Investment and financial performance
  • Strategy and governance

The draft model consists of two components:

  • An assessment grid that identifies the key elements of the model and the steps towards maturity.
  • A supporting guidance document that provides more detail on the structure of the model, the activities described in the grid and some notes on how to undertake an assessment.

The documents are at a stage where we would like to invite input from the open data community.

We’d welcome all feedback, but are particularly interested in knowing whether:

  • the model covers the right elements of assessing maturity,
  • the guidance includes the right amount of detail and supporting notes, or
  • the results you get from assessing your organisation seem reasonable.

Please read through both documents and let us know your thoughts. It might be useful to read some of the introductory parts of the guide before reviewing the grid and other guidance in more detail.

You can comment on the documents directly or, if you’d prefer, email your feedback to Leigh.Dodds@theodi.org.

Our aim is to deliver a final version of the model by the end of March. So please provide your feedback by Friday, 13 March.

In the meantime, we will be starting the second phase of the project which focuses on developing an assessment tool to support people in using the model.

Developing an open data maturity model

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/developing-an-open-data-maturity-model

Organisational change is an important aspect of becoming an open data publisher. Often the technical process of getting data published is actually the easiest step. But if users are to have reliable, ongoing access to data then organisations need to consider the strategic, financial and operational impacts of making their open data publishing efforts sustainable.

Existing open data publishers are all at different stages in this change process: some are only just beginning to publish data, others have already undergone significant changes towards a more “open by default” model. Understanding the issues commonly encountered provides an opportunity for organisations to both learn from the successes of others and to assess their “maturity” as an open data publisher.

The Defra Transparency Panel, put together by the Department for Environment, Food & Rural Affairs, recently identified the need to be able to assess the open data maturity of organisations Defra works with, with a view to using this as a means to further promote open data publishing. As this type of assessment would clearly have value to other public bodies, Defra has partnered with the ODI to explore the creation of a general “maturity model” for open data publishers.

Funded by the Release of Data Fund, the project is just beginning and has several goals:

  • convene a group of stakeholders from central and local government, and the wider open data community, to provide input into the model
  • create an assessment model that considers the technical, strategic, financial, internal operational, customer and knowledge aspects of open data publication
  • develop a simple tool that will allow organisations to assess themselves against the model, producing a simple scorecard and recommended areas for improvement.

Consisting of a series of questions and measures, the assessment will be a natural complement to the Open Data Certificates. But where the certificates focus on a single dataset, the maturity model will assess the wider organisation. Where possible, data from existing certificates and data.gov.uk measures will be used to help support completion of the assessment.

By providing a means for public bodies to better understand their open data maturity and concrete guidance on areas for improvement, the goal is to ultimately drive an increase in the volume and quality of open data.

First steps

The initial part of the project will consist of a series of workshops. The first of these will focus on requirements gathering with a later workshop providing an opportunity to review and test the model before it is finalised. Development of the assessment tool will then begin.

The team are currently drawing up a shortlist of stakeholders to invite to the workshops, the first of which will take place before the end of the year. The initial set of attendees is based on existing expressions of interest from the Defra Transparency Panel, organisations in the Defra network, DCLG, and some local authorities. The goal is to have a representative mix of different types of organisation with different levels of experience in publishing open data.

While spaces will be limited, if you are interested in taking part in one of the workshops then please send me an email at leigh.dodds@theodi.org as soon as possible. However, the intention is to openly publish both the draft and final models so there will be opportunities for wider review before it is released. We’ll also provide further updates on the project over the coming months.