What is collaborative maintenance of data? A short talk at the Royal Society

Following the publication of their report on data governance in the 21st century, the Royal Society are running a number of workshops to explore data governance in different sectors. In October 2019 they ran one exploring data governance in the auto insurance sector.

Last week they held a workshop looking at data governance in the civil society sector. The ODI were invited to help out, and I chaired a session looking at collaborative maintenance of data. I believe the Royal Society will be publishing a longer write-up of the workshop over the coming weeks.

This blog post is a written version of a short, ten-minute talk I gave during the workshop. The slides are public.

Let’s start with a definition. What is collaborative maintenance?

You might already be familiar with terms like “crowd-sourcing” or “citizen science”. Both of those are examples of collaborative maintenance. But it can take other forms too. At the ODI we use collaborative maintenance of data to refer to any scenario where organisations and communities are sharing the work of collecting and maintaining data.

It might be helpful to position collaborative maintenance alongside other approaches that are part of “open culture”. These include open standards, open source, and open data. Let’s look at each of them in turn.

Open standards for data are reusable, shared agreements that shape how we collect, share, govern and use data. There are different types of open standards. Some are technical, and describe file formats and methods of exchanging data. Others are higher-level and capture codes of practice and protocols for collecting data. Open standards are best developed collaboratively, so that everyone impacted by or benefiting from the standard can help shape it.

Open source involves collaborating to create reusable, openly licensed code and applications. Some open source projects are run by individuals or small communities. Others are backed by larger commercial organisations. This collaborative work is different to that of open standards. For example, it involves identifying and agreeing features, writing and testing code and producing documentation to allow others to use it.

Open data is about publishing data under an open licence, so it can be accessed, used and shared by anyone for any purpose. Different communities engage in publication of open data for different purposes.

For example, the open government movement originally focused on open data as a means to increase transparency of governments. More recently there has been a shift towards using open data to help address a variety of social, economic and environmental challenges. In contrast, open data plays a different role in the open science movement. Recent attention has been on the use of open data to address the reproducibility crisis around research, or to help respond to emerging health issues, like Coronavirus.

With a few exceptions, the main approach to open data has been a single organisation (or researcher) publishing data that they have already collected. There may be some collaboration around use of that data, but not in its collection or maintenance.

This makes open data quite distinct from open source or open standards.

We can think of collaborative maintenance as taking the approach used in open source and applying it to data. Collaborative maintenance involves collaboration across the full lifecycle of a dataset.

Some examples might be helpful.

OpenStreetMap is a collaboratively produced spatial database of the entire world. While it was originally produced by individuals and communities, it is now contributed to by large organisations like Facebook, Microsoft and Apple. The Humanitarian OpenStreetMap community focuses on the collection and use of data to support humanitarian activities. The community are involved in deciding what data to collect, prioritising maintenance of data following disasters, and mapping activities either on the ground or remotely. The community works across the lifecycle and is self-directing.

Common Voice is a Mozilla project. It aims to build an open dataset to support voice recognition applications. By asking others to contribute to the dataset, they hope to make it more comprehensive and inclusive. Mozilla have defined what data will be collected and the tasks to be carried out, but anyone can contribute to the dataset by adding their voice or transcribing a recording. It’s this open participation that could help ensure that the dataset represents a more diverse set of people.

Edubase is maintained by the Department for Education (DfE). It’s our national database of schools. It’s used in a variety of different applications. Like Mozilla, DfE are acting as the steward of the data and have defined what information should be collected. But the work of populating and maintaining the shared directory is carried out by people in the individual schools. This is the best way to keep that data up to date. Those who know when the data has changed have the ability to update it. The contributors all benefit from the shared resource.

Building a shared directory is a common use for collaborative maintenance. But there are others.

Looking across these projects and other examples that we’ve studied in our desk and user research, we can see that there are different ways we can collaborate around data.

For example, we can work together to decide what data to collect. We can share the work of collecting and maintaining data, ensuring its quality and governing access to it. We can use open source to help to build the tools to support those communities.

We’ve developed the collaborative maintenance guidebook to help support the design of new services and platforms. It includes some background and a worked example. The bulk of the guidebook is a set of “design patterns” that describe solutions to common problems. For example, one pattern covers how to manage quality when many different people are contributing to the same dataset.

We think collaborative maintenance can be useful in more projects. For civil society organisations, collaborative maintenance might help you engage with the communities that you’re supporting to collect and maintain useful data. It might also be a tool to support collaboration across the sector as a means of building common resources.

The guidebook is at an early stage and we’d love to get feedback on its contents, or to help you apply it to a real-world project. Let us know what you think!