Discussion document: archiving open data

This is a brief post to highlight a short discussion document I recently published about archiving open data. The document is intended to gather ideas, suggestions and best practices around archiving open data to the Internet Archive, with the goal of collecting useful guidance that can help encourage archiving and distribution of open data from existing portals, frameworks, etc.

This isn’t an attempt to build a new standard, just to encourage some convergence and activity. At present the guidance recommends building around the Data Package specification, as it is simple and provides a well-defined unit (a zip file) for archiving purposes.
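To make the "well-defined unit" idea concrete, here's a minimal sketch of assembling a Data Package for archiving, using only the Python standard library. The dataset name, licence and file names are placeholders for illustration, not a recommendation.

```python
import json
import zipfile

# A minimal, illustrative Data Package descriptor. The field values
# here (name, licence, file names) are invented for this example.
descriptor = {
    "name": "example-dataset",
    "title": "Example dataset for archiving",
    "licenses": [{"name": "CC-BY-4.0"}],
    "resources": [{"path": "data.csv", "format": "csv"}],
}

# A tiny stand-in data file.
with open("data.csv", "w") as f:
    f.write("id,value\n1,42\n")

# Bundle descriptor and data into a single zip: the self-contained
# unit that can then be deposited with an archive.
with zipfile.ZipFile("example-dataset.zip", "w") as z:
    z.writestr("datapackage.json", json.dumps(descriptor, indent=2))
    z.write("data.csv")
```

The resulting zip is self-describing: any archive holding it can recover the title, licence and resource list from `datapackage.json` without further context.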

Archiving data can help build resilience in the open data commons by providing backups of important data resources. This will help deal with:

  • Unexpected system outages that could take down data portals
  • Decisions by publishers to take down or remove data previously published under an open licence, ensuring an original copy remains
  • Services and portals permanently going offline

If you have thoughts or suggestions then feel free to add them to the document. It would particularly benefit from input from those in the archival community and especially those who are already familiar with working with the Internet Archive.

I hope to build a small reference implementation to illustrate the idea and help to archive the data from Bath: Hacked.

What 3 Words? Jog on mate!

The OpenAddresses.io website notes that “Address data is essential infrastructure”. Geography underpins so much of the data we collect, and that is collected about us, making address registers important parts of national data infrastructure.

In the UK we’ve been wrestling for many years with the fact that our address register is not open. After the decision to sell the register as part of the privatisation of Royal Mail, money has been spent on exploring the creation of an open alternative. But it’s looking positive that we may end up getting a free, open version, albeit at the cost of another £5m.

What3Words is a UK startup that also recognises the importance of address registers. Their website notes that: “Poor addressing costs businesses billions of dollars and hampers the growth and development of entire nations.”

The company has developed an algorithm to assign unique 3 word identifiers to the entire world, creating a global addressing system. The website does a great job of explaining why improving addresses globally is important and highlights the benefits it can bring.
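The general idea can be illustrated with a toy sketch. To be clear: this is emphatically not the patented what3words algorithm; the word list, grid resolution and encoding below are invented purely to show how a fixed grid cell can be given a reproducible three-word label.

```python
# Toy word list: a real system would need tens of thousands of words
# to label every ~3m cell on Earth without collisions.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "heron"]

def cell_index(lat: float, lon: float, step: float = 0.001) -> int:
    # Snap the coordinate to a grid cell and give each cell a unique index.
    row = int((lat + 90) / step)
    col = int((lon + 180) / step)
    cols = int(360 / step)
    return row * cols + col

def three_words(lat: float, lon: float) -> str:
    # Express the cell index as three "digits" in base len(WORDS).
    n = cell_index(lat, lon)
    base = len(WORDS)
    digits = [(n // base**i) % base for i in range(3)]
    return ".".join(WORDS[d] for d in digits)

print(three_words(51.5074, -0.1278))  # a stable label for that grid cell
```

With only eight words and three positions this toy can label just 512 cells, so collisions are guaranteed; the point is only that the mapping is deterministic, which is what makes such identifiers usable as addresses.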

The problem is that What3Words is a proprietary, closed system. The algorithm is patented. The data is closed, with the terms and conditions spelling out in great detail all of the things you can’t do with the system, including:

  • You must not pre-fetch, cache, index, copy, re-utilise, extract or store any what3words Data
  • You may store What3words Data solely for the purpose of improving Your implementation of the API into Your Product provided that such storage: (i) is temporary (and in no event lasts for more than 30 calendar days), (ii) is limited to an amount of What3words Data which is strictly required to improve Your API implementation, (iii) is secure, and (iv) shall in no event enable You or a third party to use the what3words Data outside of Your Products, in any way, or to re-utilise or extract such data
  • For the avoidance of doubt, You must not use any what3words Data (whether accessed from the API or otherwise) for any purposes not expressly permitted under this Agreement, including for Your own use or for distribution, licence or sale to any third-party
  • …etc., etc.

These are all characteristics that help to make What3Words a good prospect for investment: all the defensive walls are in place to protect their intellectual property.

But these are also all characteristics that make What3Words completely unsuitable as either a global or national address register. So I was dismayed to read that Mongolia have decided to adopt it as their national register. I’m hoping that this isn’t really the case and that the story is similar to the apocryphal tales of Honduras’s blockchain-based land registry.

Clearly Mongolia is in need of a better data infrastructure and I can understand why a system like What3Words would be attractive. But I think the closed nature of the platform makes it a poor foundation for future growth. While the service might be great for parcel delivery, address and location information is used in so many other ways.

The licensing restrictions mean that it’s not possible to publish open data to help shed light on land ownership, support crisis mapping, collect and process census or other statistics, and a myriad of other use cases. You can’t even store the data for your own re-use, other than on a temporary basis.

With this in mind I’d find it hard to recommend that any organisation collecting and sharing data use What3Words. Otherwise the keys to your dataset are tied up with the intellectual property and API licensing of a third party, with terms that can be changed at any time. NGOs and other organisations hoping to publish open data about their activities should approach the service with a great deal of caution.

The fix for all this would be simple: What3Words could publish their data and algorithm under an open licence. I think that’s unlikely though.

Being an idealist I’d like to think that more data startups will start to recognise their role in contributing to a global commons and design products accordingly. And perhaps what we need is not more startup incubators, but institutions that will support the creation of data infrastructure that builds a more open future.

Beyond Publishers and Consumers: Some Example Ecosystems

Yesterday I wrote a post suggesting that we should move beyond publishers and consumers and recognise the presence of a wider variety of roles in the open data ecosystem. I suggested a taxonomy of roles as a starting point for discussion.

In this post I wanted to explore how we can use that taxonomy to help map and understand an ecosystem. Eventually I want to work towards a more complete value network analysis and some supporting diagrams for a few key ecosystems. But I wanted to start with hopefully simple examples.

As I’ve been looking at it recently I thought I’d start by examining Copenhagen’s open data initiative and their city data marketplace.

What kind of ecosystems do those two programmes support?

The Copenhagen open data ecosystem

The open data ecosystem can support all of the roles I outlined in my taxonomy:

  • Steward: The city of Copenhagen is the steward of all (or the majority of) the datasets that are made available through its data platform, e.g. the location of parking meters
  • Contributor: The contributors to the dataset are the staff and employees of the administration who collect and then publish the data
  • Reuser: Developers or start-ups who are building apps and services, such as I Bike CpH using open data
  • Beneficiary: Residents and visitors to Copenhagen

Examples of the tangible value being exchanged here are:

  • (Steward -> Reuser) The provision of data from the Steward to the Reuser
  • (Reuser -> Beneficiary) The provision of a transport application from the Reuser to the Beneficiary

Examples of the intangible value are:

  • (Contributor -> Steward) The expertise of the Contributors offered to the Steward to help manage the data
  • (Beneficiary -> Reuser) The market insights gained by the Reuser which may be used to create new products
  • (Reuser -> Steward) The insights shared by the Reuser with the Steward into which other datasets might be useful to release or improve

In addition, the open licensing of the data enables two additional actors in the ecosystem:

  • Intermediaries: who can link the Copenhagen data with other datasets, enrich it against other sources, or offer value added APIs. Services such as TransportAPI.
  • Aggregators: e.g. services that aggregate data from multiple portals to create specific value-added datasets, e.g. an aggregation of census data

In this case the Intermediaries and Aggregators will be supporting their own community of Reusers and Beneficiaries. This increases the number of ways in which value is exchanged.

The Copenhagen city data marketplace

The ecosystem around the city data marketplace is largely identical to the open data ecosystem. However there are some important differences.

  • Steward: The city of Copenhagen is not the only Steward; the goal is to allow other organisations to publish their data via the marketplace. The marketplace will be multi-tenant.
  • Intermediary: the marketplace itself has become an intermediary, operated by Hitachi
  • The ecosystem will have a greater variety of Contributors, reflecting the wider variety of organisations contributing to the maintenance of those datasets.
  • Reusers and Beneficiaries will be present as before

In addition, because the marketplace offers paid access to data, there are other forms of value exchange, e.g. exchange of money for services (Reuser -> Intermediary).

But the marketplace explicitly rules out the Intermediary and Aggregator roles. Services like TransportAPI or Geolytix could not build their businesses against the city data marketplace. This is because the terms of use of the market prohibit onward distribution of data and the creation of potentially competitive services.

In an effort to create a more open platform to enable data sharing, the result has been to exclude certain types of value exchange and value-added services. The design of the ecosystem privileges a single Intermediary: in this case Hitachi as operator of the platform.

Time will tell whether this is an issue or not. But my feeling is that limiting certain forms of value creation isn’t a great basis for encouraging innovation.

An alternative approach would have been to design the platform to be part of the digital commons. For example, allowing Stewards the choice of adding data to the platform under an open licence would give space for other Intermediaries and Aggregators to operate.

Let me know if you think this type of analysis is useful!



Beyond publishers and consumers

In the open data community we tend to focus a lot on Publishers and Consumers.

Publishers have the data we want. We must lobby or convince them that publishing the data would be beneficial. We need to educate them about licensing and how best to publish data. And we get frustrated when they don’t do those things.

Consumers are doing the work to extract value from data. Publishers want to encourage Consumers to do things with their data. But are often worried that Consumers aren’t doing the right things or that they’re not able to track when that value is being created.

While these two roles clearly exist I’m increasingly of the opinion that the framing isn’t a helpful one. There are several reasons why I think that’s the case.

Firstly, as Jeni has already noted, it ties us in knots. By identifying ourselves with one or other role we create a divide. This inevitably leads us to focusing on our own perspective, our own needs, and sets expectations of what others must do or should do before we can act. And yet we know that:

  • organisations are often publishers and consumers of their own data. Doing it better helps themselves and not just others
  • to solve big, complex challenges we need to collaborate. Collaboration doesn’t happen when a team is divided and we don’t have shared goals

Secondly, I worry that by framing discussions in terms of Publisher and Consumer we are overlooking opportunities for more collaborative activities. Focusing on Publisher and Consumer leads us to think in narrow perspectives of what an open data infrastructure might look like. I’ve previously highlighted some of the differences between the current state of open source and open data.

I’d suggest that people in the “open source community” think of themselves as contributors, not “publishers of open source software” and “consumers of open source software”.  See also, Open Street Map and other community led projects. Members of the community may tend to fall into specific roles but there’s a shared goal.

Thirdly, it’s just not representative of the current open data market, let alone what it might look like in future. There are already a number of different types of actors in the landscape. We should make more of an effort to recognise those roles and map out the value they add to the network.

I think it could also help us clarify a number of conversations that relate to “ownership” of data, particularly personal data and how it’s collected and reused.

So, as ground work for an exercise I’d like to do in mapping out an open data value network, here’s a proposed re-framing that recognises a number of different roles.

In each case it might be an individual or an organisation that’s fulfilling the role. And, in any given interaction, it’s possible that the same person or organisation might be fulfilling multiple roles.

This is far from polished, but thought I’d share an early draft. Add a comment to let me know what you think.

  • Steward – has responsibility for managing and ensuring access to a dataset. Covers at least the infrastructure supporting the ongoing collection and access to the data. The Steward role could also include responsibilities for managing contributions, e.g. to ensure data quality.
    • Examples: ORCID, Bath: Hacked, “Data Controller” role as defined by ICO
  • Registrar – a specific type of Steward who is responsible for assigning and managing key identifiers and reference data used in other datasets
    • Examples: CrossRef
  • Contributor – responsible for adding, updating or curating data in a dataset, using the tools and infrastructure provided by the Steward.
    • Examples: OSM editors, MusicBrainz contributors, Waze contributors, scholarly publishers adding to CrossRef
  • Reuser – makes use of one or more open datasets to create applications, analysis, etc.
    • Examples: data journalists, City Mapper, “Data Processor” role as defined by ICO
  • Intermediary – provides value added services that wrap, host or enrich a dataset. E.g. visualisation tools, APIs, etc.
    • Examples: Socrata, Data Press, Transport API
  • Aggregator – a specific form of Intermediary that packages together datasets from other sources
  • Beneficiary – benefits from the activity of reusers, e.g. by consuming packaged analyses or other applications
    • Examples: City Mapper users, UK citizen
  • Subject – a person or organisation who is the subject of a dataset or data item. E.g. the person contributing to a health dataset

While the Reuser role is actually the same as the “Consumer” role I referenced earlier, this framing breaks down Publisher into a number of smaller roles which hopefully better highlight some of the interactions that we tend to overlook. I’ve also tried to tease out some of the responsibilities of tool and platform vendors that help support the ecosystem.

If current data publishers began to think of themselves as Stewards of data, then would this let us have a better discussion about ways to enable more Contributors?

Can we make a case that open data publishing should be as lightweight as possible, to simplify the Steward’s role, whilst enabling a marketplace of Intermediaries?

Can we better recognise the role of Registrars in creating the web of data?

Would separating out the needs of the Beneficiaries of data from those of Reusers help distinguish between technology user needs and broader needs around data literacy?

Let me know what you think. What else should go on this list?



Designing for the open digital commons

I wanted to share some thinking I’ve been doing around how to create products and services that embrace and support the digital commons. The digital commons is the growing wealth of openly licensed content and data that is now available to us all.

In order to benefit from the commons we need to look after it. Individual communities are best placed to manage the commons resources that matter to them most. And other organisations like the Internet Archive have a broader role in preserving the commons as a whole.

I think there are benefits, when designing a product or service, in spending a bit of time thinking about how it might interact with the commons. For example, could it provide a means for users to contribute to the commons by adding more content, or by curating material that is already there, or by enabling them to discover and use openly licensed resources?

Why build for the commons?

The commons is a public good. It’s a repository of cultural artefacts, educational resources, scientific research and more. In that light I would expect start-ups and other organisations to want to support the commons. To recognise that it’s a resource to be managed and shared for all.

However that’s arguably not enough. There are bottom lines to think about, which means there should be some business benefits to building with the commons in mind. I think that amongst those benefits are that:

  • using and supporting openly licensed material demonstrates how seriously you value the choice and rights of your users. This can build confidence and trust in what you are creating
  • by clearly indicating your commitment to preserve user contributed content, it can encourage users to engage with new tools, platforms and services. No one needs or wants to migrate their content through another incredible journey
  • the commons provides a wealth of free resources that can be used to populate new platforms or provide raw materials for users working with your new tools. More users, more case studies, more impact are always good

There are undoubtedly many more reasons. Creative Commons has started to capture how adding an element of openness can help build a business model.

Designing for the commons

But my goal here isn’t to capture why open is good or why it can be good for business. What I want to do is start a discussion about what thinking about the commons might mean in terms of how you design a product.

Here’s an initial list of principles. Let me know what you think is missing.

Allow users to export all of their own content and data

This principle isn’t specifically about the commons, but it forms the basis upon which users have full control over their own content and data.

It enables users to easily migrate between tools and platforms that allow them to publish, host or reuse their own material.

It enables self-archiving and sharing of open materials outside of your system.

Allow users to choose how to licence their own content, using a standard licence

The commons is built around standard open licences. Where possible, provide users with a choice of one of the standard Creative Commons licences.

If you’re building a specifically open platform, then you may only want to allow users to publish content if they’re doing so openly. See, for example, services like Figshare, which requires users to publish their research data openly.

If you’re building a commercial platform, but are providing a free tier that allows users to publish or host content if it is public, then consider allowing or requiring them to apply an open licence to that material.

Open licences can also simplify your terms and conditions as they grant you the rights you require to store, distribute, and archive content in order to run your service. You don’t need users to grant you additional rights to host or store it to deliver a service.

It’s also worth giving some thought to whether you can help users to:

  • set a sensible default licence for any material you are helping them to host or publish
  • manage licences of already published material
  • describe and share how they would like material to be attributed

Allow users to easily import and use openly licensed content

Allow people to draw on the commons when using your service or tool. For example if you’re building a data visualisation platform, help them discover openly licensed data that others have shared. Similarly if they’re publishing written content on your service, help them find openly licensed images or other material they can reuse.

Helping users to quickly gain value from your specific tool or service will benefit both you and the user.

Where you do enable reuse of openly licensed material, do what you can to ensure that attribution is properly handled. For example, as a minimum, allow a user to include an attribution statement when reusing some data or an image. If you can do this automatically, by drawing on machine-readable metadata about that content, then you should do so.
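One way to automate this is to assemble a conventional attribution statement from machine-readable metadata. A hedged sketch follows: the field names (`title`, `author`, `license`, `source_url`) are assumptions loosely modelled on common licence metadata, not a fixed standard.

```python
def attribution(meta: dict) -> str:
    # Assemble a conventional "Title by Author, licensed under X (URL)"
    # line. The field names are illustrative, not a fixed standard.
    parts = [f'"{meta["title"]}"']
    if meta.get("author"):
        parts.append(f'by {meta["author"]}')
    line = " ".join(parts)
    if meta.get("license"):
        line += f', licensed under {meta["license"]}'
    if meta.get("source_url"):
        line += f' ({meta["source_url"]})'
    return line

# A hypothetical metadata record for an openly licensed dataset.
meta = {
    "title": "Parking meters in Copenhagen",
    "author": "City of Copenhagen",
    "license": "CC BY 4.0",
    "source_url": "https://example.org/dataset/parking-meters",
}
print(attribution(meta))
```

Generating the statement from metadata, rather than relying on users to type it, means attribution survives even when material is reused by people unfamiliar with licensing conventions.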

If you’re a hosting platform you might also want to think about how you can help users identify where their openly licensed material is being reused. This encourages a virtuous circle of sharing.

Allow all users to easily find openly licensed material you’re hosting

As a corollary to the above, allow users to easily find any openly licensed material that you’re hosting. At a minimum, ensure that the licence associated with each piece of content is clearly displayed.

If you’re providing a search or browse function then a useful addition would be a means to filter content based on licence. Flickr’s Creative Commons search remains an excellent example of this type of feature.

Allow the web to happen

Platforms should have “soft edges” that allow users to build links between material online. This might include the ability to add rich text with links rather than just simple text boxes. Recognise that the rest of the web exists and that users will want to find their own way to bring together a variety of resources.

Allow users the freedom to collaboratively manage open resources

A thriving commons requires an engaged community who are actively contributing and benefiting from its resources. This involves not just publishing and reuse, but also collaborative management.

A good platform will recognise that users may have their own ideas and opinions about how best to organise material. Lightweight tagging and metadata tools can encourage this type of collaborative maintenance. Read “Fan Is A Tool Using Animal” for more on why community maintenance is something you might want to enable and encourage. Avoid trampling on community activity of this type.

You might also consider how users can transfer rights or ownership in material you’re hosting. Perhaps they’d like to step away from managing a resource but allow someone else to take over its stewardship.

Ensure that openly licensed materials are archived

Don’t leave archiving of content until you’re at the end of your incredible journey. Think about shutdownability.

If content and data are openly licensed then ensure that they’re preserved outside of your system. You don’t need extra permission to do this if there’s an open licence attached, but being transparent with users would be a good thing.

Support freemium or sponsorship models

Many platforms have a business model that is tied to the consumption of resources, e.g. a data marketplace might charge users accessing data via its APIs. Or a video platform might charge users for streaming of content.

Models that focus on use rather than hosting can encourage growth of the commons, as they allow anyone to contribute. Re-users then pay for the service level they need to access those resources. Freemium models are also common, allowing some low level of reuse without incurring extra costs. This supports the commons and allows users to experiment (“try before you buy”) with the services you’re offering.

But a publisher might also want to make that data or content completely available for free at the point of use, e.g. by sponsoring or otherwise covering the costs of downstream users. This provides more options for how publishers want to distribute open resources.

Avoid adding terms and conditions that constrain use of openly licensed material

In short, avoid adding terms and conditions to your service that constrain how users can use content or data that have been published under an open licence, e.g. additional defensive terms that say that material can only be used for personal use or cannot be redistributed.

Consider what you can contribute to the commons

Finally, even if the users of your product or service aren’t directly contributing material to the commons, think about how openness, e.g. open data, might be an element in your business model. For example sharing openly licensed metadata might drive discovery of material in your platform. Many sites are already publishing rich metadata to improve SEO. Adding an open licence to that material is a small step forward.

First Impressions of Copenhagen’s City Data Exchange

Copenhagen have apparently launched their new City Data Exchange. As this is a subject which is relevant to my interests I thought I’d publish my first impressions of it.

The first thing I did was to read the terms of service. And then explore the publishing and consuming options.

Current Contents

As of today, 21st May, there are 56 datasets on the site. All of them are free.

The majority seem to have been uploaded by Hitachi and are copies of datasets from Copenhagen’s open data portal.

Compare, for example, this dataset on the exchange with the same one on the open data portal. The open version has better metadata, clearer provenance, more choice of formats and a download process that doesn’t require a registration step. The open data portal also has more datasets than the exchange.

Consuming Data

Datasets on the exchange can apparently be downloaded as a “one time download” or purchased under a subscription model. However I’ve downloaded a few and the downloads aren’t restricted to being one-time, at least currently.

I’ve also subscribed to a free dataset. My expectation was that this would give me direct access to an API. It turns out that the developer portal is actually a completely separate website. After subscribing to a dataset I was emailed a username and password (in clear text!) with instructions to go and log into that portal.

The list of subscriptions in the developer portal didn’t quite match what I had in the main site, as one that I’d cancelled was still active. It seems you can separately unsubscribe from them there, but it’s not clear what the implications of that might be.

Weirdly, there’s also a prominent “close your account” button in the developer portal, which seems a little odd. It feels like two different products or services have been grafted together.

The developer portal is very, very basic. The APIs exposed by each dataset are:

  • a download API that gives you the entire dataset
  • a “delta” API that gives you changes made between specific dates.

There are no filtering or search options. No format options. Really there’s very little value-add at all.

Essentially, subscribing to a dataset gives you a URL from which you can fetch the dataset on a regular basis rather than having to manually download it. There’s no obvious help or support for developers creating useful applications against these APIs.

Authorising access to an API is done via an API key which is added as a URL parameter. They don’t appear to be using OAuth or similar to give extra security.
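The weakness of a key-in-URL scheme can be shown in a few lines. The endpoint and parameter names below are hypothetical (the exchange's actual API details aren't documented here); the point is the pattern, not the specifics.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE = "https://api.example.org/datasets/parking-meters/download"
API_KEY = "secret-key-123"

# Key as a query parameter: the secret becomes part of the URL itself,
# so it can end up in server logs, proxy caches and browser history.
url = f"{BASE}?{urlencode({'apikey': API_KEY})}"
print(url)

# A header-based alternative (the approach OAuth bearer tokens take)
# keeps the secret out of the URL:
headers = {"Authorization": f"Bearer {API_KEY}"}
```

Headers are still visible to anyone who can intercept the request, so both approaches depend on HTTPS; the difference is that headers are routinely excluded from access logs while full URLs are routinely recorded.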

Publishing Data

In order to publish data you need to have provided a contact phone number and address. You can then provide some basic configuration for your dataset:

  • Title
  • Description
  • Period of update: one off, hourly, daily, weekly, monthly, annual
  • Whether you want to allow it to be downloaded and, if so, whether it’s free or paid
  • Whether you want to allow API access and, if so, whether it’s free or paid

Pricing is in Danish kroner and you can set a price per download or a monthly price for API access (such as it is).

To provide your data you can either upload a file or give the data exchange access to an API. It looks like there’s an option to discuss how to integrate your API with their system, or you can provide some configuration options:

  • Type – this has one option “Restful”
  • Response Type – this has one option “JSON”
  • Endpoint URL
  • API Key

When uploading a dataset, you can tell it a bit about the structure of the data, specifically:

  • Whether it contains geographical information, and which columns include the latitude and longitude.
  • Whether it’s a time series and which column contains the timestamp

This is as far as I’ve tested with publishing, but it looks like there’s a basic workflow for draft and published datasets. I got stuck because of issues trying to publish and map a dataset that I’d just downloaded from the exchange itself.

The Terms of Service

There are a number of interesting things to note there:

Section 7, Payments: “we will charge Data Consumers Service Delivery Charges based on factors such as the volume of the Dataset queried and downloaded as well as the frequency of usage of the APIs to query for the Datasets”

It’s not clear what those service delivery charges will be yet. The platform doesn’t currently provide access to any paid data, so I can’t tell. But it would appear that even free data might incur some charges. Hopefully there will be a freemium model?

It seems likely though that the platform is designed to generate revenue for Hitachi through ongoing use of the APIs. But if they want to drive traffic they need to think about adding a lot more power to the APIs.

Section 7, Payments: “As a Data Consumer your account must always have a positive balance with a minimum amount as stated at our Website from time to time”

Well, this isn’t currently required during either registration or signing up to subscribe to an API. However I’m concerned that I need to let Hitachi hold money even if I’m not actively using the service.

I’ll also note that in Section 8, they say that on termination, “Any positive balance on your account will be payable to you provided we receive payment instructions.” Given that the two payment options are Paypal and Invoice, you’d think they might at least offer to refund money via PayPal for those using that option.

Section 8, Restrictions in use of the Services or Website: You may not “access, view or use the Website or Services in or in connection with the development of any product, software or service that offers any functionality similar to, or competitive with, the Services”

So I can’t, for example, take free data from the service and offer an alternative catalogue or hosting option? Or provide value-added services that enrich the freely available datasets?

This is pure protecting the platform, not enabling consumers or innovation.

Section 12, License to use the Dataset: “Subject to your payment of any applicable fees, you are granted a license by the Data Provider to use the relevant Dataset solely for the internal purposes and as otherwise set out under section 14 below. You may not sub-license such right or otherwise make the Dataset or any part thereof available to third parties.”

Data reuse rights are also addressed in Section 13 which includes the clause: “You shall not…make the Dataset or any part thereof as such available to any third party.”

While Section 14 explains that as a consumer you may “(i) copy, distribute and publish the result of the use of the Dataset, (ii) adapt and combine the Dataset with other materials and (iii) exploit commercially and noncommercially” and that: “The Data Provider acknowledges that any separate work, analysis or similar derived from the Dataset shall vest in the creator of such”.

So, while they’ve clearly given some thought to the creation of derived works and products, which is great, the data can only be used for “internal purposes”, which are not clearly defined, especially with respect to the other permissions.

I think this precludes using the data in a number of useful ways. You certainly don’t have any rights to redistribute, even if the data is free.

This is not an open licence. I’ve written about the impacts of non-open licences. It appears that data publishers must agree to these terms too, so you can’t publish open data through this exchange. This is not a good outcome, especially if the city decides to publish more data here rather than on its open data portal.

The data that Hitachi have copied into the site is now under a custom licence. If you access the data through the Copenhagen open data portal then you are given more rights. Amusingly, the data in the exchange isn’t properly attributed, so it breaks the terms of the open licence. I assume Hitachi have sought explicit permission to use the data in this way?

Overall I’m extremely underwhelmed by the exchange and the developer portal. Even allowing for it being at an early stage, it’s a very thin offering. I built more than this with a team of a couple of people over a few months.

It’s also not clear to me how the exchange in its current form is going to deliver on the vision. I can’t see how the exchange is really going to unlock more data from commercial organisations. The exchange does give some (basic) options for monetising data, but has nothing to say about helping with all of the other considerations important to data publishing.

Dungeons and Dragons and Data

I’ve run a number of presentations recently introducing teams at various organisations to the Open Data Maturity Model. A number of organisations are starting to apply the model to help them benchmark and improve their open data practice. It’s being widely used across Defra here in the UK and ODI Queensland have turned it into a series of workshops to help public sector organisations in Australia.

It can get a bit boring running the same sessions repeatedly, so I often look for a different way to approach things. The maturity model also covers a lot of ground, and there are some elements, such as having a good open data policy, a standard release process and a data asset catalogue, which I think are more foundational than others.

So, when I recently ran a session to introduce the model to some ODI staff I decided to try a different approach. We played Dungeons and Dragons and Data. Here’s how I ran the session.

Goals and structure

The aim was to introduce attendees to the basics of the model, but do it in a fun way. I hoped to introduce some key areas that could quickly improve an organisation’s open data practice.

The ODI are a geeky bunch, so I decided to frame it in terms of Dungeons and Dragons. Instead of improving our open data maturity scores, we were creating characters, levelling up their skills and collecting treasure!

The session consisted of:

  • introducing the characters’ skills (maturity model activities)
  • choosing names for their characters
  • asking them to fill in their character sheet with their stats (the results of a maturity assessment)
  • going on an adventure, with me as the Dungeon Master, to work through a few exercises as small adventuring teams
  • handing out treasure to the teams that did the best

The adventure was intended to:

  • introduce the idea of a data asset register and start populating one for the organisation
  • sketch out a light-weight data release process
  • identify existing data that was public or open whose publication could be improved
  • decide on the next steps, e.g. how to introduce this to a wider audience (slay the dragon!)

Feel free to adapt or reuse the Dungeons and Dragons and Data slides. Here’s a copy of the adventure map and also a version with runes. And yes, the runes on the map do actually translate.

And here are the Dungeon Master’s notes for each step of the adventure.

The Wizard’s Library (Asset Catalogue)

You meet a wizard who has a vast library of spells (datasets) collected from across the land. They’ve been collected by his team of apprentices and acquired from adventurers such as yourselves. Unfortunately, a band of goblins broke into the library and, whilst looting, made off with the scrolls that identified which spell was which.

The wizard knows that there are some minor spells which anyone could read, but some are very dangerous and people shouldn’t have access to them. He needs your help to organise his collection.

An asset catalogue needs to be useful to the organisation, helping it to:

  • inventory its data
  • identify areas of overlap
  • identify areas of improvement
  • identify owners
  • identify sources

It’s not really a data catalogue, as it’s intended to support strategic decisions, so it needs a slightly different collection of metadata. Divide the group into teams of four.
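To make this a little more concrete, here’s a loose sketch of what a minimal catalogue entry, and a couple of the strategic checks above, might look like. The field names are my own invention for illustration, not part of the maturity model:

```python
# A hypothetical data asset catalogue entry. Field names are illustrative
# only; adapt them to whatever your organisation actually needs.
catalogue = [
    {
        "name": "Staff survey responses",
        "description": "Annual survey results, anonymised",
        "owner": "Research team",          # who is responsible for the data
        "source": "Internal survey tool",  # where it comes from
        "status": "internal",              # internal / public / open
        "licence": None,                   # e.g. "CC-BY-4.0" once released
    },
]

# Simple checks supporting the strategic questions above:
# which datasets lack an identified owner, and which are already open?
missing_owner = [d["name"] for d in catalogue if not d.get("owner")]
open_data = [d["name"] for d in catalogue if d.get("status") == "open"]
```

A spreadsheet works just as well in practice; the point is that each entry answers the ownership, source and improvement questions listed above.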

Tasks. Ask them to:

  • Think about what needs to go into the catalogue. Say, 5 minutes
  • Collaborate around a template catalogue you’ve already provided. Ask them to populate the catalogue and add additional fields that they identified but which aren’t in yours. Say, 10 minutes
  • Then have a discussion about whether they think the catalogue is useful and how.

Rewards: 100gp each for helping the wizard. Scroll of open data impact for the team cataloguing the most datasets.


The Maze of Governance (Data Release Process)

After continuing your journey you see high stone walls barring your path down a rocky ravine. It’s the vast maze of governance which traps unwary travellers. From inside you can hear people who are lost and trying to find their way out.

You realise that from a vantage point on the side of the valley you could call out to the people in the maze to guide them to safety.

Can you help release the lost travellers and also chart your own course through the maze?

A good release process will help with prioritisation, include review stages and sign-off where necessary, and involve checking the data before its release, along with a plan for ongoing releases of that data. It should be clear where the organisation will publish data by default, e.g. to a website, portal or GitHub.
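One way to sketch such a lightweight process is as an ordered checklist that a dataset works through. The step names here are purely illustrative, not a standard:

```python
# A hypothetical sketch of a lightweight data release process.
# Each dataset works through these steps in order.
RELEASE_STEPS = [
    "prioritise",    # is this dataset worth releasing now?
    "review",        # content and quality check
    "sign_off",      # approval, where necessary
    "check_data",    # validate the data before release
    "publish",       # e.g. to a website, portal or GitHub by default
    "plan_updates",  # schedule the ongoing release of the data
]

def next_step(completed):
    """Return the first release step not yet completed, or None if done."""
    for step in RELEASE_STEPS:
        if step not in completed:
            return step
    return None
```

Even this much structure makes the discussion questions below easier: you can ask which steps a real dataset has actually been through.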


  • Discussion: What information do we need to use to decide whether to prioritise a data release? (& is that information in the catalogue?). Say, 5 mins
  • In teams, ask them to sketch out a lightweight approach for releasing data in a shared document. Say, 10 mins

Rewards: 500gp each for helping people through the maze. The team with the most complete process each receive a +10 Sword of PDF Cleaving from a grateful adventurer.

The Thorns of Curation (Improving Data Publication)

Further down the road you meet a sad-looking man. He’s the head gardener of the Garden of Thorns. His once beautiful garden has run out of control, and the King is shortly to visit.
It turns out it’s hard to stay on top of a garden that is entirely made of thorn bushes. And these are magical thorns: prick your finger and you fall asleep. And once planted, they can’t be cut down.

Can you suggest some ways for the gardener to tidy his garden and make it look presentable?

This portion of the adventure works for organisations that have released some data, but want to get better at it.

Tasks: Using the asset catalogue, look over how data has been published.

  • Identify 3 actions that can be used to improve how the data is published. The constraint is that you can’t add more technology: no creation of APIs or infrastructure, focus on things like metadata, documentation, etc.
  • Identify whether there are discrepancies, e.g. are they using a common release process? Do they have open data certificates? Is there a common default licence?

Rewards: The gardener gives everyone 100gp. The best helpers will each get a +5 Staff of Metadata Curation.

Slaying the Dragon

The final part of the adventure is slaying the dragon, Tiamat. Tiamat has many heads, each of which breathes a different type of flame. Fighting the dragon involves identifying which head to cut off first.


  • Discuss what the next steps will be, e.g. how to get others to start using the catalogue?
  • Which skills (activities) should the organisation be trying to level up first?

Rewards: 1000gp each from Tiamat’s treasure vault. A few lucky people will also find a +5 Helm of Awesome Suggestion.


OK, so this might not work for all audiences! But the structure here could easily be adapted to cover additional steps in the journey, based on the needs of the organisation. It also offers a way to engage people in thinking about data management, governance and release in a slightly more creative way.

Reuse and adapt as you see fit!