OSM Queries

For the past month I’ve been working on a small side project which I’m pleased to launch for Open Data Day 2021.

I’ve long been a fan of OpenStreetMap. I’ve contributed to the map, coordinated a local crowd-mapping project and used OSM tiles to help build web-based maps. But I’ve only done a small amount of work with the actual data. Not much more than running a few Overpass API queries and playing with some of the exports available from Geofabrik.

I recently started exploring the Overpass API again to learn how to write useful queries. I wanted to see if I could craft some queries to help me contribute more effectively: for example, by helping me spot areas that might need updating, or by identifying locations where I could add links to Wikidata.
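To make that concrete, here’s a minimal sketch of the kind of query I mean. The area name and tags are illustrative assumptions, not examples taken from the site:

```
[out:json][timeout:25];
// Look up a named area to search within (the name is an assumption)
area["name"="Bath"]->.searchArea;
// Find pub nodes in that area that don't yet have a wikidata tag
node["amenity"="pub"][!"wikidata"](area.searchArea);
out body;
```

Running something like this in Overpass Turbo highlights features that are candidates for a Wikidata link.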

There’s quite a bit of documentation about the Overpass API and the query language it uses, which is called Overpass QL. But I didn’t find it that accessible. The documentation is more of a reference than a tutorial.

And while there are quite a few example queries to be found across the OSM wiki and other websites, there isn’t always a great deal of context to those examples explaining how they work or when you might use them.

So I’ve been working on two things to address what I think is a gap in helping people learn how to get more from the OpenStreetMap API.

overpass-doc

The first is a simple tool that will take a collection of Overpass queries and build a set of HTML pages from them. It’s based on a similar tool I built for SPARQL queries a few years ago. Both are inspired by Javadoc and other code documentation tools.

The idea was to encourage the publication of collections of useful, documented queries, e.g. to be shared amongst members of a community or people working on a project. The OSM wiki can be used to share queries, but it might not always be a suitable home for this type of content.

The tool is still at quite an early stage. It’s buggy, but functional.

To test it out I’ve been working on my own collection of Overpass queries. I initially started to pull together some simple examples that illustrated a few features of the language. But then realised that I should just use the tool to write a proper tutorial. So that’s what I’ve been doing for the last week or so.

Announcing OSM Queries

OSM Queries is the result. As of today the website contains four collections of queries. The main collection is a 26-part tutorial that covers the basic features of Overpass QL.

By working through the tutorial you’ll learn:

  • some basics of the OpenStreetMap data model
  • how to write queries to extract nodes, ways and relations from the OSM database using a variety of different methods
  • how to filter data to extract just the features of interest
  • how to write spatial queries to find features based on whether they are within specific areas or in proximity to one another
  • how to output data as CSV and JSON for use in other tools (see the sketch after this list)
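To give a flavour of what those queries look like, here’s a minimal sketch that combines an area search, a tag filter and CSV output. The area name and tags are illustrative assumptions rather than queries lifted from the tutorial:

```
// Output the name and tourism tag of each match as CSV, with a header row
[out:csv(name, tourism; true)][timeout:25];
// Look up a named administrative area (the name is an assumption)
area["name"="Bristol"]["boundary"="administrative"]->.searchArea;
// Find nodes tagged as hotels within that area
node["tourism"="hotel"](area.searchArea);
out;
```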

Every query in the tutorial has its own page containing an embedded, syntax-highlighted version of the query. This makes the queries easier to share with others. You can click a button to load and run each query in the Overpass Turbo IDE, so you can easily view the results and tinker with the query.

I think the tutorial covers all the basic options for querying and filtering data. Many of the queries include comments that illustrate variations of the syntax, encouraging you to further explore the language.

I’ve also been compiling an Overpass QL syntax reference that provides a more concise view of some of the information in the OSM wiki. There are a lot of advanced features, which I will likely cover in a separate tutorial.

Writing a tutorial against the live OpenStreetMap database is tricky. The results can change at any time. So I opted to focus on demonstrating the functionality using mostly natural features and administrative boundaries.

In the end I chose to focus on an area around Uluru in Australia, not just because it provides an interesting and stable backdrop for the tutorial, but also because I wanted to encourage a tiny bit of reflection in the reader about what gets mapped, who does the mapping, and how things get tagged.
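For example, a proximity query over those stable features might look something like this sketch; the coordinates are approximate and the tag choices are my own assumptions:

```
[out:json][timeout:25];
// Find nodes, ways and relations with a natural=* tag within ~10km of Uluru
// (the coordinates are approximate)
nwr["natural"](around:10000,-25.345,131.037);
// Output results, adding a computed centre point for ways and relations
out center;
```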

A bit of map art, and a request

The three other query collections are quite small.

I ended up getting a bit creative with the MapCSS queries.

For example, to show off the functionality I’ve written queries that reveal the masonic symbol hidden in the streets of Bath, style Brøndby Haveby like a bunch of flowers, and render the Lotus Bahai Temple as, well, a lotus flower.

These were all done by styling the existing OSM data. No edits were done to change the map. I wouldn’t encourage you to do that.
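For the curious, here’s a minimal sketch of how this kind of styling works. The {{style:}} block is an Overpass Turbo extension (MapCSS) rather than part of Overpass QL itself, and the tags and colours are illustrative assumptions:

```
[out:json][timeout:25];
// Fetch highways in the current Overpass Turbo viewport
way["highway"]({{bbox}});
out geom;
// MapCSS rules that Overpass Turbo applies when rendering the results
{{style:
  way[highway] { color: purple; width: 3; opacity: 0.7; }
}}
```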

I’ve put all the source files and content for the website into the public domain, so you’re free to adapt, use and share them however you see fit.

While I’ll continue to improve the tutorial and add some more examples, I’m also hoping that I can encourage others to contribute to the site. If you have useful queries that could be added to the site then submit them via Github. I’ve provided a simple issue template to help you do that.

I’m hoping this provides a useful resource for people in the OSM community and that we can collectively improve it over time. I’d love to get some feedback, so feel free to drop me an email, comment on this post or message me on Twitter.

And if you’ve never explored the data behind OpenStreetMap then Open Data Day is a great time to dive in. Enjoy.

What is collaborative maintenance of data? A short talk at the Royal Society

Following the publication of their report on data governance in the 21st century, the Royal Society are running a number of workshops to explore data governance in different sectors. In October 2019 they ran one exploring data governance in the auto insurance sector.

Last week they held a workshop looking at data governance in the civil society sector. The ODI were invited to help out, and I chaired a session looking at collaborative maintenance of data. I believe the Royal Society will be publishing a longer write-up of the workshop over the coming weeks.

This blog post is a written version of a short ten minute talk I gave during the workshop. The slides are public.

Let’s start with a definition. What is collaborative maintenance?

You might already be familiar with terms like “crowd-sourcing” or “citizen science”. Both of those are examples of collaborative maintenance. But it can take other forms too. At the ODI we use collaborative maintenance of data to refer to any scenario where organisations and communities are sharing the work of collecting and maintaining data.

It might be helpful to position collaborative maintenance alongside other approaches that are part of “open culture”. These include open standards, open source, and open data. Let’s look at each of them in turn.

Open standards for data are reusable, shared agreements that shape how we collect, share, govern and use data. There are different types of open standards. Some are technical, and describe file formats and methods of exchanging data. Others are higher-level and capture codes of practices and protocols for collecting data. Open standards are best developed collaboratively, so that everyone impacted by or benefiting from the standard can help shape it.

Open source involves collaborating to create reusable, openly licensed code and applications. Some open source projects are run by individuals or small communities. Others are backed by larger commercial organisations. This collaborative work is different to that of open standards. For example, it involves identifying and agreeing features, writing and testing code and producing documentation to allow others to use it.

Open data is about publishing data under an open licence, so it can be accessed, used and shared by anyone for any purpose. Different communities engage in publication of open data for different purposes.

For example, the open government movement originally focused on open data as a means to increase the transparency of governments. More recently there has been a shift towards using open data to help address a variety of social, economic and environmental challenges. Open data plays a different role in the open science movement, where recent attention has been on using it to address the reproducibility crisis in research, or to help respond to emerging health issues, like Coronavirus.

With a few exceptions, the main approach to open data has been a single organisation (or researcher) publishing data that they have already collected. There may be some collaboration around use of that data, but not in its collection or maintenance.

This makes open data quite distinct from open source or open standards.

We can think of collaborative maintenance as taking the approach used in open source and applying it to data. Collaborative maintenance involves collaboration across the full lifecycle of a dataset.

Some examples might be helpful.

OpenStreetMap is a collaboratively produced spatial database of the entire world. While it was originally produced by individuals and communities, it is now contributed to by large organisations like Facebook, Microsoft and Apple. The Humanitarian OpenStreetMap community focuses on the collection and use of data to support humanitarian activities. The community are involved in deciding what data to collect, prioritising maintenance of data following disasters, and mapping activities either on the ground or remotely. The community works across the lifecycle and is self-directing.

Common Voice is a Mozilla project. It aims to build an open dataset to support voice recognition applications. By asking others to contribute to the dataset, they hope to make it more comprehensive and inclusive. Mozilla have defined what data will be collected and the tasks to be carried out, but anyone can contribute to the dataset by adding their voice or transcribing a recording. It’s this open participation that could help ensure that the dataset represents a more diverse set of people.

Edubase is maintained by the Department for Education (DfE). It’s our national database of schools. It’s used in a variety of different applications. Like Mozilla, DfE are acting as the steward of the data and have defined what information should be collected. But the work of populating and maintaining the shared directory is carried out by people in the individual schools. This is the best way to keep that data up to date: those who know when the data has changed have the ability to update it. And the contributors all benefit from the shared resource.

Building a shared directory is a common use for collaborative maintenance. But there are others.

Looking across these projects and other examples that we’ve studied in our desk and user research, we can see that there are different ways we can collaborate around data.

For example, we can work together to decide what data to collect. We can share the work of collecting and maintaining data, ensuring its quality and governing access to it. We can use open source to help to build the tools to support those communities.

We’ve developed the collaborative maintenance guidebook to help support the design of new services and platforms. It includes some background and a worked example. The bulk of the guidebook is a set of “design patterns” that describe solutions to common problems: for example, how to manage quality when many different people are contributing to the same dataset.

We think collaborative maintenance can be useful in more projects. For civil society organisations, collaborative maintenance might help you engage the communities that you’re supporting in collecting and maintaining useful data. It might also be a tool to support collaboration across the sector as a means of building common resources.

The guidebook is at an early stage and we’d love to get feedback on its contents, or to help you apply it to a real-world project. Let us know what you think!


A key difference between open data and open source

In “left-pad and the data commons” I tried to identify some lessons for the open data community based on recent events in the Javascript/NPM world. Open source, open science and open data are all parts of the same endeavour of creating the commons. There’s a lot of fertile territory to be explored by looking at how those respective communities are operating, the infrastructure they’re building, and the kinds of issues that are being faced.

One thing that occurs to me is that there are currently some important differences between how open source and open data projects operate.

The similarities are obvious. Compare the key principles of the open source definition and the open definition, for example. Both include the same basic ideas: the ability to access the entirety of the source code or data (let’s call them “works”); the ability to create derived works; the right to distribute the works and derivatives; and the ability to use the works for commercial and non-commercial purposes.

The ability to create derived works means that anyone can also modify the source or data as they see fit. In practice this means forking: creating a new custom version of some software, or a modified (corrected, reformatted) version of a dataset.

The differences are in the infrastructure that supports the original works. The default practice in the open source world is that code will be:

  • published in a public repository
  • published with a complete version history (or at least versioning dating from its publication)
  • published in an environment that supports transparent reporting of issues, bugs and suggestions
  • published in an environment that includes good documentation tools, such as a wiki
  • and, most importantly, published in an environment that allows forks and improvements to be folded back into the original project

I’d go as far as suggesting that each of these is as important to our modern experience and expectations of open source as the basic rights granted by open licences. Clearly, not all open source projects benefit from a community of contributors, but the infrastructure is there to enable it. I see moves in the open source community to make contributions easier and more welcome.

This isn’t the case with the majority of open data releases though. The current practice is that:

  • data is published by a single organisation
  • there is little insight into how the data was curated; at best there is some documentation
  • data portals provide some infrastructure, e.g. for issue reporting and documentation, but this is often limited in scope
  • data portals don’t provide any support for encouraging collaboration or external contributions

There are, of course, examples of open datasets that are created from collaborative models. This includes OpenStreetMap, legislation.gov.uk and others. But these are currently the exceptions rather than the norm. I’ve previously wondered whether we need more of these types of institution and incubators to support them.

Open source really came into the mainstream when commercial organisations not only started to adopt it as a way of releasing work they had produced, but also embraced its collaborative aspects. Entire industries have now built up around open source projects, in which organisations that compete in other areas collaborate on the common, core infrastructure.

While we should continue to urge commercial organisations to open up their existing assets, I think that the open data commons will really start to mature once we start adopting collaborative models. Which means the open data community needs to think about the tooling we need to enable that.

A “github for data” might be a useful short-hand. But this would overlook the fact that modern open source development is now done in an ecosystem that consists of an extremely rich infrastructure: continuous integration tools, discovery tools, package managers, repositories, etc. Github is the platform within which these tools co-ordinate. There will also be challenges that are specific to open data, such as anonymisation, aggregation, registries, identifiers and more.