This weekend I started a side project which I plan to spend some time on this winter. The goal is to create a web interface that will let people explore geospatial datasets published by the three local authorities that make up the West of England Combined Authority: Bristol City Council, South Gloucestershire Council and Bath & North East Somerset Council.
Through Bath: Hacked we’ve already worked with the council to publish a lot of geospatial data. We’ve also run community mapping events and created online tools to explore geospatial datasets. But we don’t have a single web interface that makes it easy for anyone to explore that data and perhaps mix it with new data that they have collected.
Rather than build something new, which would be fun but time consuming, I’ve decided to try out TerriaJS. Its an open source, web based mapping tool that is already being used to publish the Australian National Map. It should handle doing the West of England quite comfortably. It’s got a great set of features and can connect to existing data catalogues and endpoints. It seems to be perfect for my needs.
I decided to start by configuring the datasets that are already in the Bath: Hacked Datastore, the Bristol Open Data portal, and data.gov.uk. Every council also has to publish some data via standard APIs as part of the INSPIRE regulations, so I hoped to be able to quickly bring a list of existing datasets without having to download and manage them myself.
Unfortunately this hasn’t proved as easy as I’d hoped. Based on what we’ve learned so far about the state of geospatial data infrastructure in our project at the ODI I had reasonably low expectations. But there’s nothing like some practical experience to really drive things home.
Here’s a few of the challenges and issues I’ve encountered so far.
- The three councils are publishing different sets of data. Why is that?
- The dataset licensing isn’t open and looks to be inconsistent across the three councils. When is something covered by INSPIRE rather than the PSMA end user agreement?
- The new data.gov.uk “filter by publisher” option doesn’t return all datasets for the specified publisher. I’ve reported this as a bug, in the meantime I’ve fallen back on searching by name
- The metadata for the datasets is pretty poor, and there is little supporting documentation. I’m not sure what some of the datasets are intended to represent. What are “core strategy areas“?
- The INSPIRE service endpoints do include metadata that isn’t exposed via data.gov.uk. For example this South Gloucester dataset includes contact details, data on geospatial extents, and format information which isn’t otherwise available. It would be nice to be able to see this and not have to read the XML
- None of the metadata appears to tell me when the dataset was last updated. The last modified data on data.gov.uk is (I think) the date the catalogue entry was last updated. Are the Section 106 agreements listed in this dataset from 2010 or are they regularly updated. How can I tell?
- Bath is using GetMapping to host its INSPIRE datasets. Working through them on data.gov.uk I found that 46 out of the 48 datasets I reviewed have broken endpoints. I’m reasonably certain these used to work. I’ve reported the issue to the council.
- The two datasets that do work in Bath cannot be used in TerriaJS. I managed to work around the fact that they require a username and password to access but have hit a wall because the GetMapping APIs only seem to support EPSG:27700 (British National Grid) and not EPSG:3857 as used by online mapping tools. So the APIs refuse to serve the data in a way that can be used by the framework. The Bristol and South Gloucestershire endpoints handle this fine. I assume this is either a limitation of the GetMapping service or a misconfiguration. I’ve asked for help.
- A single Web Mapping Service can expose multiple datasets as individual layers. But apart from Bristol, both Bath and South Gloucestershire are publishing each dataset through its own API endpoint. I hope the services they’re using aren’t charging per end-point, as they’re probably unnecessary? Bristol has chosen to publish a couple of API that bring together several datasets, but these are also available individually through separate APIs.
- The same datasets are repeated across data catalogues and endpoints. Bristol has its data listed as individual datasets in its own platform, listed as individual datasets in data.gov.uk and also exposed via two different collections which bundle some (or all?) of them together. I’m unclear on the overlap or whether there are differences between them in terms of scope, timeliness, etc. The licensing is also different. Exploring the three different datasets that describe allotments in Bristol, only one actually displayed any data in TerriaJS. I don’t know why
- The South Gloucestershire web mapping services all worked seamlessly, but I noticed that if I wanted to download the data, then I would need to jump through hoops to register to access it. Obviously not ideal if I do want to work with the data locally. This isn’t required by the other councils. I assume this is a feature of MisoPortal
- The South Gloucestershire datasets don’t seem to include any useful attributes for the features represented in the data. When you click on the points, lines and polygons in TerriaJS no additional information is displayed. I don’t know yet whether this data just isn’t included in the dataset or if its a bug in the API or in how TerriaJS is requesting it. I’d need to download or explore the data in some other way to find out. However the data that is available from Bath and Bristol also has inconsistencies in how its described, so I suspect there aren’t any agreed standards
- While TerriaJS doesn’t have a plugin for OpenDataSoft (which powers the Bristol Open Data platform), I found that OpenDataSoft do provide a Web Feature Service interface. So I was able to configure that in TerriaJS to access that. Unfortunately I then found that either there’s a bug in the platform or a problem with the data because most of the points were in the Indian Ocean
The goal of the INSPIRE legislation was to provide a common geospatial data infrastructure across Europe. What I’m trying to do here should be relatively quick and easy to do. Looking at this graph of INSPIRE conformance for the UK, everything looks rosy.
But, based on an admittedly small sample of only three local authorities, the reality seems to be that:
- services are inconsistently implemented and have not been designed to be used as part of native web applications and mapping frameworks
- metadata quality is poor
- there is inconsistent detail about features which makes it hard to aggregate, use and compare data across different areas
- it’s hard to tell the provenance of data because of duplicated copies of data across catalogues and endpoints. Without modification or provenance information, its unclear whether data is data is up to date
- licensing is unclear
- links to service endpoints are broken. At best, this leads to wasted time from data users. At worst, there’s public money being spent on publishing services that no-one can access
It’s important that we find ways to resolve these problems. As this recent survey by the ODI highlights, SMEs, startups and local community groups all need to be able to use this data. Local government needs more support to help strengthen our geospatial data infrastructure.