Building the new Ordnance Survey Linked Data platform

Disclaimer: the following is my own perspective on the build & design of the Ordnance Survey Linked Data platform. I don’t presume to speak for the OS and don’t have any inside knowledge of their long-term plans.

Having said that, I wanted to share some of the goals we (Julian Higman, Benjamin Nowack and myself) had when approaching the design of the platform. I will say that we had the full support and encouragement of the Ordnance Survey throughout the project, especially John Goodwin and others in the product management team.

Background & Goals

The original Ordnance Survey Linked Data site launched in April 2010. At the time it was a leading example of adoption of Linked Data by a public sector organisation. But time moves on and both the site and the data were due for a refresh. With Talis’ withdrawal from the data hosting business, the OS decided to bring the data hosting in-house and contracted Julian, Benjamin and myself to carry out the work.

While the migration from Talis was a key driver, the overall goal was to deliver a new Linked Data platform that would make a great showcase for the Ordnance Survey Linked Data. The new site launched in beta in April and went fully live at the beginning of June.

We had a number of high-level goals that we set out to achieve in the project:

  • Provide value for everyone, not just developers — the original site was very developer-centric, offering a very limited user experience with no easy way to browse the data. We wanted everyone to begin sharing links to the Ordnance Survey pages, which meant that the site needed a clean, user-friendly design. We therefore approached it from the point of view of building an application, not just a data portal
  • Deliver more than Linked Data — we wanted to offer a set of APIs that made the data accessible and useful for people who weren’t familiar with Linked Data or SPARQL. This meant offering some simpler tools to enable people to search and link to the data
  • Deliver a good developer user experience — this meant integrating API explorers, plenty of examples, and clear documentation. We wanted to shorten the “time to first JSON” to get developers into the data as fast as possible
  • Showcase the OS services and products — the OS offer a number of other web services and location products. The data should provide a way to show that value. Integrating mapping tools was the obvious first step
  • Support latest standards and best practices — where possible we wanted to make sure that the site offered standard APIs and formats, and conformed to the latest best practices around open data publishing
  • Support multiple datasets — the platform has been designed to support multiple datasets, allowing users to use just the data they need or the whole combined dataset. This provides more options for both publishing and consuming the data
  • Build a solid platform to support further innovation — we wanted to leave the OS with an extensible, scalable platform to allow them to further experiment with Linked Data

Best Practices & Standards

From a technical perspective we needed to refresh not just the data but the APIs used to access it. This meant replacing the SPARQL 1.0 endpoint and custom search interface offered in the original with more standard APIs.
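One practical upside of a standard endpoint is that any HTTP client can talk to it. As a rough sketch (not the platform's own client code), the snippet below builds a SPARQL 1.1 Protocol "query via GET" request using only the Python standard library. The endpoint URL, the `sparql_request` helper and the sample query are all hypothetical and exist only for illustration; substitute the real endpoint address from the site's documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/datasets/example/apis/sparql"


def sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The protocol carries the query text in a URL-encoded "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header asks the endpoint for a particular results format.
    return Request(url, headers={"Accept": accept})


# Illustrative query: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
request = sparql_request(ENDPOINT, QUERY)
```

Passing the request to `urllib.request.urlopen` would send it. Calling the helper with `accept="text/csv"` asks for CSV results instead of JSON, on endpoints that support that format.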

We also wanted to make the data and APIs discoverable and adopted a “completionist” approach to try and tick all the boxes for publishing and exposing dataset metadata, including basic versioning and licensing information.

As a result we ended up with:

  • SPARQL 1.1 query endpoints for every dataset, which expose a basic SPARQL 1.1 Service Description as well as the newer CSV and TSV response formats
  • Well-populated VoID descriptions for each dataset, covering all of the key metadata items such as publication dates, licensing, coverage, and some initial dataset statistics
  • Autodiscovery support for datasets, APIs, and for underlying data about individual Linked Data resources
  • OpenSearch 1.1 compliant search APIs that support keyword and geo search over the data. The Atom and RSS response formats include the relevance and geo extensions
  • Licensing metadata is clearly labelled not just on the datasets, but as a Link HTTP header in every Linked Data or API result, so you can probe resources to learn more
  • Basic support for the OpenRefine Reconciliation API as a means to offer a simple linking API that can be used in a variety of applications but also, importantly, by people curating and publishing small datasets using OpenRefine
  • Support for CORS, allowing cross-origin requests to be made from the browser to the Linked Data and all of the APIs
  • Caching support through the use of ETags and Last-Modified headers. If you’re using the APIs then you can optimise your requests and cache data by making Conditional GET requests
  • Linked Data pages that offer more than just a data dump: the integrated mapping and links to other products and services make the data more engaging
  • Custom ontology pages that allow you to explore terms and classes within individual ontologies, e.g. the definition of “London Borough”
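Two of the items above, the caching headers and the licensing Link header, are easy to exercise from a script. The following is a minimal sketch under stated assumptions rather than anything taken from the platform itself: the function names and the shape of the `cached` dictionary are hypothetical, and the Link header parsing is deliberately simplified (it assumes URLs that contain no angle brackets).

```python
import re
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def conditional_headers(etag=None, last_modified=None):
    """Return the request headers for a Conditional GET from cached validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, cached):
    """Re-fetch url, reusing the cached copy if the server answers 304.

    `cached` is a hypothetical dict with "etag", "last_modified" and "body" keys.
    """
    headers = conditional_headers(cached.get("etag"), cached.get("last_modified"))
    try:
        with urlopen(Request(url, headers=headers)) as response:
            return {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": response.read(),
            }
    except HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return cached
        raise


def license_links(link_header):
    """Extract the URLs whose rel includes "license" from a Link header value."""
    urls = []
    # Each entry looks like: <url>; rel="..."; other parameters
    for match in re.finditer(r'<([^>]+)>([^<]*)', link_header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel and "license" in rel.group(1).lower().split():
            urls.append(url)
    return urls
```

A client would store the `ETag` and `Last-Modified` values from its first response, pass them back through `fetch_if_changed` on later requests, and call `license_links` on the `Link` header of any response to find the licence that applies to it.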

Clearly there’s more that could potentially be done. Tools can always be improved, but the best way for that to happen is through user feedback. I’d love to know what you think of the platform.

Overall I think we’ve achieved our goal of making a site that, while clearly developer-oriented, offers a good user experience for non-developers. I’ll be interested to see what people do with the data over the coming months.