I’m trying to understand when and how UPRNs can be a part of open data, whether published by councils or other organisations. I’m writing down what I understand in the hope that others might find this useful or might be able to correct any misunderstandings by leaving a comment. It’d be great to get some official confirmation from Ordnance Survey and others too.
On the 16th February 2015 there was an announcement from Ordnance Survey that said:
Supporting the local government transparency and government open data agendas, Ordnance Survey, GeoPlace and the Improvement Service are enabling AddressBase internal business use customers to release Unique Property Reference Numbers (UPRNs) on a royalty free and open basis. The move will facilitate the release and sharing of public and private sector addressing databases.
The announcement notes that this brings UPRNs into line with the terms that apply to reuse of the TOID identifier.
The relevant background documents are the following. Use these as your primary guidance:
- Policy Statement – Unique Property Reference Number – the primary policy document
- Presumption to Publish – an amended data release process for the public sector
- AddressBase UPRN Use – more detailed guidance on interpreting the policy statement
- Policy Statement – OS Mastermap Topographic Identifiers – for the purposes of comparison
The first policy statement is the significant document. It’s been updated to clarify some elements and has some example permitted and non-permitted uses which we’ll explore below.
What follows is my understanding of those documents and consideration of some additional scenarios of how and when UPRNs can be released as open data.
If any of the below is wrong, please leave a comment!
A log of updates and clarifications to this post:
- 3/9/2015 – Added notes and comments about the ONS National Statistics Address Lookup dataset
Who can publish UPRNs in open data?
Current AddressBase licensees (specifically “Internal Business Use customers”) can publish data containing UPRNs without the need to place any restrictions on their downstream use. This only applies to the UPRN identifiers themselves as there are some provisos around what data can be published.
Anyone who obtains a UPRN from an open dataset can also use those identifiers to publish additional open data, e.g. by annotating a dataset with additional information, so long as they obey any licensing requirements from their source datasets.
This is perhaps best understood as distribution rather than publication though because the UPRNs must have previously been available in, and obtained from, an open dataset.
If you’re not an AddressBase customer, then you can’t publish new datasets containing previously unpublished UPRNs.
Note: the policy document refers only to AddressBase licensees. It doesn’t state a specific AddressBase product that must be licensed (there are several). It also doesn’t really define customers beyond that although the public sector presumption to publish does.
Who can use UPRNs in open data?
If you obtain a UPRN from an open dataset, then you can use it without incurring any additional licensing restrictions beyond what is stated in the licence for that dataset.
So, if a local authority publishes some data containing UPRNs under the OGL, then you can use it for commercial and non-commercial purposes so long as you attribute your sources.
What licence can be used when publishing UPRNs in open data?
Any open licence can be used to publish the UPRN identifiers.
However there may be additional licensing restrictions that must be applied to the dataset if:
- the dataset contains additional OS data
- the dataset was constructed by using or referencing the geographical co-ordinates of the UPRNs
The first restriction seems obvious: if you include non-open data then this will impact your licensing options. The second is less clear: depending on how you constructed the dataset, you may not be able to publish it openly.
The specific wording is that UPRNs can only be published on a royalty-free basis, and with the option to sub-license, if:
licensees have not extracted UPRNs by using or making reference to the coordinates within AddressBase products data
Let’s refer to that restriction as the “spatial reference restriction”. Examples are essential to help clarify where and when it applies.
Additional public sector permissions
It’s also worth highlighting that the presumption to publish document notes that for public sector customers of AddressBase the OS
…will permit the release of the OS x,y co-ordinates for your public sector assets, together with the UPRN, such that members are able to release datasets required to meet the requirements of the Local Government Transparency Code.
This means that, as long as the presumption to publish process is followed, it should be possible for local authorities to publish both UPRNs and their co-ordinates in derived datasets that aren’t substantial extracts of the source data.
However this is expanded on in the OS licensing guidance which emphasises that the permission applies to public sector assets only, and specifically those datasets within the Local Government Transparency Code. There’s also a note that:
This is subject to the member having permission from Royal Mail in relation to the release of any data derived from PAF
Which piles caveats upon caveats.
It’s not immediately clear to me if the spatial reference restriction is also intended to apply here or whether this is separate special dispensation for public sector customers. I’m assuming the latter, but it would be useful to have some confirmation.
Let’s work through a couple of examples of where UPRNs might be published as part of an open data release. The UPRN policy statement includes several examples which we’ll build on here.
Companies House address matching
This is the first permitted example in the policy statement:
A third party takes an open address dataset, such as the Free Company Data Product from Companies House, and matches the data contained within against one of the AddressBase products using non-spatial methods. It then appends the UPRN from the AddressBase products to this address data.
Emphasis is mine.
In this example a local authority could take the list of registered companies in its local area, match the addresses against AddressBase and publish a local extract that has been annotated with the UPRN. This could be published under the OGL (we’ll ignore the unclear licensing of Companies House data for now!).
The matching of addresses has to be done using non-spatial methods which means using text matching of the address components.
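As a rough illustration of what text-based matching might look like, here is a minimal Python sketch. It assumes both datasets can be reduced to simple records with an address string; the field names, the sample rows and the `normalise_address` and `append_uprns` helpers are all made up for this example and aren't taken from AddressBase or Companies House. Real address matching is much fuzzier than this, but the point is that no co-ordinates are consulted at any stage.

```python
import re


def normalise_address(address: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace,
    so that trivially different spellings of an address compare equal."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def append_uprns(open_records, addressbase_records):
    """Return copies of open_records with a 'uprn' key added.

    Matching is done purely on the normalised address text; records with
    no exact match get a UPRN of None. No co-ordinates are read.
    """
    lookup = {
        normalise_address(row["address"]): row["uprn"]
        for row in addressbase_records
    }
    return [
        {**record, "uprn": lookup.get(normalise_address(record["address"]))}
        for record in open_records
    ]


# Hypothetical sample rows, invented for illustration only.
addressbase = [
    {"uprn": "100000000001", "address": "1 Example Street, Bath, BA1 1AA"},
    {"uprn": "100000000002", "address": "2 Example Street, Bath, BA1 1AA"},
]
companies = [
    {"company": "Example Widgets Ltd", "address": "1 EXAMPLE STREET BATH BA1 1AA"},
    {"company": "Nowhere Ltd", "address": "99 Missing Road, Bath"},
]

result = append_uprns(companies, addressbase)
```

Here the first company picks up a UPRN because its address text matches after normalisation, while the second is left without one.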
Food hygiene rating location matching
The FSA food hygiene rating open data includes the addresses and X,Y co-ordinates of places that have been assigned a food hygiene rating. Could a local authority do the same thing to this dataset as in the previous example? E.g. publishing a local subset enriched with the UPRN?
The answer seems to be:
- No – if the data is matched based on comparing the X,Y co-ordinates in the hygiene data to AddressBase, e.g. to find the nearest property. The spatial reference restriction doesn’t allow this.
- Yes – if the data is matched using the address fields only.
The end result will be exactly the same dataset but only one approach seems to be valid as using the X,Y co-ordinates is a spatial method.
Unfortunately the OS terms don’t define what constitutes a spatial (or non-spatial) method. Using a distance calculation as suggested in this example seems like it’s definitely a spatial method. But it’s not clear, for example, whether finding addresses within a location, e.g. a post code, or administrative area, counts as a spatial method.
In fact, given that AddressBase is essentially just a list of addresses and locations, it’s hard to think of examples other than just address matching where it would be possible to extract UPRNs.
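To make the distinction a bit more concrete, here is a small Python sketch of the two kinds of lookup discussed in this example. The rows, field names and function names are invented for illustration and don't reflect the real AddressBase schema. The first function uses a distance calculation, which looks like a spatial method. The second filters on the post code text only, and it's an open question which side of the line that falls on.

```python
import math

# Hypothetical AddressBase-style rows: UPRN, post code and easting/northing.
ADDRESSBASE = [
    {"uprn": "100000000001", "postcode": "BA1 1AA", "x": 375000.0, "y": 165000.0},
    {"uprn": "100000000002", "postcode": "BA1 1AA", "x": 375020.0, "y": 165000.0},
    {"uprn": "100000000003", "postcode": "BA2 2BB", "x": 376000.0, "y": 164000.0},
]


def nearest_uprn(x, y, rows=ADDRESSBASE):
    """Distance-based lookup: pick the row whose co-ordinates are closest
    to (x, y). This reads the co-ordinates, so it looks like a spatial method."""
    closest = min(rows, key=lambda r: math.hypot(r["x"] - x, r["y"] - y))
    return closest["uprn"]


def uprns_in_postcode(postcode, rows=ADDRESSBASE):
    """Text-based lookup: compare post code strings, ignoring case and spaces.
    No co-ordinates are read, but a post code is still a kind of location."""
    wanted = postcode.replace(" ", "").upper()
    return [
        r["uprn"] for r in rows
        if r["postcode"].replace(" ", "").upper() == wanted
    ]
```

Both functions return UPRNs, but only the first one touches the x and y values.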
Local authority land and building assets
The local government transparency code (p15-16) requires local authorities to publish a list of their land and building assets. This includes the UPRN and full address of all properties.
This is expressly allowed by the “presumption to publish” process, so the authority can do this without requiring additional permission. The authority could use a spatial query in AddressBase to find and extract all of the necessary data and publish it under the OGL.
Note: if AddressBase contained an indicator of whether a property was owned by the public sector, it wouldn’t be permissible for a non-public sector licensee to publish exactly the same dataset as above along with the co-ordinates. The spatial reference restriction would apply, so using a spatial query to extract the data would not be allowed.
Local government incentive scheme
The local government incentive scheme datasets include planning applications, public toilets, and premises licences. Many of the local authorities in the UK are publishing these datasets against a standard schema. All of the schemas have been defined to include addresses, co-ordinates and UPRNs.
To meet the terms of the incentive scheme the datasets are published as open data. Currently the UPRNs are often not populated except for public toilets which have been given an exemption by the OS. This is mentioned in the schema guidance but I’ve not found a better link for it and it’s not listed here.
So, can a local authority update the planning and licensing datasets to include UPRNs? Yes, I think so.
Assuming that each planning and licensing application is matched to its UPRN via the address then everything should be fine. This is a “non-spatial” method and is essentially the same as the Companies House example.
However because these datasets are not part of the transparency code, I don’t think the local authority could include the X,Y co-ordinates of the UPRN without permission from the OS.
Bin and recycling collection routes
This is the example that triggered me looking into this issue again. I wanted to know: could we publish a list of UPRNs in Bath along with the identifier of the bin collection route they are on and which day of the week the bins are collected?
In order to tell someone when their bin or recycling will be collected you need to know what bin collection route they are on. And different sides of the street may be covered by different routes, so you can’t just publish a list of which roads are covered by which routes, you need to know which addresses it covers.
Unfortunately because you need a spatial query to do this, the spatial reference restriction applies. This means you can’t publish that dataset with UPRNs. I also don’t think you can publish it by substituting the textual addresses for the UPRNs, as that would amount to publishing a significant extract of PAF, basically all properties in the local area.
So this type of service data can’t be published as open data currently. Only local authorities can build services that know when and where recycling and bin collection services are available.
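For illustration, here is a minimal Python sketch of the sort of spatial query involved. It assumes, purely for the sake of the example, that each collection route is held as a simple polygon and each property as a point; the route names, collection days, co-ordinates and UPRNs are all made up. The `point_in_polygon` function is a standard ray-casting test rather than anything specific to AddressBase.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical routes, each a polygon with a collection day.
ROUTES = {
    "route-A": {"day": "Monday", "polygon": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    "route-B": {"day": "Thursday", "polygon": [(10, 0), (20, 0), (20, 10), (10, 10)]},
}

# Hypothetical properties: UPRN plus x,y co-ordinates.
PROPERTIES = [
    {"uprn": "100000000001", "x": 5, "y": 5},
    {"uprn": "100000000002", "x": 15, "y": 5},
]


def assign_routes(properties=PROPERTIES, routes=ROUTES):
    """Map each UPRN to the first route whose polygon contains its point."""
    assignments = {}
    for prop in properties:
        for route_id, route in routes.items():
            if point_in_polygon(prop["x"], prop["y"], route["polygon"]):
                assignments[prop["uprn"]] = (route_id, route["day"])
                break
    return assignments
```

The output is exactly the UPRN-to-route list we’d like to publish, but it can only be produced by reading the co-ordinates of each property.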
How do the UPRN terms compare with TOIDs?
Although the “OS OpenData™ TOID look-up service” mentioned in the terms, and originally available at http://opentoids.ordnancesurvey.co.uk/toidservice/, no longer exists, TOIDs can be found in the various OS open data products so it’s easy to look them up.
That’s not true for addresses and UPRNs.
Does the ONS NSAL dataset make UPRNs open data?
Commenting on the first version of this post, Owen Boswarva wondered if the ONS National Statistics Address Lookup (NSAL) meant that UPRNs are open data?
The NSAL dataset is described in this blog post. It’s a list of UPRNs mapped to various administrative regions. This allows for easy reporting and recasting of statistics by different geographies. The blog post explains that the changes to the UPRN policy were encouraged to help support the release of this dataset, which is published under the OGL. This means that there is already a complete list of UPRNs published under an open licence.
So does this mean that the UPRNs are open data? Clearly the full list of UPRN identifiers is now available under an open licence from the ONS. So the answer could be a qualified yes. However as the ONS explain in their version notes (copy here), the dataset may be out of date with the authoritative copy in AddressBase, so isn’t necessarily definitive.
There’s also none of the accompanying metadata that I’d expect to see if the UPRN identifier scheme was fully published as open data, e.g. administrative metadata around when UPRNs are added or removed, relationships between UPRNs, and perhaps the address data.
While the NSAL dataset itself is excellent, helping to solve problems with mapping between the various local geographies, it doesn’t provide any additional utility beyond giving us a reasonably up to date count of how many UPRNs there are. It doesn’t help us publish more open data that includes UPRNs, or help us annotate existing datasets with UPRNs. For that you still need the address and co-ordinate information held in AddressBase.
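As a sketch of the kind of counting and reporting by geography that a lookup like the NSAL does support, the following Python tallies UPRNs per area. The column names (`uprn`, `local_authority`), the area codes and the sample rows are assumptions made for this example; I haven’t checked them against the real NSAL file layout.

```python
import csv
import io
from collections import Counter

# Hypothetical extract in an NSAL-like shape: one row per UPRN,
# with a code for the area it falls in.
SAMPLE_CSV = """uprn,local_authority
100000000001,E06000022
100000000002,E06000022
100000000003,E06000023
"""


def count_uprns_by_area(csv_text, area_column="local_authority"):
    """Count how many UPRN rows fall in each area code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[area_column] for row in reader)


counts = count_uprns_by_area(SAMPLE_CSV)
total = sum(counts.values())
```

Note that nothing here gets you from an address to a UPRN, which is the step you’d need in order to annotate other datasets.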
UPRNs are not open data, but they can be included in some open datasets. There are some very specific cases where UPRNs could usefully be added to both existing and new open data sets.
However there are some subtleties in understanding what is allowed, which depend on both who is publishing the data and how the dataset is constructed.
Hopefully this post has shed some light onto the issues that might help open data publishers and, importantly, local authorities in understanding what can and can’t be done.
I’ll update this post to make corrections as and when necessary. Please leave a comment if you have an issue with any of my reasoning. Also, please comment if you have additional examples of permitted or non-permitted publication.