The Land Registry have today announced the publication of their Open Data — including both Price Paid information and Transactions as Linked Data. This is great to see, as it means that there is another UK public body making a commitment to Linked Data publishing.
I’ve taken some time to begin exploring the data. This blog post provides some pointers that may help others in using the Linked Data. I’m also including some hopefully constructive feedback on the approach that the Land Registry have taken.
The Land Registry Linked Data
The Linked Data is available from http://landregistry.data.gov.uk. This follows the general pattern used by other organisations publishing public sector Linked Data in the UK.
The data consists of a single SPARQL endpoint, based on the open source Fuseki server, which contains RDF versions of both the Price Paid and Transaction data. The documentation notes that the endpoint will be updated on the 20th of each month with the equivalent of the monthly releases that are already published as CSV files.
Based on some quick tests, it would appear that the endpoint contains all of the currently published Open Data, which in total is 16,873,170 triples covering 663,979 transactions.
The data seems to be described primarily using custom vocabularies:
- Land Registry Common — which includes definitions of address types, including a BS7666 Address class, and its related properties, e.g. locality, postcode, etc
- Land Registry Price Paid Information — which defines the notion of Transaction and its relationships
- Land Registry Transaction Figures — which appears to be mainly terms for capturing counts of different types of transaction, e.g. Application Counts.
The landing page for the data doesn’t include any examples, but I ran some SPARQL queries to extract a few, e.g:
- A Transaction
- A Transaction Record describing a deletion from the registry (presumably to fix the date?!)
- A BS7666 Address associated with a transaction record.
- An Applications By Account entity
So for Price Paid Data, the model appears to be that a Transaction has a Transaction Record which in turn has an associated Address. The transaction counts seem to be standalone resources.
The SPARQL endpoint for the data is at http://landregistry.data.gov.uk/landregistry/sparql. A test form is also available and that page has a couple of example queries, including getting Price Paid data based on a postcode search.
However, I’d suggest that the following version might be slightly better, as it includes the record status for each record, which indicates whether it is an “add” or a “delete”:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX lrppi: <http://landregistry.data.gov.uk/def/ppi/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX lrcommon: <http://landregistry.data.gov.uk/def/common/>

SELECT ?paon ?saon ?street ?town ?county ?postcode ?amount ?date ?status
WHERE {
  ?transx lrppi:pricePaid ?amount .
  ?transx lrppi:transactionDate ?date .
  ?transx lrppi:propertyAddress ?addr .
  ?transx lrppi:recordStatus ?status .
  ?addr lrcommon:postcode "PL6 8RU"^^xsd:string .
  ?addr lrcommon:postcode ?postcode .
  OPTIONAL { ?addr lrcommon:county ?county . }
  OPTIONAL { ?addr lrcommon:paon ?paon . }
  OPTIONAL { ?addr lrcommon:saon ?saon . }
  OPTIONAL { ?addr lrcommon:street ?street . }
  OPTIONAL { ?addr lrcommon:town ?town . }
}
ORDER BY ?amount
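Because most of the address fields in that query are OPTIONAL, some rows will come back without a value for variables such as ?saon. As a minimal sketch, assuming the endpoint can return results in the standard SPARQL JSON results format, the following Python flattens a result document into plain rows and pads the missing variables. The function name and the sample document are my own illustrations; the sample is made up and is not real Land Registry data.

```python
import json


def rows_from_sparql_json(payload, default=None):
    """Flatten a SPARQL JSON results document into a list of dicts.

    Variables bound only inside OPTIONAL blocks (e.g. ?saon) may be absent
    from a binding, so each row is padded with `default` for those.
    """
    doc = json.loads(payload) if isinstance(payload, str) else payload
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append(
            {v: binding[v]["value"] if v in binding else default for v in variables}
        )
    return rows


# Made-up sample in the standard results layout (NOT real Land Registry data).
SAMPLE = {
    "head": {"vars": ["paon", "saon", "amount"]},
    "results": {
        "bindings": [
            {
                "paon": {"type": "literal", "value": "1"},
                "amount": {"type": "literal", "value": "100000"},
            }
        ]
    },
}

rows = rows_from_sparql_json(SAMPLE)
print(rows)
```

Note that every value comes back as a string; converting ?amount or ?date to numbers and dates is left to the caller.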
General Feedback
Let’s start with the good points:
- The data is clearly licensed so is open for widespread re-use
- There is a clear commitment to regularly updating the data, so it should stay in line with the Land Registry’s other Open Data. This makes it reliable for developers to use the data and the identifiers it contains
- The data uses Patterned URIs based on Shared Keys (the Land Registry’s own transaction identifiers) so building links is relatively straightforward
- The vocabularies are documented and the URIs resolve, so it is possible to look up the definitions of terms. I’m already finding that easier than digging through the FAQs that the Land Registry publish for the CSV versions.
However I think there is room for improvement in a number of areas:
- It would be useful to have more example queries, e.g. how to find the transactional data, as well as example Linked Data resources. A key benefit of a linked dataset is that you should be able to explore it in your browser. I had to run SPARQL queries to find simple examples
- The SPARQL form could be improved: currently it uses a POST by default and so I don’t get a shareable URL for my query; the JavaScript in the page also wipes out my query every time I hit the back button, making it frustrating to use
- The vocabularies could be better documented, for example a diagram showing the key relationships would be useful, as would a landing page providing more of a conceptual overview
- The URIs in the data don’t match the patterns recommended in Designing URI Sets for the Public Sector. While I believe that guidance is under review, the data is diverging from current documented best practice. Linked Data purists may also lament the lack of a distinction between resource and page.
- The data uses custom vocabularies where there are existing vocabularies that fit the bill. The transactional statistics could have been adequately described by the Data Cube vocabulary with custom terms for the dimensions. The related organisations could have been described by the ORG ontology, and vCard with extensions ought to have covered the address information.
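On the point about shareable query URLs: the SPARQL protocol allows a query to be sent as a `query` parameter on a GET request, so a link can be built by hand even though the form uses POST. The sketch below assumes the endpoint given earlier accepts GET requests in that way, which I haven't confirmed here; the function name and the example query are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint address as given earlier in this post.
ENDPOINT = "http://landregistry.data.gov.uk/landregistry/sparql"


def shareable_query_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


example = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
url = shareable_query_url(example)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == example)
```

Very long queries can exceed URL length limits in some browsers and servers, which is one reason forms often default to POST.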
But I think the biggest oversight is the lack of linking, both internal and external. The data uses “strings” where it could have used “things”: for places, customers, localities, post codes, addresses, etc.
Improving the internal linking will make the dataset richer, e.g. allowing navigation to all transactions relating to a specific address, or all transactions for a specific town or postcode region. I’ve struggled to get a Post Code District based query to work (e.g. “price paid information for BA1”) because the query has to resort to regular expressions, which are often poorly optimised in triple stores. Matching based on URIs is generally much faster and more reliable.
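Until the data has URIs for districts, one possible workaround is a plain prefix test instead of a regular expression. The sketch below builds such a query in Python using the SPARQL 1.1 STRSTARTS function. It assumes the endpoint supports SPARQL 1.1 functions and that postcodes are stored with a single space between the outward and inward parts, as in the “PL6 8RU” example above. It is still string matching rather than URI matching, so I wouldn't expect it to be fast. The function names and the test postcodes are made up for illustration.

```python
import re

PREFIXES = """PREFIX lrppi: <http://landregistry.data.gov.uk/def/ppi/>
PREFIX lrcommon: <http://landregistry.data.gov.uk/def/common/>"""


def matches_district(postcode, district):
    """True if the postcode falls in the district, e.g. 'BA1 5AB' in 'BA1'.

    The trailing space matters: without it 'BA1' would also match 'BA11 ...'.
    """
    return postcode.upper().startswith(district.upper() + " ")


def district_query(district):
    """Build a price paid query restricted to one postcode district."""
    district = district.strip().upper()
    # Validate before interpolating, so nothing odd ends up inside the query.
    if not re.fullmatch(r"[A-Z]{1,2}[0-9][0-9A-Z]?", district):
        raise ValueError(f"not a postcode district: {district!r}")
    prefix = district + " "
    return f"""{PREFIXES}
SELECT ?postcode ?amount ?date
WHERE {{
  ?transx lrppi:pricePaid ?amount ;
          lrppi:transactionDate ?date ;
          lrppi:propertyAddress ?addr .
  ?addr lrcommon:postcode ?postcode .
  FILTER(STRSTARTS(STR(?postcode), "{prefix}"))
}}
ORDER BY ?date"""


print(district_query("BA1"))
```

The same trailing-space detail applies to a regular expression: a pattern such as `^BA1 ` avoids picking up BA10 to BA16 by accident.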
External linking could have been improved in two ways:
- The dates in the transactions could have been linked to the UK Government Interval Sets, which provide URIs for individual days
- The postcode, locality, district and other regional information could have been linked to the Ordnance Survey Linked Data. That dataset already has URIs for all of these resources. While it may have been a little more work to match regions, the postcode based URIs are predictable so are trivial to generate.
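To illustrate how little work the postcode and date links might involve, here is a minimal sketch that derives both kinds of URI from values already in the data. The two base URIs reflect my understanding of the patterns used by the Ordnance Survey postcode unit data and the interval sets on reference.data.gov.uk; treat them as assumptions and check them against each dataset's documentation. The function names and the example date are illustrative.

```python
from datetime import date

# Assumed URI patterns; verify against each dataset's own documentation.
OS_POSTCODE_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"
DAY_BASE = "http://reference.data.gov.uk/id/day/"


def postcode_unit_uri(postcode):
    """Map 'PL6 8RU' to a postcode unit URI by removing spaces and upper-casing."""
    return OS_POSTCODE_BASE + "".join(postcode.split()).upper()


def day_uri(day):
    """Map a date to a day URI using its ISO form (YYYY-MM-DD)."""
    return DAY_BASE + day.isoformat()


print(postcode_unit_uri("PL6 8RU"))
print(day_uri(date(2012, 3, 20)))  # arbitrary example date
```

With links like these in place, a query could match on the postcode or day URI directly rather than comparing strings.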
These improvements would have moved the Land Registry data from 4 to 5 Stars with little additional effort. That does more than tick boxes: it makes the entire dataset easier to consume, query and remix with others.
Hopefully this feedback is useful for others looking to consume the data or who might be undertaking similar efforts. I’m also hoping that it is useful to the Land Registry as they evolve their Linked Data offering. I’m sure that what we’re seeing so far is just the initial steps.
Could you detail the SPARQL query used to produce the applications by account?
Hello,
Is there any way to retrieve all the PL6 transactions for instance? I tried filter regex(?postcode, “^PL6”, “i”) but it didn’t work. Thank you in advance for any help.