Author Archives: Leigh Dodds

Data and information in the city

For a while now I’ve been in the habit of looking for data as I travel to work or around Bath. You can’t really work with data and information systems for any length of time without becoming a little bit obsessive about numbers, or tuned into interesting little dashboards:

My eye gets drawn to gauges and displays on devices as I’m curious about not just what they’re showing but also for whom the information is intended.

I can also tell you that for at least ten years, perhaps longer, the electronic signs on some of the buses running the Number 10 route in Bath have been buggy. Instead of displaying “10 Southdown” they read “(ode1fsOs1ss1 10sit2 Southdown)” with a flashing “s” in “sit”.

Yes. I wrote it down. I was curious about whether it was some misplaced control codes, but I couldn’t find a reference.

Having spent so long working on data integration and with technologies like Linked Data, I’m also curious about how people assign identifiers to things. A lot of what I’ve learnt about that went into writing this paper, which is a piece of work of which I’m very proud. It’s become an ingrained habit to look out for identifiers wherever I can find them.  It’s not escaped me that this is pretty close to train spotting, btw!

I’ve also recently started contributing to Bath: Hacked, which is Bath’s community-led open data project. It’s led me to pay even closer attention to the information around me in Bath, as it might turn up some useful data that could be published or indicate the potential for a useful digital service.

So to try and direct my “data magpie” habits into a more productive direction, I’ve started on a small project to photograph some of the information and data I find as I walk around the city. There are signs, information and data all around us but we don’t often really notice it or we just take the information for granted. I decided to try to catalogue some of the ways in which we might encounter data around Bath and, by extension, in other cities.

The entire set of photos is available on Flickr if you care to take a look. Think of it as a natural history study of data.

In the rest of this post I wanted to explore a few things that have occurred to me along the way: areas where we can glimpse the digital environment and data infrastructure that increasingly supports the physical environment, and the ways in which data might be intentionally or incidentally shared with others.

Data as dark matter

For most people data is ephemeral stuff. It’s not something they tend to think about even though it’s being collected and recorded all around us. While there’s increasing awareness of how our personal data is collected and used by social networks and other services, there’s often little understanding of what data might be available about the built environment.

But you can see evidence of that data all around us. Data is a bit like dark matter: we often only know it exists based on its effects on other things which we more clearly understand. Once you start looking you can see identifiers everywhere:

Bridge identifiers

If something has an identifier then there will be data associated with it, creating a record that describes that object. And as there is very likely to be a collection of those things, we can infer that there’s a database containing many similar records.

Once you start looking you can see databases everywhere: of lampposts, parking spaces, bins, and the monoliths that sit in our streets but which we rarely think about:

Traffic light control box

Once you realise all of these databases exist it’s natural to start asking questions such as how that information is collected, who is responsible for it, and when might it be useful?  There are databases everywhere and people are employed to look after them.

The bus driver’s role in data governance

Live bus times

I was looking forward to the installation of the Real Time Information sign at the bus stop (0180BAC30294) near my house. For a few years now I’ve been regularly taking a photo of the paper sign on the stop. Looking at that on my phone is still much quicker than using any of the online services or apps. A real time data feed was going to solve that. Only it didn’t. It’s made things worse:

My morning bus, the one that begins my commute to the Open Data Institute, is often not listed. I’ve had several morning conversations with Travelwest about it. Although, evoking Hello Lamppost, it sometimes feels like I’ve been arguing with the bus sign itself, and I’d like to leave a note for others to say that, actually, yes, the Number 10 really is on its way.

I’m suddenly concerned that they may do away with that helpful paper sign. The real-time information feed exposes problems with data management that wouldn’t otherwise be evident. Real-time doesn’t necessarily always mean better.

Interestingly Travelwest have an FAQ that lists a number of reasons why some buses won’t appear on the RTI system. This includes the expected range of data and hardware problems, but also: “The bus driver has logged on to the ETM incorrectly, preventing the journey operated being ‘matched’ by the central RTI system”.

So it turns out that bus drivers have a key role in the data governance of this particular dataset. They’re not just responsible for getting the bus from A to B, but also for ensuring that passengers know that it’s on its way. I wonder if that’s part of their induction?

The paperless city

There are more obvious signs of business processes that we can see around a city. These are stages in processes that require some public notice or engagement, such as planning applications or other “rights to object” to planned works:

Pole Objection Notice

In other cases the information is presented as an indication that a process has been completed successfully, such as gaining a premises licence, liability insurance or an energy rating certificate. If this information is being put on physical display then it’s natural to wonder whether there are digital versions that could or should be made available.

Also, in the majority of cases, making this information available digitally would probably be much better. There are certainly opportunities to create better digital services to help engage people in these processes. But in order to be inclusive I suspect paper-based approaches are going to be around for a while.

What would a digital public service look like that provided this type of city information, both on-demand and as notifications, to residents? The information might already be available on council websites, but you have to know that it’s there and then how to find it.

Visible to the public, but not for the public

Interestingly, not all of the information we can find around the city is intended for wider public consumption. It may be published into a public space but it might only be intended for a particular group of people, or useful at a particular point in time, e.g. during an emergency such as this map of fire sensors.

Fire hydrant

Most of the identifier examples I referred to above fall into this category. Only a small number of people need to know the identifier for a specific bin, traffic light control box, or bridge.

It also means that information is often provided without context: the intended audience already knows how to read it, or has the tools required to use it to unlock more information. To interpret it properly, you have to understand the visual code used in these organisational hobo signs.

The importance of notice boards

For me there’s something powerful in the juxtaposition of these two examples:

Community notice board

Dynamic display board

The first is a community notice board. Anyone can come along and not only read it but also add to the available information. It’s a piece of community owned and operated information infrastructure. This manually updated map of the local farmers market is another nice example, as are the walls of flyers and event notices at the local library.

The second example is a sealed unit. It’s owned and operated by a single organisation which gets to choose what information is displayed. Community annotations aren’t possible. There’s no scope to add notices or graffiti to appropriate the structure for other purposes – something that you see everywhere else in the city. This is increasingly hard to do with digital infrastructures.

In my opinion a truly open city will include both types of digital and physical infrastructure. I dislike the top-down view of the smart city and prefer the vision of creating an open, annotatable data infrastructure for residents and local businesses to share information.

Useful perspective

In this rambling post I’ve tried to capture some of the thoughts that have occurred to me whilst taking a more critical look at how data and information is published in our cities. I’ve really only scratched the surface, but it’s been fun to take a step back and look at Bath with a slightly more critical eye.

I think it’s interesting to see how data leaks into the physical environment, either intentionally or otherwise. Using environments that people are familiar with might also be a useful way to get a wider audience thinking about the data that helps our society function, and how it is owned and operated.

It’s also interesting to consider how a world of increasingly connected devices and real-time information is going to impact this environment. Will all of this information move onto our phones, watches or glasses and out of the physical infrastructure? Or are we going to end up with lots more cryptic icons and identifiers on all kinds of bits of infrastructure?


“The scribe and the djinn’s agreement”, an open data parable

In a time long past, in a land far away, there was once a great city. It was the greatest city in the land and the vast marketplace at its centre was the busiest, liveliest marketplace in the world. People of all nations could be found there buying and selling their wares. Indeed, the marketplace was so large that people would spend days, even weeks, exploring its length and breadth and still discover new stalls selling a myriad of items.

A frequent visitor to the marketplace was a woman known only as the Scribe. While the Scribe was often found roaming the marketplace even she did not know of all of the merchants to be found within its confines. Yet she spent many a day helping others to find their way to the stalls they were seeking, and was happy to do so.

One day, as a gift for providing useful guidance, a mysterious stranger gave the Scribe a small magical lamp. Upon rubbing the lamp a djinn appeared before the surprised Scribe and offered her a single wish.

“Oh venerable djinn” cried the Scribe, “grant me the power to help anyone that comes to this marketplace. I wish to help anyone who needs it to find their way to whatever they desire”.

With a sneer the djinn replied: “I will grant your wish. But know this: your new found power shall come with limits. For I am a capricious spirit who resents his confinement in this lamp”. And with a flash and a roll of thunder, the magic was completed. And in the hands of the Scribe appeared the Book.

The Book contained the name and location of every merchant in the marketplace. From that day forward, by reading from the Book, the Scribe was able to help anyone who needed assistance to find whatever they needed.

After several weeks of wandering the market, happily helping those in need, the Scribe was alarmed to discover that she was confronted by a long, long line of people.

“What is happening?” she asked of the person at the head of the queue.

“It is now widely known that no-one should come to the Market without consulting the Scribe” said the man, bowing. “Could you direct me to the nearest merchant selling the finest silks and tapestries?”

And from that point forward the Scribe was faced with a never-ending stream of people asking for help. Tired and worn and no longer able to enjoy wandering the marketplace as had been her whim, she was now confined to its gates. Directing all who entered, night and day.

After some time, a young man took pity on the Scribe, pushing his way to the front of the queue. “Tell me where all of the spice merchants are to be found in the market, and then I shall share this with others!”

But no sooner had he said this than the djinn appeared in a puff of smoke: “NO! I forbid it!”. With a wave of its arm the Scribe was struck dumb until the young man departed. With a smirk the djinn disappeared.

Several days passed and a group of people arrived at the head of the queue of petitioners.

“We too are scribes.” they said. “We come from a neighbouring town having heard of your plight. Our plan is to copy out your Book so that we might share your burden and help these people”.

But whilst a spark of hope was still flaring in the heart of the Scribe, the djinn appeared once again. “NO! I forbid this too! Begone!” And with a scream and a flash of light the scribes vanished. Looking smug the djinn disappeared.

Some time passed before a troupe of performers approached the Scribe. As a chorus they cried: “Look yonder at our stage, and the many people gathered before it. By taking turns reading from the Book, in front of a wide audience, we can easily share your burden”.

But shaking her head the Scribe could only turn away whilst the djinn visited ruin upon the troupe. “No more” she whispered sadly.

And so, for many years the Scribe remained as she had been, imprisoned within the subtle trap of the djinn of the lamp. Until, one day, a traveller appeared in the market. Upon reaching the head of the endless line of petitioners, the man asked of the Scribe:

“Where should you go to rid yourself of an evil djinn?”.

Surprised, and with sudden hope, the Scribe turned the pages of her Book…

Open data and diabetes

In December my daughter was diagnosed with Type 1 diabetes. It was a pretty rough time. Symptoms can start and escalate very quickly. Hyperglycaemia and ketoacidosis are no joke.

But luckily we have one of the best health services in the world. We’ve had amazing care, help and support. And, while we’re only 4 months into dealing with a life-long condition, we’re all doing well.

Diabetes sucks though.

I’m writing this post to reflect a little on the journey we’ve been on over the last few months from a professional rather than a personal perspective. Basically, the first weeks of becoming a diabetic, or the parent of a diabetic, are a crash course in physiology, nutrition, and medical monitoring. You have to adapt to new routines for blood glucose monitoring, learn to give injections (and teach your child to do them), become good at book-keeping, plan for exercise, and remember to keep needles, lancets, monitors, emergency glucose and insulin with you at all times, whilst ensuring prescriptions are regularly filled.

Oh, and there’s a stupid amount of maths, because you’ll need to start calculating how many carbohydrates are in all of your meals and inject accordingly. No meal unless you do your sums.
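For the curious, the recipe maths can be sketched in a few lines of code. The ingredients and carbohydrate values here are purely illustrative, not taken from any nutrition database:

```ruby
# Illustrative carbohydrate values (per 100g) and recipe weights (grams).
# These numbers are made up for the example; real values come from packets
# or a food composition database.
INGREDIENTS = [
  { name: "rice",    carbs_per_100g: 78.0, weight: 300 },
  { name: "peas",    carbs_per_100g: 10.0, weight: 100 },
  { name: "chicken", carbs_per_100g: 0.0,  weight: 250 },
]

# Total carbohydrate in the whole recipe.
def total_carbs(ingredients)
  ingredients.sum { |i| i[:carbs_per_100g] * i[:weight] / 100.0 }
end

# Carbohydrate in a single serving when the recipe is split into portions.
def carbs_per_portion(ingredients, portions)
  total_carbs(ingredients) / portions
end

puts total_carbs(INGREDIENTS)          # → 244.0
puts carbs_per_portion(INGREDIENTS, 4) # → 61.0
```

The catch, as the rest of this post explains, is getting trustworthy values for that carbs-per-100g column in the first place.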

Good job we had that really great health service to support us (there’s data to prove it). And an amazing daughter who has taken it all in her stride.

Diabetics live a quantified life. Tightly regulating blood glucose levels means knowing exactly what you’re eating, and learning how your body reacts to different foods and levels of exercise. For example we’ve learnt the different ways that a regular school day versus school holidays affects my daughter’s metabolism. That we need to treat ahead for the hypoglycaemia that follows a few hours after some fun on the trampoline. And that certain foods (cereals, risotto) seem to affect insulin uptake.

So to manage the condition we need to know how many carbohydrates are in:

  • any pre-packaged food my daughter eats
  • any ingredients we use when cooking, so we can calculate a total portion size
  • any snack or meal that we eat out

Food labeling is pretty good these days so the basic information is generally available. But it’s not always available on menus or in an easy to use format.

The book and app that diabetic teams recommend is called Carbs and Cals. I was a little horrified by it initially as it’s just a big picture book of different portion sizes of food. You’re encouraged to judge everything by eye or weight. It seemed imprecise to me but with hindsight it’s perfectly suited to those early stages of learning to live with diabetes. No hunting over packets to get the data you need: just look at a picture, a useful visualisation. Simple is best when you’re overwhelmed with so many other things.

Having tried calorie counting I wanted to try an app to more easily track foods and calculate recipes. My Fitness Pal, for example, is pretty easy to use and does bar-code scanning of many foods. There are others that are more directly targeted at diabetics.

The problem is that, as I’ve learnt from my calorie counting experiments, the data isn’t always accurate. Many apps fill their databases through crowd-sourcing. But recipes and portion sizes change continually. And people make mistakes when they enter data, or enter just the bits they’re interested in. Look up any food on My Fitness Pal and you’ll find many duplicate entries. It makes me distrust the data because I’m concerned it’s not reliable. So for now we’re still reading packets.

Eating out is another adventure. There have been recent legislative changes to require restaurants to make more nutritional information available. If you search you may find information on a company website and can plan ahead. Sometimes it’s only available if you contact customer support. If you ask in a (chain) restaurant they may have it available in a ring-binder you can consult with the menu. This doesn’t make a great experience for anyone. Recently we’ve been told in a restaurant to just check online for the data (when we know it doesn’t exist), because they didn’t want to risk any liability by providing information directly. On another occasion we found that certain dishes – items from the children’s menu – weren’t included on the nutritional charts.

Basically, the information we want is:

  • often not available at all
  • available, but only if you know where to look or who to ask
  • potentially out of date, as it comes from non-authoritative sources
  • incomplete or inaccurate, even from the authoritative sources
  • not regularly updated
  • not in easy to use formats
  • available electronically, e.g. in an app, but without any clear provenance

The reality is that this type of nutritional and ingredient data is basically in the same state as government data was 6-7 years ago. It’s something that really needs to change.

Legislation can help encourage supermarkets and restaurants to make data available, but really it’s time for them to recognize that this is essential information for many people. All supermarkets, manufacturers and major chains will have this data already, so there should be little effort required in making it public.

I’ve wondered whether this type of data ought to be considered as part of the UK National Information Infrastructure. It could be collected as part of the remit of the Food Standards Agency. Having a national source would help remove ambiguity around how data has been aggregated.

Whether you’re calorie or carb counting, open data can make an important difference. It’s about giving people the information they need to live healthy lives.

Creating an Application Using the British National Bibliography

This is the fourth and final post in a series (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

The British National Bibliography (BNB) is a bibliographic database that contains data on a wide range of books and serial publications published in the UK and Ireland since the 1950s. The database is available under a public domain license and can be accessed via an online API which supports the SPARQL query language.

This tutorial provides an example of building a simple web application against the BNB SPARQL endpoint, using Ruby and various open source libraries. The tutorial includes:

  • a description of the application and its intended behaviour
  • a summary of the various open source components used to build the application
  • a description of how SPARQL is used to implement the application functionality

The example is written in a mixture of Ruby and Javascript. The code is well documented to support readers more familiar with other languages.

The “Find Me A Book!” Application

The Find Me a Book! demonstration application illustrates how to use the data in the BNB to recommend books to readers. The following design brief describes the intended behaviour.

The application will allow a user to provide an ISBN which is used to query the BNB in order to find other books that the user might potentially want to read. The application will also confirm the book title to the user to ensure that it has found the right information.

Book recommendations will be made in two ways:

  1. More By The Author: will provide a list of 10 other books by the same author(s)
  2. More From Reading Lists: will attempt to suggest 10 books based on series or categories in the BNB data

The first use case is quite straight-forward and should generate some “safe” recommendations: it’s likely that the user will like other works by the author.

The second approach uses the BNB data a little more creatively, so the suggestions are likely to be a little more varied.

Related books will be found by looking to see if the user’s book is in a series. If it is then the application will recommend other books from that series. If the book is not included in any series, then recommendations will be driven off the standard subject classifications. The idea is that series present ready made reading lists that are a good source of suggestions. By falling back to a broader categorisation, the user should always be presented with some recommendations.
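As a sketch, assuming two hypothetical query functions standing in for the real SPARQL queries, that fallback logic looks something like this:

```ruby
# Hypothetical stand-ins for the two SPARQL queries: in the real application
# these would query the BNB endpoint. Here they return canned data.
def books_in_same_series(isbn)
  []  # placeholder: pretend this book is not part of any series
end

def books_with_same_subjects(isbn)
  ["Book A", "Book B", "Book C"]  # placeholder subject-based matches
end

def recommendations(isbn, limit = 10)
  from_series = books_in_same_series(isbn)
  # Series are ready-made reading lists, so prefer them when available...
  return from_series.take(limit) unless from_series.empty?
  # ...otherwise fall back to the broader subject classifications.
  books_with_same_subjects(isbn).take(limit)
end

puts recommendations("0261102214").inspect # → ["Book A", "Book B", "Book C"]
```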

To explore the recommended books further, the user will be provided with links to the LibraryThing website.

The Application Code

The full source code of the application is available on Github. The code has been placed into the Public Domain so can be freely reused or extended.

The application is written in Ruby and should run on Ruby 1.8.7 or higher. Several open source frameworks were used to build the application:

  • Sinatra — a light-weight Ruby web application framework
  • SPARQL Client — a client library for accessing SPARQL endpoints from Ruby
  • The JQuery javascript library for performing AJAX requests and HTML manipulation
  • The Bootstrap CSS framework is used to build the basic page layout

The application code is very straight-forward and can be separated into server-side and client-side components.

Server Side

The server side implementation can be found in app.rb. The Ruby application delivers the application assets (CSS, images, etc) and also exposes several web services that act as proxies for the BNB dataset. These services submit SPARQL queries to the BNB SPARQL endpoint and then process the results to generate a custom JSON output.

The three services, each of which accepts an isbn parameter, are:

  • /title — confirms the title of the book with the given ISBN
  • /by-author — lists other books by the same author(s)
  • /related — suggests related books based on series or subject classifications

Each of the services works in essentially the same way:

  • The isbn parameter is extracted from the request. If the parameter is not found then an error is returned to the client. The ISBN value is also normalised to remove any spaces or dashes
  • A SPARQL client object is created to provide a way to interact with the SPARQL endpoint
  • The ISBN parameter is injected into the SPARQL query that will be run against the BNB, using the add_parameters function
  • The final query is then submitted to the SPARQL endpoint and the results used to build the JSON response
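A minimal sketch of that shared workflow, with illustrative names (the real application’s add_parameters helper and query templates may differ):

```ruby
require "json"

# Normalise an ISBN as described above: strip any spaces and dashes.
def normalise_isbn(isbn)
  isbn.to_s.gsub(/[\s-]/, "")
end

# Inject a parameter into a SPARQL query template. The %{isbn} placeholder
# convention here is an assumption, not the add_parameters helper used by
# the real application.
def add_parameters(template, params)
  template % params
end

TEMPLATE = 'SELECT ?uri WHERE { ?uri bibo:isbn10 "%{isbn}" . }'

# A service handler would then check the parameter, build the query, submit
# it and return JSON. The endpoint call itself is omitted in this sketch.
def build_query(raw_isbn)
  return { error: "missing isbn" }.to_json if raw_isbn.nil? || raw_isbn.empty?
  add_parameters(TEMPLATE, isbn: normalise_isbn(raw_isbn))
end

puts build_query("0-261-10221 4")
# → SELECT ?uri WHERE { ?uri bibo:isbn10 "0261102214" . }
```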

The /related service may actually make two calls to the endpoint. If the first query doesn’t return any results then a fallback query is used instead.


Client Side

The client side Javascript code can all be found in find-me-a-book.js. It uses the JQuery library to trigger custom code to be executed when the user submits the search form with an ISBN.

The findTitle function calls the /title service to attempt to resolve the ISBN into the title of a book. This checks that the ISBN is in the BNB and provides useful feedback for the user.

If this initial call succeeds then the find function is called twice to submit parallel AJAX requests. One to the /by-author service, and one to the /related service. The function accepts two parameters, the first parameter identifies the service to call, the second provides a name that is used to guide the processing of the results.

The HTML markup uses a naming convention to allow the find function to write the results of the request into the correct parts of the page, depending on its second parameter.

The ISBN and title information found in the results from the AJAX requests are used to build links to the LibraryThing website. But these could also be processed in other ways, e.g. to provide multiple links or invoke other APIs.

Installing and Running the Application

A live instance of the application has been deployed to allow the code to be tested without having to install and run it locally. The application can be found at:

For readers interested in customising the application code, this section provides instructions on how to access the source code and run the application.

The instructions have been tested on Ubuntu. Follow the relevant documentation links for help with installation of the various dependencies on other systems.

Source Code

The application source code is available on Github and is organised into several directories:

  • public — static files including CSS, Javascript and Images. The main client-side Javascript code can be found in find-me-a-book.js
  • views — the templates used in the application
  • src — the application source code, which is contained in app.rb

The additional files in the project directory provide support for deploying the application and installing the dependencies.

Running the Application

To run the application locally, ensure that Ruby, RubyGems and
git are installed on the local machine.

To download all of the source code and assets, clone the git repository:

git clone

This will create a bnb-example-app directory. To simplify the installation of further dependencies, the project uses the Bundler dependency management tool. This must be installed first:

sudo gem install bundler

Bundler can then be run to install the additional Ruby Gems required by the project:

cd bnb-example-app
sudo bundle install

Once complete the application can be run as follows:

rackup

The rackup command will then start the application. By default the application will launch on port 9292 and should be accessible from:

http://localhost:9292/

This tutorial has introduced a simple demonstration application that illustrates one way of interacting with the BNB SPARQL endpoint. The application uses SPARQL queries to build a very simple book recommendation tool. The logic used to build the recommendations is deliberately simple to help illustrate the basic principles of working with the dataset and the API.

The source code for the application is available under a public domain license so can be customised or reused as necessary. A live instance provides a way to test the application against the real data.


Accessing the British National Bibliography Using SPARQL

This is the third in a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

Note: while I’ve attempted to fix up these instructions to account for changes to the platform on which the data is published, there may still be some errors. If there are then please leave a comment or drop me an email and I’ll endeavour to fix them.

The British National Bibliography (BNB) is a bibliographic database that contains data on a wide range of books and serial publications published in the UK and Ireland since the 1950s. The database is available under a public domain license and can be accessed via an online API.

The tutorial introduces developers to the BNB API which supports querying of the dataset via the SPARQL query language and protocol. The tutorial provides:

  • Pointers to relevant background material and tutorials on SPARQL and the SPARQL Protocol
  • A collection of useful queries and query patterns for working with the BNB dataset

The queries described in this tutorial have been published as a collection of files that can be downloaded from GitHub.

What is SPARQL?

SPARQL is a W3C standard which defines a query language for RDF databases. Roughly speaking SPARQL is the equivalent of SQL for graph databases. SPARQL 1.0 was first published as an official W3C Recommendation in 2008. At the time of writing SPARQL 1.1, which provides a number of new language features, will shortly be published as a final recommendation.

A SPARQL endpoint implements the SPARQL protocol allowing queries to be submitted over the web. Public SPARQL endpoints offer an API that allows application developers to query and extract data from web or mobile applications.

A complete SPARQL tutorial is outside the scope of this document, but there are a number of excellent resources available for developers wishing to learn more about the query language. Some recommended tutorials and reference guides include:

The BNB SPARQL Endpoint

The BNB public SPARQL endpoint is available from:

No authentication or API keys are required to use this API.

The BNB endpoint supports SPARQL 1.0 only. Queries can be submitted to the endpoint using either GET or POST requests. For POST requests the query is submitted as the body of the request, while for GET requests the query is URL encoded and provided in the query parameter, e.g.:
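A sketch of building such a GET request in Ruby, using a placeholder endpoint URL in place of the BNB address given above:

```ruby
require "uri"
require "net/http"

# Placeholder endpoint URL; substitute the BNB SPARQL endpoint address.
ENDPOINT = "https://example.org/sparql"
query = 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1'

uri = URI(ENDPOINT)
# The query goes, URL-encoded, into the "query" parameter. The output
# parameter selects a result format (see the next section).
uri.query = URI.encode_www_form(query: query, output: "json")

puts uri
# A real request would then be: Net::HTTP.get(uri)
```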

Refer to the SPARQL protocol specification for additional background on submitting queries. Client libraries for interacting with SPARQL endpoints are available in a variety of languages, including python, ruby, nodejs, PHP and Java.

Types of SPARQL Query and Result Formats

There are four different types of SPARQL query. Each of the different types supports a different use case:

  • ASK: returns a true or false response to test whether data is present in a dataset, e.g. to perform assertions or check for interesting data before submitting queries. Note these no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error.
  • SELECT: like the SQL SELECT statement this type of query returns a simple tabular result set. Useful for extracting values for processing in non-RDF systems
  • DESCRIBE: requests that the SPARQL endpoint provides a default description of the queried results in the form of an RDF graph
  • CONSTRUCT: builds a custom RDF graph based on data in the dataset

Query results can typically be serialized into multiple formats. ASK and SELECT queries have standard XML and JSON result formats. The graphs produced by DESCRIBE and CONSTRUCT queries can be serialized into any RDF format including Turtle and RDF/XML. The BNB endpoint also supports RDF/JSON output from these types of query. Alternate formats can be selected using the output URL parameter, e.g. output=json:

General Patterns

The following sections provide a number of useful query patterns that illustrate some basic ways to query the BNB.

Discovering URIs

One very common use case when working with a SPARQL endpoint is the need to discover the URI for a resource. For example, the ISBN of a book or the ISSN of a serial is likely to be recorded in a wide variety of databases. It would be useful to be able to use those identifiers to look up the corresponding resource in the BNB.

Here’s a simple SELECT query that looks up a book based on its ISBN-10:

#Declare a prefix for the bibo schema
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT ?uri WHERE {
  #Match any resource that has the specific property and value
  ?uri bibo:isbn10 "0261102214".
}

As can be seen from executing this query, there are actually 4 different editions of The Hobbit that have been published using this ISBN.

Here is a variation of the same query that identifies the resource with an ISSN of 1356-0069:

PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT ?uri WHERE {
  ?uri bibo:issn "1356-0069".
}

The basic query pattern is the same in each case. Resources are matched based on the value of a literal property. To find different resources just substitute in a different value or match on a different property. The results can be used in further queries or used to access the BNB Linked Data by performing a GET request on the URI.
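When driving such lookups from application code, any value substituted into the query text should be escaped first. A small Python sketch (the helper name is my own invention):

```python
def isbn_query(isbn):
    """Build the ISBN-10 lookup query for an arbitrary value.

    Escapes backslashes and quotes so the value is safe to embed
    in a SPARQL string literal.
    """
    escaped = isbn.replace("\\", "\\\\").replace('"', '\\"')
    return (
        'PREFIX bibo: <http://purl.org/ontology/bibo/>\n'
        'SELECT ?uri WHERE {\n'
        '  ?uri bibo:isbn10 "%s".\n'
        '}' % escaped
    )

query = isbn_query("0261102214")
```

Substituting a different property or value gives the other variants of this pattern.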

In some cases it may just be useful to know whether there is a resource that has a matching identifier in the dataset. An ASK query supports this use case. The following query should return true as there is a resource in the BNB with the given ISSN:

PREFIX bibo: <http://purl.org/ontology/bibo/>

ASK {
  ?uri bibo:issn "1356-0069".
}

Note: ASK queries no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error.

Extracting Data Using Identifiers

Rather than just request a URI or list of URIs it would be useful to extract some additional attributes of the resources. This is easily done by extending the query pattern to include more properties.

The following example extracts the URI, title and BNB number for all books with a given ISBN:

#Declare some additional prefixes
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?uri ?bnb ?title WHERE {
  #Match the books by ISBN
  ?uri bibo:isbn10 "0261102214";
       #bind some variables to their other attributes
       blterms:bnb ?bnb;
       dct:title ?title.
}

This pattern extends the previous examples in several ways. Firstly, some additional prefixes are declared because the properties of interest are from several different schemas. Secondly, the query pattern is extended to match the additional attributes of the resources. The values of those attributes are bound to variables. Finally, the SELECT clause is extended to list all the variables that should be returned.

If the URI for a resource is already known then this can be used to directly identify the resource of interest. Its properties can then be matched and extracted. The following query returns the ISBN, title and BNB number for a specific book:

PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?isbn ?title ?bnb WHERE {
  <> bibo:isbn10 ?isbn;
       blterms:bnb ?bnb;
       dct:title ?title.
}

Whereas the former query identified resources indirectly, via the value of an attribute, this query directly references a resource using its URI. The query pattern then matches the properties that are of interest. Matching resources by URI is usually much faster than matching based on a literal property.

Itemising all of the properties of a resource can be tiresome. Using SPARQL it is possible to ask the endpoint to generate a useful summary of a resource (called a Bounded Description). The endpoint will typically return all attributes and relationships of the resource. This can be achieved using a simple DESCRIBE query:

DESCRIBE <>
The query doesn’t need to define any prefixes or match any properties: the endpoint will simply return what it knows about a resource as RDF. If RDF/XML isn’t useful then the same results can be retrieved as JSON.

Returning to the previous approach of indirectly identifying resources, it's possible to ask the endpoint to generate descriptions of all books with a given ISBN:

PREFIX bibo: <http://purl.org/ontology/bibo/>

DESCRIBE ?uri WHERE {
  ?uri bibo:isbn10 "0261102214".
}

Matching By Relationship

Resources can also be matched based on their relationships, by traversing across the graph of data. For example it's possible to look up the author of a given book:

PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

As there are four books with this ISBN the query results return the URI for Tolkien four times. Adding a DISTINCT will remove any duplicates:

PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

Type Specific Patterns

The following sections provide some additional example queries that illustrate some useful queries for working with some specific types of resource in the BNB dataset. Each query is accompanied by links to the SPARQL endpoint that show the results.

For clarity the PREFIX declarations in each query have been omitted. It should be assumed that each query is preceded with the following prefix declarations:

PREFIX bio: <http://purl.org/vocab/bio/0.1/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX isbd: <http://iflastandards.info/ns/isbd/elements/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rda: <http://rdvocab.info/ElementsGr2/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Not all of these are required for all of the queries, but they declare all of the prefixes that are likely to be useful when querying the BNB.

Authors
There are a number of interesting queries that can be used to interact with author data in the BNB.

List Books By An Author

The following query lists all published books written by C. S. Lewis, with the most recently published books returned first:

SELECT ?book ?isbn ?title ?year WHERE {
  #Match all books with Lewis as an author
  ?book dct:creator <>;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        #match the publication event
        blterms:publication ?publication.

  #match the time of the publication event
  ?publication event:time ?time.
  #match the label of the year
  ?time rdfs:label ?year.
}
#order by descending year, after casting year as an integer
ORDER BY DESC( xsd:int(?year) )
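The cast in the ORDER BY clause matters because the year labels are plain strings: sorted lexically, "999" would come after "2010". The same idea can be mirrored client-side, sketched here in Python with invented rows:

```python
# Invented result rows: (title, year-label) pairs, with the year
# as a string literal, as it would come back from the endpoint
rows = [("A", "999"), ("B", "2010"), ("C", "1987")]

# Casting the label to int gives true chronological order,
# mirroring ORDER BY DESC(xsd:int(?year)) in the query
newest_first = sorted(rows, key=lambda r: int(r[1]), reverse=True)
```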

Identifying Genre of an Author

Books in the BNB are associated with one or more subject categories. By looking up the list of categories associated with an author's works it may be possible to get a sense of what type of books they have written. Here is a query that returns the list of categories associated with C. S. Lewis's works:

SELECT DISTINCT ?category ?label WHERE {
  #Match all books with Lewis as an author
  ?book dct:creator <>;
     dct:subject ?category.

  ?category rdfs:label ?label.
}
ORDER BY ?label

Relationships Between Contributors

The following query extracts a list of all people who have contributed to one or more C. S. Lewis books:

SELECT DISTINCT ?name WHERE {
  ?book dct:creator <>;
     dct:contributor ?author.

  ?author foaf:name ?name.

  FILTER (?author != <>)
}
ORDER BY ?name

Going one step further, it's possible to identify people that serve as connections between different authors. For example this query finds people that have contributed to books by both C. S. Lewis and J. R. R. Tolkien:

SELECT DISTINCT ?name WHERE {
  ?book dct:creator <>;
     dct:contributor ?author.

  ?otherBook dct:creator <>;
     dct:contributor ?author.

  ?author foaf:name ?name.
}
ORDER BY ?name

Authors Born in a Year

The basic biographical information in the BNB can also be used in queries. For example many authors have a recorded year of birth and some a year of death. These are described as Birth or Death Events in the data. The following query illustrates how to find 50 authors born in 1944:

SELECT ?author ?name WHERE {
   ?event a bio:Birth;
      bio:date "1944"^^xsd:gYear.

   ?author bio:event ?event;
      foaf:name ?name.
} LIMIT 50

The years associated with Birth and Death events have an XML Schema datatype (xsd:gYear) associated with them. It is important to specify this type in the query, otherwise the query will fail to match any data.

Books
There are a large number of published works in the BNB, so extracting useful subsets involves identifying some useful dimensions in the data that can be used to filter the results. In addition to finding books by an author there are some other useful facets that relate to books, including:

  • Year of Publication
  • Location of Publication
  • Publisher

The following sections include queries that extract data along these dimensions. In each case the key step is to match the Publication Event associated with the book.

Books Published in a Year

Publication Events have a “time” relationship that refers to a resource for the year of publication. The following query extracts 50 books published in 2010:

SELECT ?book ?isbn ?title WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        #match the publication event
        blterms:publication ?publication.

  #match the time of the publication event
  ?publication event:time ?time.
  #match the label of the year
  ?time rdfs:label "2010".
} LIMIT 50

Books Published in a Location

Finding books based on their place of publication is a variation of the above query. Rather than matching the time relationship, the query instead looks for the location associated with the publication event. This query finds 50 books published in Bath:

SELECT ?book ?isbn ?title WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:place ?place.
  ?place rdfs:label "Bath".
} LIMIT 50

Books From a Publisher

In addition to the time and place relationships, Publication Events are also related to a publisher via an “agent” relationship. The following query uses a combination of the time and agent relationships to find 50 books published by Allen & Unwin in 2011:

SELECT ?book ?isbn ?title WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:agent ?agent;
       event:time ?time.

  ?agent rdfs:label "Allen & Unwin".
  ?time rdfs:label "2011".
} LIMIT 50

These queries can easily be adapted to extend and combine the query patterns further, e.g. to limit results by a combination of place, time and publisher, or along different dimensions such as subject category.
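As a sketch of how such combinations might be assembled from application code, the following Python function composes the publication event patterns from whichever facets are supplied. It assumes the prefix declarations given earlier and, for brevity, does no escaping of the label values (the function name is my own):

```python
def books_query(place=None, year=None, publisher=None, limit=50):
    """Compose publication-event patterns for any mix of facets.

    Assumes the standard prefix block from earlier in the tutorial
    is prepended before the query is submitted.
    """
    patterns = ['?book dct:title ?title; blterms:publication ?publication.']
    if place:
        patterns.append('?publication event:place ?place. ?place rdfs:label "%s".' % place)
    if year:
        patterns.append('?publication event:time ?time. ?time rdfs:label "%s".' % year)
    if publisher:
        patterns.append('?publication event:agent ?agent. ?agent rdfs:label "%s".' % publisher)
    return "SELECT ?book ?title WHERE {\n  %s\n} LIMIT %d" % ("\n  ".join(patterns), limit)

q = books_query(place="Bath", year="2010")
```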

Series
The BNB includes nearly 20,000 book series. The following queries illustrate some useful ways to interact with that data.

Books in a Series

Finding the books associated with a specific series is relatively straightforward. The following query is very similar to an earlier query to find books based on an author. However in this case the list of books to be returned is identified by matching those that have a "has part" relationship with a series. The query finds books that are part of the "Pleasure In Reading" series:

SELECT ?book ?isbn ?title ?year WHERE {
  <> dct:hasPart ?book.

  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:agent ?agent;
       event:time ?time.

  ?time rdfs:label ?year.
}

Categories for a Series

The BNB only includes minimal metadata about each series: just a name and a list of books. In order to get a little more insight into the type of book included in a series, the following query finds a list of the subject categories associated with a series:

SELECT DISTINCT ?label WHERE {
  <> dct:hasPart ?book.

  ?book dct:subject ?subject.

  ?subject rdfs:label ?label.
}

As with the previous query the “Pleasure in Reading” series is identified by its URI. As books in the series might share a category the query uses the DISTINCT keyword to filter the results.

Series Recommendation

A series could be considered as a reading list containing useful suggestions of books on particular topics. One way to find a reading list might be to find lists based on subject category, using a variation of the previous query.

Another approach would be to find lists that already contain works by a favourite author. For example the following query finds the URI and the label of all series that contain books by J. R. R. Tolkien:

SELECT DISTINCT ?series ?label WHERE {
  ?book dct:creator ?author.
  ?author foaf:name "J. R. R. Tolkien".

  ?series dct:hasPart ?book;
     rdfs:label ?label.
}


Categories
The rich subject categories in the BNB data provide a number of useful ways to slice and dice the data. For example it is often useful to just fetch a list of books based on their category. The following query finds a list of American Detective and Mystery books:

SELECT ?book ?title ?name WHERE {

   ?book dct:title ?title;
         dct:creator ?author;
         dct:subject <>.

  ?author foaf:name ?name.
}
ORDER BY ?name ?title

For common or broad categories these lists can become very large so filtering them down further into more manageable chunks may be necessary.

Serials
Many of the periodicals and newspapers published in the UK have a local or regional focus. This geographical relationship is recorded in the BNB via a “spatial” relationship of the serial resource. This relationship supports finding publications that are relevant to a particular location in the United Kingdom.

The following query finds serials that focus on the City of Bath:

SELECT ?title ?issn WHERE {

   ?serial dct:title ?title;
           bibo:issn ?issn;
           dct:spatial ?place.

   ?place rdfs:label "Bath (England)".
}

The exact name of the location is used in the match. While it would be possible to filter the results based on a regular expression, this can be very slow. The following query shows how to extract a list of locations referenced from the Dublin Core spatial relationship. This list could be used to populate a search form or application navigation to enable efficient filtering by place name:


SELECT DISTINCT ?label WHERE {
   ?serial dct:spatial ?place.
   ?place rdfs:label ?label.
}
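The bindings returned by a query like this can then be flattened into a sorted, de-duplicated pick-list. A Python sketch with invented sample bindings:

```python
# Invented bindings for ?label, in the standard SPARQL JSON results shape
bindings = [
    {"label": {"type": "literal", "value": "Bath (England)"}},
    {"label": {"type": "literal", "value": "Aberdeen (Scotland)"}},
    {"label": {"type": "literal", "value": "Bath (England)"}},
]

# De-duplicate via a set, then sort the place names for display
places = sorted({b["label"]["value"] for b in bindings})
```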

Summary
This tutorial has provided an introduction to using SPARQL to extract data from the BNB dataset. When working with a SPARQL endpoint it is often useful to have example queries that can be customised to support particular use cases. The tutorial has included multiple examples and these are all available to download.

The tutorial has covered some useful general approaches for matching resources based on identifiers and relationships. Looking up URIs in a dataset is an important step in mapping from systems that contain non-URI identifiers, e.g. ISSNs or ISBNs. Once a URI has been discovered it can be used to directly access the BNB Linked Data or used as a parameter to drive further queries.

A number of example queries have also been included showing how to ask useful and interesting questions of the dataset. These queries relate to the main types of resources in the BNB and illustrate how to slice and dice the dataset along a number of different dimensions.

While the majority of the sample queries are simple SELECT queries, it is possible to create variants that use CONSTRUCT or DESCRIBE queries to extract data in other ways. Several good SPARQL tutorials have been referenced to provide further background reading for developers interested in digging into this further.


Loading the British National Bibliography into an RDF Database

This is the second in a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

Note: while I’ve attempted to fix up these instructions to account for changes to the software and how the data is published, there may still be some errors. If there are then please leave a comment or drop me an email and I’ll endeavour to fix them.

The British National Bibliography (BNB) is a bibliographic database that contains data on a wide range of books and serial publications published in the UK and Ireland since the 1950s. The database is published under a public domain license and is available for access online or as a bulk download.

This tutorial provides developers with guidance on how to download the BNB data and load it into an RDF database, or “triple store” for local processing. The tutorial covers:

  • An overview of the different formats available
  • How to download the BNB data
  • Instructions for loading the data into two different open source triple stores

The instructions given in this tutorial are for users of Ubuntu. Where necessary pointers to instructions for other operating systems are provided. It is assumed that the reader is confident in downloading and installing software packages and working with the command-line.

Bulk Access to the BNB

While the BNB is available for online access as Linked Data and via a SPARQL endpoint there are a number of reasons why working with the dataset locally might be useful, e.g.:

  • Analysis of the data might require custom indexing or processing
  • Using a local triple store might offer more performance or functionality
  • Re-publishing the dataset as part of aggregating data from a number of data providers
  • The full dataset provides additional data which is not included in the Linked Data.

To support these and other use cases the BNB is available for bulk download, allowing developers the flexibility to process the data in a variety of ways.

The BNB is actually available in two different packages. Both provide exports of the data in RDF but differ in both the file formats used and the structure of the data.

BNB Basic

The BNB Basic dataset is provided as an export in RDF/XML format. The individual files are available for download from the BL website.

This version provides the most basic export of the BNB data. Each record is mapped to a simple RDF/XML description that uses terms from several schemas including Dublin Core, SKOS, and Bibliographic Ontology.

As it provides a fairly raw version of the data, BNB Basic is likely to be most useful when the data is going to undergo further local conversion or analysis.

Linked Open BNB

The Linked Open BNB offers a much more structured view of the BNB data.

This version of the BNB has been modelled according to Linked Data principles:

  • Every resource, e.g. author, book, category, has been given a unique URI
  • Data has been modelled using a wider range of standard vocabularies, including the Bibliographic Ontology, Event Ontology and FOAF.
  • Where possible the data has been linked to other datasets, including LCSH and Geonames

It is this version of the data that is used to provide both the SPARQL endpoint and the Linked Data views, e.g. of The Hobbit.

This package provides the best option for mirroring or aggregating the BNB data because its contents match those of the online versions. The additional structure to the dataset may also make it easier to work with in some cases. For example lists of unique authors or locations can be easily extracted from the data.

Downloading The Data

Both the BNB Basic and Linked Open BNB are available for download from the BL website.

Each dataset is split over multiple zipped files. The BNB Basic is published in RDF/XML format while the Linked Open BNB is published as ntriples. The individual data files can be downloaded from CKAN, although this can be time-consuming to do manually.

The rest of this tutorial will assume that the packages have been downloaded to ~/data/bl

Unpacking the files is a simple matter of unzipping them:

cd ~/data/bl
unzip \*.zip
#Remove original zip files
rm *.zip

The rest of this tutorial provides guidance on how to load and index the BNB data in two different open source triple stores.

Using the BNB with Fuseki

Apache Jena is an Open Source project that provides access to a number of tools and Java libraries for working with RDF data. One component of the project is the Fuseki SPARQL server.

Fuseki provides support for indexing and querying RDF data using the SPARQL protocol and query language.

The Fuseki documentation provides a full guide for installing and administering a local Fuseki server. The following sections provide a short tutorial on using Fuseki to work with the BNB data.

Installing Fuseki
Firstly, if Java is not already installed then download the correct version for your operating system.

Once Java has been installed, download the latest binary distribution of Fuseki. At the time of writing this is Jena Fuseki 1.1.0.

The steps to download and unzip Fuseki are as follows:

#Make directory
mkdir -p ~/tools
cd ~/tools

#Download latest version using wget (or manually download)


Change the download URL and local path as required. Then ensure that the fuseki-server script is executable:

cd jena-fuseki-1.1.0
chmod +x fuseki-server

To test whether Fuseki is installed correctly, run the following (on Windows systems use fuseki-server.bat):

./fuseki-server --mem /ds

This will start Fuseki with an empty in-memory database. Visiting http://localhost:3030/ in your browser should show the basic Fuseki server page. Use Ctrl-C to shut down the server once the installation test is completed.

Loading the BNB Data into Fuseki

While Fuseki provides an API for loading RDF data into a running instance, for bulk loading it is more efficient to index the data separately. The manually created indexes can then be deployed by a Fuseki instance.

Fuseki is bundled with the TDB triple store. The TDB data loader can be run as follows:

java -cp fuseki-server.jar tdb.tdbloader --loc /path/to/indexes file.nt

This command would create TDB indexes in the /path/to/indexes directory and load the file.nt into it.

To index all of the Linked Open BNB run the following command, adjusting paths as required:

java -Xms1024M -cp fuseki-server.jar tdb.tdbloader --loc ~/data/indexes/bluk-bnb ~/data/bl/BNB*

This will process each of the data files and may take several hours to complete depending on the hardware being used.

Once the loader has completed the final step is to generate a statistics file for the TDB optimiser. Without this file SPARQL queries will be very slow. The file should be generated into a temporary location and then copied into the index directory:

java -Xms1024M -cp fuseki-server.jar tdb.stats --loc ~/data/indexes/bluk-bnb >/tmp/stats.opt
mv /tmp/stats.opt ~/data/indexes/bluk-bnb

Running Fuseki

Once the data load has completed Fuseki can be started and instructed to use the indexes as follows:

./fuseki-server --loc ~/data/indexes/bluk-bnb /bluk-bnb

The --loc parameter instructs Fuseki to use the TDB indexes from a specific directory. The second parameter tells Fuseki where to mount the index in the web application. Using a mount point of /bluk-bnb the SPARQL endpoint for the dataset would then be found at:


To select the dataset and work with it in the admin interface visit the Fuseki control panel:


Fuseki has a basic SPARQL interface for testing out SPARQL queries, e.g. the following will return 10 triples from the data:

SELECT ?s ?p ?o WHERE {
  ?s ?p ?o
} LIMIT 10

For more information on using and administering the server read the Fuseki documentation.

Using the BNB with 4Store

Like Fuseki, 4Store is an Open Source project that provides a SPARQL based server for managing RDF data. 4Store is written in C and has been proven to scale to very large datasets across multiple systems. It offers a similar level of SPARQL support to Fuseki, so it is a good alternative for working with RDF in a production setting.

As the 4Store download page explains, the project has been packaged for a number of different operating systems.

Installing 4Store
As 4Store is available as an Ubuntu package installation is quite simple:

sudo apt-get install 4store

This will install a number of command-line tools for working with the 4Store server. 4Store works differently to Fuseki in that there are separate server processes for managing the data and serving the SPARQL interface.

The following command will create a 4Store database called bluk_bnb:

#ensure /var/lib/4store exists
sudo mkdir -p /var/lib/4store

sudo 4s-backend-setup bluk_bnb

By default 4Store puts all of its indexes in /var/lib/4store. In order to have more control over where the indexes are kept it is currently necessary to build 4store manually. The build configuration can be altered to instruct 4Store to use an alternate location.

Once a database has been created, start a 4Store backend to manage it:

sudo 4s-backend bluk_bnb

This process must be running before data can be imported into, or queried from, the database.

Once the database is running a SPARQL interface can then be started to provide access to its contents. The following command will start a SPARQL server on port 8000:

sudo 4s-httpd -p 8000 bluk_bnb

To check whether the server is running correctly visit:


It is not possible to run a bulk import into 4Store while the SPARQL process is running. So after confirming that 4Store is running successfully, kill the httpd process before continuing:

sudo pkill '^4s-httpd'

Loading the Data

4Store ships with a command-line tool for importing data called 4s-import. It can be used to perform bulk imports of data once the database process has been started.

To bulk import the Linked Open BNB, run the following command adjusting paths as necessary:

4s-import bluk_bnb --format ntriples ~/data/bl/BNB*

Once the import is complete, restart the SPARQL server:

sudo 4s-httpd -p 8000 bluk_bnb

Testing the Data Load

4Store offers a simple SPARQL form for submitting queries against a dataset. Assuming that the SPARQL server is running on port 8000 this can be found at:


Alternatively 4Store provides a command-line tool for submitting queries:

 4s-query bluk_bnb 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'

Summary
The BNB dataset is not just available for use as Linked Data or via a SPARQL endpoint. The underlying data can be downloaded for local analysis or indexing.

To support this type of usage the British Library have made available two versions of the BNB. A “basic” version that uses a simple record-oriented data model and the “Linked Open BNB” which offers a more structured dataset.

This tutorial has reviewed how to access both of these datasets and how to download and index the data using two different open source triple stores: Fuseki and 4Store.

The BNB data could also be processed in other ways, e.g. to load into a standard relational database or into a document store like CouchDB.

The basic version of the BNB offers a raw version of the data that supports this type of usage, while the richer Linked Data version supports a variety of aggregation and mirroring use cases.


An Introduction to the British National Bibliography

This is the first of a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

This tutorial provides an introduction to the British National Bibliography (BNB) for developers interested in understanding how it could be used in their applications. The tutorial provides:

  • A short introduction to the BNB, including its scope and contents
  • A look at how the BNB can be accessed online
  • An introduction to the data model that underpins the BNB Linked Data

What is the British National Bibliography?

The British National Bibliography has been in development for over 60 years with the first record added in 1950. It contains data about virtually every book and journal title published or distributed in the UK and Ireland since that date. In its role as the national library of the United Kingdom the British Library is responsible for preserving a wide variety of works, and the majority of these are catalogued in the BNB. The exclusions largely relate to some official government publications that are catalogued elsewhere, or other locally published or ephemeral material. With an increasing number of works being published electronically, in 2003 the BNB was extended to include records for UK online electronic resources. The 2013 regulations extended the scope further to include non-print materials.

As well as being an historical archive the BNB also includes data about forthcoming publications, in some cases up to 16 weeks in advance of their actual publication dates. There are over 50,000 new titles published each year in the UK and Ireland, which gives an indication of how quickly the database is growing.

Traditionally the BNB has had a role in helping publishers share metadata with libraries, to reduce the costs of cataloguing works and to inform purchasing decisions. But with the publication of the BNB under an open license, it is now available for anyone to use in a variety of ways. For example the database could be used as:

  • A reliable source of book metadata, e.g. to drive reading list or personal cataloguing applications
  • Insight into the publication output of the UK over different years and genres
  • A means of accessing bibliographies of individual authors, ranging over 60 years

Accessing the BNB Open Data

There are several ways in which the BNB can be accessed. While the online search interface provides a handy way to explore the contents of the bibliography to understand its scope, for application development a machine-readable interface is required.

Originally the primary way to access the BNB was via Z39.50: a specialised protocol for searching remote databases that is used in many library systems. However the BNB is now available via several other routes that make it easier to use in other contexts. These include:

  • Bulk downloads of the database in RDF/XML format. This includes the full dataset and is intended to support local indexing and analysis
  • Online access as Linked Data, allowing individual records to be accessed in a variety of formats including XML and JSON. This is a subset that includes books and serials only
  • An API that allows the dataset to be queried using the SPARQL query language. This provides a number of ways of querying and extracting portions of the dataset

These different access methods support a range of use cases, e.g. allowing the BNB to be accessed in its entirety to support bulk processing and analysis, whilst also supporting online access from mobile or web applications.

Most importantly the BNB is available for use under an open license. The British Library have chosen to publish the data under a Creative Commons CC0 License which places the entire database into the public domain. Unlike some online book databases or APIs this means there are no restrictions whatsoever on how the BNB can be used.

The Linked Data version of the BNB is the most structured version of the BNB and is likely to be the best starting point for most applications. The same data and data model is used to power the SPARQL endpoint, so understanding its structure will help developers use the API. The Linked Data has also been cross-referenced with other data sources, offering additional sources of information. The rest of this tutorial therefore looks at this data model in more detail.

The BNB Linked Data

Linked Data is a technique for publishing data on the web. Every resource, e.g. every book, author, organisation or subject category, is assigned a URI which becomes its unique identifier. Accessing that URI in a web browser will result in the delivery of a web page that contains a useful summary of the specific resource including, for example, key attributes such as its name or title, and also relationships to other resources e.g. a link from a book to its author.

The same URI can also be accessed by an application. Instead of an HTML web page, the data can be retrieved in a variety of machine-readable formats, including XML and JSON. With Linked Data a web site is also an API: information can be accessed by both humans and application code in whatever format is most useful.
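As a sketch of how this works in practice, the snippet below builds requests for the same (hypothetical) resource URI with different Accept headers. No network call is made; it simply shows how content negotiation is expressed:

```python
from urllib.request import Request

# A hypothetical resource URI, for illustration only.
resource = "http://bnb.data.bl.uk/id/resource/009407494"

# The same URI can be requested in different formats: the Accept
# header tells the server which representation we want back.
html_req = Request(resource, headers={"Accept": "text/html"})
json_req = Request(resource, headers={"Accept": "application/json"})
xml_req = Request(resource, headers={"Accept": "application/rdf+xml"})

for req in (html_req, json_req, xml_req):
    print(req.full_url, "->", req.get_header("Accept"))
```

Passing any of these requests to `urllib.request.urlopen` would fetch the corresponding representation.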

A database published as Linked Data can be thought of as containing:

  • Resources, which are identified by a URI
  • Attributes of those resources, e.g. names, titles, labels, dates, etc
  • Relationships between resources, e.g. author, publisher, etc.

One special relationship identifies the type of a resource. Depending on the data model, a resource might have multiple types.
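The model above can be illustrated with a toy example. The URIs and property names below are invented placeholders, not real BNB identifiers:

```python
# Every statement in the model is a (resource, property, value) triple.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/1"

triples = [
    (BOOK, "rdf:type", "bibo:Book"),          # the special "type" relationship
    (BOOK, "dct:title", "An Example Book"),   # an attribute (a literal value)
    (BOOK, "dct:creator", AUTHOR),            # a relationship to another resource
    (AUTHOR, "rdf:type", "foaf:Person"),
    (AUTHOR, "foaf:name", "A. N. Author"),
]

def describe(resource, triples):
    """Collect all attributes and relationships of one resource."""
    return {p: o for s, p, o in triples if s == resource}

print(describe(BOOK, triples))
```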

Standard Schemas

The data model, or schema, used to publish a dataset as Linked Data consists of a set of terms. Terms are either properties, e.g. attributes or relationships between resources, or types of things, e.g. Book, Person, etc.

Unique URIs are not only used to identify resources, they’re also used to identify terms in a schema. For example the unique identifier for the “title” attribute in the BNB is http://purl.org/dc/terms/title, whereas the unique identifier for the Person type is http://xmlns.com/foaf/0.1/Person.

By identifying terms with unique URIs it becomes possible to publish and share their definitions on the web. This supports re-use of schemas, allowing datasets from different organisations to be published using the same terms. This encourages convergence on standard ways to describe particular types of data, making it easier for consumers to use and integrate data from a number of different sources.
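Prefixed names are commonly used as shorthand for these term URIs. Expanding a prefixed name back to its full URI is a simple lookup, as this small sketch shows for two of the standard schemas the BNB uses:

```python
# Prefix table covering two of the schemas used by the BNB.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",     # Dublin Core Terms
    "foaf": "http://xmlns.com/foaf/0.1/",   # FOAF
}

def expand(curie: str) -> str:
    """Expand a prefixed name like 'dct:title' to its full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local

print(expand("dct:title"))    # the Dublin Core title property
print(expand("foaf:Person"))  # the FOAF Person type
```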

The BNB dataset makes use of a number of standard schemas, each with its own base URI and documentation. These are summarised in the following table.

Name                             Role
Bibliographic Ontology           A rich vocabulary for describing many types of publication and their related metadata
Bio                              Contains terms for publishing biographical information
British Library Terms            A new schema published by the British Library which contains some terms not covered in the other vocabularies
Dublin Core                      Basic bibliographic metadata terms like title and creator
Event Ontology                   Properties for describing events and their participants
FOAF                             Contains terms for describing people, their names and relationships
ISBD                             Terms from the International Standard Bibliographic Description standard
Org                              Contains terms for describing organisations
Web Ontology Language (OWL)      A standard ontology for describing terms and equivalences between resources
RDF Schema                       The core RDF schema language which is used to publish new terms
Resource Description and Access  Defines some standard library cataloguing terms
SKOS                             Supports publication of subject classifications and taxonomies
WGS84 Geo Positioning            Geographic points, latitude and longitude

Returning to our earlier example, we can now see that the title attribute in the BNB dataset is taken from the Dublin Core schema, as they share a common base URI. Similarly the Person type is taken from the FOAF schema.

It is common practice for datasets published as Linked Data to use terms from multiple schemas. But, while a dataset might mix together several schemas, it is very unlikely that it will use all of the terms from all of the schemas. More commonly only a few terms are taken from each schema.

So, while it is useful to know which schemas have been used in compiling a dataset, it is also important to understand how those schemas have been used to describe specific types of resource. This is covered in more detail in the next section.

The BNB Data Model

There are high level overview diagrams that show the main types of resources and relationships in the BNB dataset. One diagram summarises the data model for books while another summarises the model for serials (e.g. periodicals and newspapers).

The following sections add some useful context to those diagrams, highlighting how the most important types of resources in the dataset are described. The descriptions include a list of the attributes and relationships that are commonly associated with each individual type.

The tables are not meant to be exhaustive documentation, but instead highlight the most common or more important properties. The goal is to help users understand the structure of the dataset and the relationships between resources. Further exploration is encouraged. With this in mind, links to example resources are included throughout.

It is also important to underline that not all resources will have all of the listed properties. The quality or availability of data might vary across different publications. Similarly a resource might have multiple instances of a given attribute or relationship, e.g. a book with multiple authors.

Finally, all resources will have an RDF type property (rdf:type) and the values of this property are given in each section. As noted above, a resource may have multiple types.


Books

Unsurprisingly, books are one of the key types of resource in the BNB dataset. The important bibliographic metadata for a book is catalogued, including its title, language, number of pages, unique identifiers such as its ISBN and the British Bibliographic Number, and references to its author, publisher, the event of its publication, and subject classifications.

Books are identified with an RDF type of bibo:Book, from the Bibliographic Ontology. The following table summarises the properties most commonly associated with those resources:

Property           Notes
Abstract           A short abstract or summary of the book
BNB                The British Bibliographic Number for the book
Creator            Reference to the author(s) of the book; a Person resource
Contributor        Reference to other people involved in the creation of the work, e.g. an editor or illustrator
Extent             The “extent” of the book, i.e. the number of pages it contains
ISBN 10/13         The 10-digit and 13-digit ISBN numbers of the book
Language           The language of the text
Publication Event  Reference to an Event resource describing the year and location in which the book was published
Subject            Reference to Concept resources that describe the subject category of the book
Table of Contents  Text from the table of contents page
Title              The title of the book

In some cases there may be multiple instances of these properties. For example a book might have several creators or be associated with multiple subject categories.
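As an illustration, a SPARQL query over these book properties might look like the sketch below. The exact property names (e.g. bibo:isbn13, dct:title) are assumptions based on the standard schemas listed earlier, so verify them against the BNB documentation before using the query for real:

```python
# Build a SPARQL query that finds a book (and its title) by 13-digit ISBN.
def books_by_isbn(isbn: str) -> str:
    return f"""
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?book ?title WHERE {{
  ?book bibo:isbn13 "{isbn}" ;
        dct:title ?title .
}}
"""

print(books_by_isbn("9780261102217"))
```

The resulting string could then be submitted to the SPARQL endpoint described earlier in the tutorial.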

The Hobbit makes a good example. There was an edition of the book published in 1993 by Harper Collins. The edition has an ISBN of 0261102214. If you visit the Linked Data page for the resource you can view a description of the book. To get access to the raw data, choose one of the alternate formats, e.g. JSON or XML.


People

The BNB database includes some basic biographical and bibliographic information about people, e.g. authors, illustrators, etc. These resources all have an RDF type of foaf:Person. The description of a person will typically include both family and given names and, if available, references to birth and death events.

A person will also be associated with one or more books or other works in the database. A person is either the creator or a contributor to a work. The creator relationship is used to identify a significant contribution, e.g. the author of a book. The contributor relationship covers other forms of contribution, e.g. editor, illustrator, etc.

The following table identifies the individual properties used to describe people:

Property        Notes
Created         Reference to a work which the person created
Contributed To  Reference to a work to which the person has contributed
Event           Reference to an Event resource involving the author; usually a birth event and/or a death event
Family Name     The surname of the author
Given Name      The first name of the author
Name            The full name of the author

C. S. Lewis was a prolific author. Visiting the description of Lewis in the BNB database provides a bibliography for Lewis, listing the various works that he authored or those to which he contributed. There are also references to his birth and death events.

Pauline Baynes was an illustrator who worked with a number of authors, including both Lewis and Tolkien. Baynes’s description in the BNB includes a list of all her contributions (many more than are mentioned on her Wikipedia page). Baynes provides one of many connections between Lewis and Tolkien in the BNB database, via the contributor relationships with their works.

Events, Places and Organizations

An event is something that happens at a particular point in time, involves one or more participants and usually occurs in a specific location. Book publications, births and deaths are all modelled as events in the BNB data, and each type of event has its own RDF type.

The publication of an edition of The Hobbit is an example of a Publication Event, whilst Tolkien’s birth and death illustrate the basic biographical detail associated with those events.

The following table summarises the key attributes of an event:

Property  Notes
Agent     Used to refer to an Organization involved in a Publication Event
Date      The year in which a birth or death took place, captured as a plain text value
Place     Reference to a Place resource which describes the location in which a Publication Event took place. This will be a resource in either the BNB or the GeoNames dataset (see “Links to Other Datasets”)
Time      The year in which a Publication Event took place. The value is a reference to a resource that describes the year; the URIs are taken from an official UK government dataset

As noted above, Publication Events often refer to two other types of resource in the dataset: Places and Organizations. While these resources provide minimal extra information in the BNB dataset, they have been created to allow other organisations to link their data to the BNB.


Subject Categories

The different types of bibliographic resource in the BNB can all be associated with subject categories that help to further describe them, e.g. to indicate their subject matter, theme or genre. The Dublin Core subject property is used to associate a resource with one or more categories.

Individual subject categories are organised into a “scheme”. A scheme is a set of formally defined categories that has been published by a particular authority. The BNB data uses schemes from the Library of Congress and the Dewey Decimal Classification. These schemes provide a standard way to catalogue works that are in use in many different systems.

The BNB data uses several different RDF types to identify different types of category, e.g. to differentiate between categories used as topics, place labels, etc. The BNB data model diagram illustrates some of the variety of subject resources that can be found in the dataset.

However, while the categories may come from different sources or be of different types, the data about each is essentially the same:

Property  Notes
Label     A label for the category
Notation  A formal identifier for the category, e.g. the Dewey Decimal number
Scheme    A reference to the scheme that the concept is associated with. This may be a resource defined in another dataset

Examples of different types of category include:

  • Fiction in English and Children’s Stories. These are both taken from the Library of Congress Subject Headings (LCSH).
  • Dewey Decimal Classifications, e.g. 823.91 which is “English fiction–20th century” in the Dewey system.


Series

Some books in the BNB are not just organised into categories: they are also organised into collections or “series”. Series are collections of books that are usually based around a specific theme or topic. The BNB contains over 190,000 different series covering a wide range of topics, each identified with its own RDF type.

Examples include “Men and Machines“, “Science series” and “Pleasure In Reading“. Series provide ready made reading lists on a particular topic that could be used to drive recommendations of books to readers.

Series are essentially just lists of resources and so are described with just a few properties:

Property  Notes
Label     The name of the series
Part      A reference to a bibliographic resource that is included in the collection. A series will have many instances of this property, one for each work in the series

Periodicals and Newspapers

The coverage of the BNB goes beyond just books. It also has data about a number of serial publications, such as periodicals and newspapers.

Descriptions of these resources share many similarities with books, e.g. title, BNB numbers, subject classifications, etc. However there are several additional properties that are specific to serial publications, including notes on where and when they were first published, regional focus, etc.

In total there are approximately 10,000 different periodicals in the BNB data. The periodicals may be related to one another; for example, one publication can replace another.

The following table lists some of the key properties of periodicals and newspapers. The serial data model diagram further illustrates some of the key relationships.

Property                Notes
Alternate Title         Alternative title(s) for the publication
BNB                     The British Bibliographic Number for the serial
Contributor             A relationship to a Person or Organization that contributed to the publication
Frequency               A note on the publication frequency, e.g. “Weekly”
ISSN                    The official ISSN number for the serial
Language                The language of the text
Publication Event       Reference to an Event resource describing the year and location in which the serial was first published
Replaced By             Reference to a periodical that replaces or supersedes this one
Replaces                Reference to another periodical that this resource replaces
Resource Specific Note  Typically contains notes on the start (and end) publication dates of the periodical
Spatial                 Reference to Place resources that indicate the geographical focus of a periodical
Subject                 Reference to Concept resources that describe the subject category of the serial
Title                   The title of the serial

Examples of periodicals include the Whitley Bay News Guardian, the Bath Chronicle and iSight.

The publication Coaching News was replaced by Cycle Coaching, providing an example of direct relationships between publications. As noted in the serial data model diagram other relationships from the Dublin Core vocabulary are used to capture alternate formats and versions of publications.

Links to Other Datasets

The final aspect of the BNB dataset to highlight is its relationships with other datasets. When publishing Linked Data it is best practice to include links or references to other datasets. There are two main forms that this cross-referencing can take:

  • Declaring equivalence links to indicate that a resource in the current dataset is the same as another resource (with a different identifier) in a different dataset. Publishing these equivalences helps to integrate datasets across the web. This is achieved by using the OWL “Same As” property (owl:sameAs) to relate the two resources.
  • In other cases resources are simply directly referenced as the value of a relationship. For example references to geographical places or subject categories may be made directly to resources in third-party datasets. This avoids creating new descriptions of the same resource.

In both cases the links between the datasets can be followed, by a user or an application, in order to discover additional useful data. For example BNB includes references to places in the GeoNames dataset. Following those links can help an application discover the latitude and longitude of the location.
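A consumer might exploit these links by merging the descriptions of equivalent resources, along the lines of the sketch below. The URIs and data are invented placeholders:

```python
# Two tiny "datasets": each maps a resource URI to its properties.
bnb = {"http://example.org/bnb/place/1": {"label": "Bath"}}
geonames = {"http://example.org/geonames/2656173": {"lat": 51.38, "long": -2.36}}

# A declared equivalence (owl:sameAs) between the two identifiers.
same_as = [("http://example.org/bnb/place/1", "http://example.org/geonames/2656173")]

def merged_description(uri, datasets, same_as):
    """Union of the properties of a resource and its declared equivalents."""
    uris = {uri}
    uris |= {b for a, b in same_as if a == uri}
    uris |= {a for a, b in same_as if b == uri}
    result = {}
    for data in datasets:
        for u in uris:
            result.update(data.get(u, {}))
    return result

print(merged_description("http://example.org/bnb/place/1", [bnb, geonames], same_as))
```

Following the equivalence link lets the application discover the latitude and longitude even though the BNB record itself carries only a label.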

The BNB uses both of these forms of cross-referencing to create links to a number of different datasets, including GeoNames.

All of these datasets contain additional useful contextual data that might be useful for application developers to explore.


Summary

This tutorial has aimed to provide an introduction to the scope, contents and structure of the British National Bibliography.

Starting with some introductory material that briefly summarised the history of the dataset, the various means of accessing the data were then summarised. The dataset contains over 60 years worth of data which has been placed into the public domain, making it freely available for developers to use as they see fit. The data can be downloaded for bulk processing or used online as Linked Data or via a SPARQL endpoint.

The main focus of the tutorial has been on providing an overview of the key types of resource in the dataset, complete with a summary of their key attributes and relationships. Example resources that highlight important features have been given throughout, to help provide useful pointers for further exploration.

The British National Bibliography is an important national resource that has the potential to be used as a key piece of infrastructure in a range of different types of application.


Thoughts on the Netflix API Closure

A year ago Netflix announced that they were shuttering their public API: no new API keys or affiliates and no more support. Earlier this week they announced that the entire public API will be shut down by November 2014.

This is interesting news and it’s been covered in various places already, including this good overview at Programmable Web. I find it interesting because it’s the first time that I can recall a public API being so visibly switched out for a closed, private alternative. Netflix will still offer an API but only for a limited set of eight existing affiliates and (of course) their own applications. Private APIs have always existed and will continue to do so, but the trend to date has been for these to be made public, rather than a move in the opposite direction.

It’s reasonable to consider whether this might be the first of a new trend, or whether it’s just an outlier. Netflix have been reasonably forthcoming about their API design decisions so I expect many others will be reflecting on their decision and whether it would make sense for them.

But does it make sense at all?

If you read this article by Daniel Jacobson (Director of Engineering for the Netflix API) you can get more detail on the decision and some insight into their thought process. By closing the public API and focusing on a few affiliates, Jacobson suggests that they are able to optimise the API to fit the needs of those specific consumers. The article suggests that a fine-grained resource-oriented API is excellent for supporting largely un-mediated use by a wide range of different consumers with a range of different use cases. In contrast an API that is optimised for fewer use cases and types of query may be able to offer better performance. An API with a smaller surface area will have lower maintenance overheads. Support overheads will also be lower because there are fewer interactions to consider and a smaller user base making them.

That rationale is hard to argue with from either a technical or business perspective. If you have a small number of users driving most of your revenue and a long tail of users generating little or no revenue but with a high support cost, it mostly makes sense to follow the revenue. I don’t buy all of the technical rationale though. It would be possible to support a mixture of resource types in the API, as well as a mixture of support and service level agreements. So I suspect the business drivers are the main rationale here. APIs have generally meant businesses giving up control; if Netflix are able to make this work then I would be surprised if more businesses don’t do the same eventually, as a means to regain that control.

But by withdrawing from any kind of public API Netflix are essentially admitting that they don’t see any further innovation happening around their API: what they’ve seen so far is everything they’re going to see. They’re not expecting a sudden new type of usage to drive revenue and users to the service. Or at least not enough to warrant maintaining a more generic API. If they felt that the community was growing, or building new and interesting applications that benefited their business, they’d keep the API open. By restricting it they’re admitting that closer integration with a small number of applications is a better investment. It’s a standard vertical integration move that gives them greater control over all user experience with their platform. It wouldn’t surprise me if they acquired some of these applications in the future.

However it all feels a bit short-sighted to me as they’re essentially withdrawing from the Web. They’re no longer going to be able to benefit from any of the network effects of having their API be a part of the wider web and remixable (within their Terms of Service) with other services and datasets. Innovation will be limited to just those companies they’re choosing to work with through an “experience” driven API. That feels like a bottleneck in the making.

It’s always possible to optimise a business and an API to support a limited set of interactions, but that type of close coupling inevitably results in less flexibility. Personally I’d be backing the Web.

The multiverse in which we play

If the Many Worlds hypothesis is true then we are living in a multiverse of parallel realities and alternate histories. Everything that could have happened did happen. At least somewhere. There are different views of how these parallel universes might differ from one another, forming complete taxonomies of universe types.

It’s interesting to consider what kind of experiments could be conducted in order to prove that these realities exist. But this overlooks the fact that we interact with parallel realities all the time. Worlds that obey different physical and logical laws. Worlds that have their own unique landscapes. And worlds that share a geography but act out alternate histories. Worlds that many of us visit on at least a daily basis through readily available portals.

At the time I’m writing this blog post there are over 500,000 people playing DOTA 2. That’s more than the population of Manchester. The number of people currently playing the survival games Rust and DayZ is roughly the population of Bath. There are live stats available from Steam. The peak concurrent users for Steam today was 7.2m people. The fact that games are popular is not news to anyone, but that’s a lot of people visiting a variety of virtual realities. And it’s fun to think about them as more concrete spaces and consider the different ways in which we access them.

What follows is some follow-my-nose research on different types of game worlds, largely biased towards games that I’ve played or am familiar with in some way.

The Lifetime of Pocket Universes

What should we consider to be a separate game universe?

A game server, which might be multi-player or single player, is a portal for accessing at least one virtual environment. Some game servers host a single persistent game world — or pocket universe — that will stick around for the lifetime of the server (barring system admin interventions). Other game servers will provide access to many, short-lived game worlds. Some may persist for only a few minutes, others for longer.

For example most first person shooters cycle through game worlds that last for around 10-15 minutes. But some offer a more consistent environment: all DayZ Standalone servers host the exact same map (Chernarus), but the game clock and state vary between servers. It’s possible to jump between servers and appear in the exact same location but at different times.

This is something that has been limited in recent updates to DayZ because players were travelling within the Chernarus multiverse to gain unfair advantages over other players, e.g. looting the same location across different servers, or getting the upper hand in a fight by flanking someone by jumping between servers. It’s been restricted by imposing increasingly long wait times on people hopping between servers, although I found it to be an interesting game mechanic and I’d love to play a game in which it was a central motif.

While at any one time there may be multiple servers hosting a copy of the same game world, there are often differences. Time zones are one, but enemies and loot may also spawn randomly, meaning that while the physical layout of the worlds is the same, their histories are different. Player actions are obviously significant and some game worlds offer more opportunities for permanently affecting the environment, e.g. by building or destroying objects.

At the extreme end of the scale are game worlds that are based entirely on procedural generation: no two pocket universes will be exactly the same, but they will obey the same physical laws.
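As a toy illustration of that point, a seeded generator reproduces exactly the same world for the same seed, while different seeds give different worlds that still obey the same generation rules:

```python
import random

# The "physical laws" of this toy universe: a fixed set of terrain
# types and a fixed generation procedure. Only the seed varies.
def generate_world(seed: int, size: int = 8):
    rng = random.Random(seed)
    terrain = ["water", "plains", "forest", "mountain"]
    return [rng.choice(terrain) for _ in range(size)]

# The same seed always reproduces the same pocket universe.
assert generate_world(42) == generate_world(42)
print(generate_world(42))
print(generate_world(7))
```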

Number of Pocket Universes

It’s hard to get decent stats on the number of game servers and their distribution; some of the details are likely to be commercially sensitive. This is one area where I’d like to see more open data. It’s not world changing, but it’s interesting to a lot of people.

The best resource I could find, apart from the high level Steam Statistics, was Game Tracker. This is a service that monitors game servers running across the net. Registered users can add servers to share them with friends and team mates. Currently there are over 130,000 different game servers being tracked by their system, spanning 91 different games. This will be a gross under-estimate of the size of the gaming multiverse, but it is a useful data point.

There’s some interesting analysis that could be done on the distribution of those servers, across both games and countries, but unfortunately the terms of use for GameTracker do not allow harvesting of their data. Being able to locate game servers in the real world tells us where those universes intersect with ours.

Of course there are also some games in which there is only a single universe, although its state and geography are split across multiple game servers. Most MMORPGs operate on this basis, with Eve Online probably being one of the more interesting, if only because it has a time dilation mechanic that kicks in when lots of players are co-located on a single server: the passage of time slows down in a local area to allow all necessary computation to take place. The Eve game world actually spans more than one game: the Dust 514 FPS exists in the same universe and there are ways to interact between the games.

MMORPGs also use “instancing” to spawn off smaller (fractal?!) pocket universes to allow groups of players to simultaneously access the same content in a sandboxed environment. This blending of public multi-player and private single-player spaces within game worlds is part of what is known as “mingleplayer” (which is an awful term!)

Demon’s Souls and Dark Souls (both 1 & 2) offer another interesting variation and, in my opinion, one of the earliest implementations of mingleplayer. In Dark Souls every player exists in their own copy of the game world, but those worlds are loosely connected to those of all other players. In Dark Souls 1 (and, I think, Demon’s Souls) this was via a peer-to-peer network, but in Dark Souls 2 it’s a classic client-server set-up. In all of these games it’s possible to invade or be invited into other players’ worlds to help or hinder them. There are also a number of mechanics that allow players to communicate between worlds in a limited and in some cases automatic way. Typical of the series, there are also some unique and opaque systems that allow items, creatures and player actions to spread between worlds.

Game World Sizes

So how big are these pocket universes? How do they compare to one another and with our own universe? There are a few interesting facts and comparisons which I’ve dug up.

A number of people have also collected together game maps that show the relative sizes of different game environments.

In Game Statistics

Game publishers collect statistics on how players move through their worlds. Sometimes this is just done during testing and level design in order to balance a map, in others the data is made available to players in real-time to help them improve their game, etc. There was an interesting article on statistics collection in the Halo games in Wired a few years ago. These kinds of statistics collection tools are a fundamental part of many design tools these days.

There’s been some interesting visualisation work around these statistics too. I wonder whether any of this could be applied to real-world data? For example balance and flow maps provide different perspectives on events. And here is a visualisation of every player death in Just Cause 2.

What is an Open API?

I was reading a document this week that referred to an “Open API”. It occurred to me that I hadn’t really thought about what that term was supposed to mean before. Having looked at the API in question, it turned out it did not mean what I thought it meant. The definition of Open API on Wikipedia and the associated list of Open APIs are also both a bit lacklustre.

We could probably do with being more precise about what we mean by that term, particularly in how it relates to Open Source and Open Data. So far I’ve seen it used in several different ways:

  1. An API that is free for anyone to use — I think it would be clearer to refer to these as “Public APIs”. Some may require authentication, some may only have a limited free tier of usage, but the API is accessible to anyone that wants to use it
  2. An API that is backed by open data — the data that is available via the API is covered by an open licence. A Public API isn’t necessarily backed by Open Data. While it might be free for me to use an API, I may be limited in how I can use the data by the API terms and/or a non-open data licence that applies to the data
  3. An API that is based on an open standard — the data available via an API might not be open, but the means of accessing and querying the data is covered by a specification that has been created by a standards body or has otherwise been openly published, e.g. the specification of the API is covered by an open licence. The important thing here is that the API could be (re-)implemented in an open source or commercial product without infringing on anyone’s rights or intellectual property. The specifications of APIs that serve open data aren’t necessarily open. A commercial vendor may provide a data publishing service whose API is entirely proprietary.

Personally I think an Open API is one that meets that final definition.

These are important distinctions and I’d encourage you to look at the APIs you’re using, or the APIs you’re publishing, and consider into which category they fall. APIs built on open source software typically fall into the third category: a reference implementation and API documentation are already in the open. It’s easy to create alternate versions, improve an existing code base, or run a copy of a service.

While the data in a platform may be open, lock-in (whether planned or otherwise) can happen when APIs are proprietary. This limits competition and the ability for both data publishers and consumers to choose other vendors. This is also one reason why APIs shouldn’t be the default for open government data: at some level the raw data should be portable and useful outside of whatever platform the organisation may choose to deploy. Ideally platforms aimed at supporting open government data publishing should be open source or should, at the very least, openly licence their API documentation.

