Accessing the British National Bibliography Using SPARQL

This is the third in a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to creating the new documentation for their Linked Data platform but hasn’t been otherwise published. They are now published here with permission of the BL.

Note: while I’ve attempted to fix up these instructions to account with changes to the platform on which the data is published, there may still be some errors. If there are then please leave a comment or drop me an email and I’ll endeavour to fix.

The British National Bibliography (BNB) is a bibliographic database that contains data on a wide range of books and serial publications published in the UK and Ireland since the 1950s. The database is available under a public domain license and can be accessed via an online API.

The tutorial introduces developers to the BNB API which supports querying of the dataset via the SPARQL query language and protocol. The tutorial provides:

  • Pointers to relevant background material and tutorials on SPARQL and the SPARQL Protocol
  • A collection of useful queries and query patterns for working with the BNB dataset

The queries described in this tutorial have been published as a collection of files that can be download from github.

What is SPARQL?

SPARQL is a W3C standard which defines a query language for RDF databases. Roughly speaking SPARQL is the equivalent of SQL for graph databases. SPARQL 1.0 was first published as an official W3C Recommendation in 2008. At the time of writing SPARQL 1.1, which provides a number of new language features, will shortly be published as a final recommendation.

A SPARQL endpoint implements the SPARQL protocol allowing queries to be submitted over the web. Public SPARQL endpoints offer an API that allows application developers to query and extract data from web or mobile applications.

A complete SPARQL tutorial is outside the scope of this document, but there are a number of excellent resources available for developers wishing to learn more about the query language. Some recommended tutorials and reference guides include:

The BNB SPARQL Endpoint

The BNB public SPARQL endpoint is available from:

http://bnb.data.bl.uk/sparql

No authentication or API keys are required to use this API.

The BNB endpoint supports SPARQL 1.0 only. Queries can be submitted to the endpoint using either GET or POST requests. For POST requests the query is submitted as the body of the request, while for GET requests the query is URL encoded and provided in the query parameter, e.g:

http://bnb.data.bl.uk/sparql?query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+1

Refer to the SPARQL protocol specification for additional background on submitting queries. Client libraries for interacting with SPARQL endpoints are available in a variety of languages, including python, ruby, nodejs, PHP and Java.

Types of SPARQL Query and Result Formats

There are four different types of SPARQL query. Each of the different types supports a different use case:

  • ASK: returns a true or false response to test whether data is present in a dataset, e.g. to perform assertions or check for interesting data before submitting queries. Note these no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error.
  • SELECT: like the SQL SELECT statement this type of query returns a simple tabular result set. Useful for extracting values for processing in non-RDF systems
  • DESCRIBE: requests that the SPARQL endpoint provides a default description of the queried results in the form of an RDF graph
  • CONSTRUCT: builds a custom RDF graph based on data in the dataset

Query results can typically be serialized into multiple formats. ASK and SELECT queries have standard XML and JSON result formats. The graphs produced by DESCRIBE and CONSTRUCT queries can be serialized into any RDF format including Turtle and RDF/XML. The BNB endpoint also supports RDF/JSON output from these types of query. Alternate formats can be selected using the output URL parameter, e.g. output=json:

http://bnb.data.bl.uk/sparql?query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+1&output=json

General Patterns

The following sections provide a number of useful query patterns that illustrate some basic ways to query the BNB.

Discovering URIs

One very common use case when working with a SPARQL endpoint is the need to discover the URI for a resource. For example, the ISBN number for a book or an ISSN number of a serial is likely to be found in a wide variety of databases. It would be useful to be able to use those identifiers to look up the corresponding resource in the BNB.

Here’s a simple SELECT query that looks up a book based on its ISBN-10:


#Declare a prefix for the bibo schema
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?uri WHERE {
  #Match any resource that has the specific property and value
  ?uri bibo:isbn10 "0261102214".
}

As can be seen from executing this query there are actually 4 different editions of The
that have been published using this ISBN.

Here is a variation of the same query that identifies the resource with an ISSN of 1356-0069:


PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?uri WHERE {
  ?uri bibo:issn "1356-0069".
}

The basic query pattern is the same in each case. Resources are matched based on the value of a literal property. To find different resources just substitute in a different value or match on a different property. The results can be used in further queries or used to access the BNB Linked Data by performing a GET request on the URI.

In some cases it may just be useful to know whether there is a resource that has a matching identifier in the dataset. An ASK query supports this use case. The following query should return true as there is a resource in the BNB with the given ISSN:


PREFIX bibo: <http://purl.org/ontology/bibo/>
ASK WHERE {
  ?uri bibo:issn "1356-0069".
}

Note ASK queries no longer seem to be supported by the BL SPARQL endpoint. All ASK queries now return an error

Extracting Data Using Identifiers

Rather than just request a URI or list of URIs it would be useful to extract some additional attributes of the resources. This is easily done by extending the query pattern to include more properties.

The following example extracts the URI, title and BNB number for all books with a given ISBN:


#Declare some additional prefixes
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?uri ?bnb ?title WHERE {
  #Match the books by ISBN
  ?uri bibo:isbn10 "0261102214";
       #bind some variables to their other attributes
       blterms:bnb ?bnb;
       dct:title ?title.
}

This patterns extends the previous examples in several ways. Firstly, some additional prefixes are declared because the properties of interest are from several different schemas. Secondly, the query pattern is extended to match the additional attributes of the resources. The values of those attributes are bound to variables. Finally the SELECT clause is extended to list all the variables that should be returned.

If the URI for is already known then this can be used to directly identify the resource of interest. Its properties can then be matched and extracted. The following query returns the ISBN, title and BNB number for a specific book:


PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?isbn ?title ?bnb WHERE {
  <http://bnb.data.bl.uk/id/resource/009910399> bibo:isbn10 ?isbn;
       blterms:bnb ?bnb;
       dct:title ?title.         
}

Whereas the former query identified resources indirectly, via the value of an attribute, this query directly references a resource using its URI. The query pattern then matches the properties that are of interest. Matching resources by URI is usually much faster than matching based on a literal property.

Itemising all of the properties of a resource can be tiresome. Using SPARQL it is possible to ask the SPARQL endpoint to generate a useful summary of a resource (called a Bounded Description. The endpoint will typically return all attributes and relationships of the resource. This can be achieved using a simple DESCRIBE query:


DESCRIBE <http://bnb.data.bl.uk/id/resource/009910399>

The query doesn’t need to define any prefixes or match any properties: the endpoint will simply return what it knows about a resource as RDF. If RDF/XML isn’t useful then the same results can be retrieved as JSON.

Reverting back to the previous approach of indirectly identifying resources, its possible to ask the endpoint to generate descriptions of all books with a given ISBN:


PREFIX bibo: <http://purl.org/ontology/bibo/>
DESCRIBE ?uri WHERE {
  ?uri bibo:isbn10 "0261102214".
}

Matching By Relationship

Resources can also be matched based on their relationships, by traversing across the graph of data. For example it’s possible to lookup the author for a given book:


PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

As there are four books with this ISBN the query results return the URI for Tolkien four times. Adding a DISTINCT will remove any duplicates:


PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?author WHERE {
  #Match the book
  ?uri bibo:isbn10 "0261102214";
       #Match its author
       dct:creator ?author.
}

Type Specific Patterns

The following sections provide some additional example queries that illustrate some useful queries for working with some specific types of resource in the BNB dataset. Each query is accompanied by links to the SPARQL endpoint that show the results.

For clarity the PREFIX declarations in each query have been ommited. It should be assumed that each query is preceded with the following prefix declarations:


PREFIX bio: <http://purl.org/vocab/bio/0.1/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX isbd: <http://iflastandards.info/ns/isbd/elements/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rda: <http://RDVocab.info/ElementsGr2/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Not all of these are required for all of the queries, but they declare all of the prefixes that are likely to be useful when querying the BNB.

People

There are a number of interesting queries that can be used to interact with author data in the BNB.

List Books By An Author

The following query lists all published books written by C. S. Lewis, with the most recently published books returned first:


SELECT ?book ?isbn ?title ?year WHERE {
  #Match all books with Lewis as an author
  ?book dct:creator <http://bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        #match the publication event
        blterms:publication ?publication.

  #match the time of the publication event
  ?publication event:time ?time.
  #match the label of the year
  ?time rdfs:label ?year          
}
#order by descending year, after casting year as an integer
ORDER BY DESC( xsd:int(?year) )

Identifying Genre of an Author

Books in the BNB are associated with one or more subject categories. By looking up the list of categories associated with an author’s works it may be possible to get a sense of what type of books they have written. Here is a query that returns the list of categories associated with C. S Lewis’s works:


SELECT DISTINCT ?category ?label WHERE {
  #Match all books with Lewis as an author
  ?book dct:creator <http://bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>;
     dct:subject ?category.

  ?category rdfs:label ?label.     
}
ORDER BY ?label

Relationships Between Contributors

The following query extracts a list of all people who have contributed to one or more C. S. Lewis books:


SELECT DISTINCT ?author ?name WHERE {
  ?book dct:creator <http://bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>;
     dct:contributor ?author.

  ?author foaf:name ?name.     

  FILTER (?author != <http://bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>) 
}
ORDER BY ?name

Going one step further its possible to identify people that serve as connections between different authors. For example this query finds people that have contributed to books by both C. S. Lewis and J. R. R. Tolkien:


SELECT DISTINCT ?author ?name WHERE {
  ?book dct:creator <http://bnb.data.bl.uk/id/person/LewisCS%28CliveStaples%291898-1963>;
     dct:contributor ?author.

  ?otherBook dct:creator <http://bnb.data.bl.uk/id/person/TolkienJRR%28JohnRonaldReuel%291892-1973>;
     dct:contributor ?author.

  ?author foaf:name ?name.     
}
ORDER BY ?name

Authors Born in a Year

The basic biographical information in the BNB can also be used in queries. For example many authors have a recorded year of birth some a year of death. These are described as Birth or Death Events in the data. The following query illustrates how to find 50 authors born in 1944:


SELECT ?author ?name WHERE {
   ?event a bio:Birth;
      bio:date "1944"^^<http://www.w3.org/2001/XMLSchema#gYear>.

   ?author bio:event ?event;
      foaf:name ?name.
}
LIMIT 50

The years associated with Birth and Death events have am XML Schema datatype associated with them (xsd:Year). It is important to specific this type in the query, otherwise the query will fail to match any data.

Books

There are a large number of published works in the BNB, extracting useful subsets involves identifying some useful dimensions in the data that can be used to filter the results. In addition to finding books by an author there are some other useful facets that relate to books, including:

  • Year of Publication
  • Location of Publication
  • Publisher

The following sections include queries that extract data along these dimensions. In each case the key step is to match the Publication Event associated with the book.

Books Published in a Year

Publication Events have a “time” relationship that refers to a resource for the year of publication. The following query extracts 50 books published in 2010:


SELECT ?book ?isbn ?title ?year WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        #match the publication event
        blterms:publication ?publication.

  #match the time of the publication event
  ?publication event:time ?time.
  #match the label of the year
  ?time rdfs:label "2010"      
}
LIMIT 50

Books Published in a Location

Finding books based on their place of publication is a variation of the above query. Rather than matching the time relationship, the query instead looks for the location associated with the publication event. This query finds 50 books published in Bath:


SELECT ?book ?isbn ?title ?year WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:place ?place.
  ?place rdfs:label "Bath"
}
LIMIT 50

Books From a Publisher

In addition to the time and place relationships, Publication Events are also related to a publisher via an “agent” relationship. The following query uses a combination of the time and agent relationships to find 50 books published by Allen & Unwin in 2011:


SELECT ?book ?isbn ?title ?year WHERE {
  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:agent ?agent;
       event:time ?time.

  ?agent rdfs:label "Allen & Unwin".
  ?time rdfs:label "2011".
}
LIMIT 50

These queries can easily be adapted to extend and combine the query patterns further, e.g. to limit results by a combination of place, time and publisher, or along different dimensions such as subject category.

Series

The BNB includes nearly 20,000 book series. The following queries illustrate some useful ways to interact with that data.

Books in a Series

Finding the books associated with a specific series is relatively straight-forward. The following query is very similar to an earlier query to find books based on an author. However in this case the list of books to be returned is identified by matching those that have an “has part” relationship with a series. The query finds books that are part of the “Pleasure In Reading” series:


SELECT ?book ?isbn ?title ?year WHERE {
  <http://bnb.data.bl.uk/id/series/Pleasureinreading> dct:hasPart ?book.

  ?book dct:creator ?author;
        bibo:isbn10 ?isbn;
        dct:title ?title;
        blterms:publication ?publication.

  ?publication event:agent ?agent;
       event:time ?time.

  ?time rdfs:label ?year.
}

Categories for a Series

The BNB only includes minimal metadata about each series: just a name and a list of books. In order to get a little more insight into the type of book included in a series, the following query finds a list of the subject categories associated with a series:


SELECT DISTINCT ?label WHERE {
  <http://bnb.data.bl.uk/id/series/Pleasureinreading> dct:hasPart ?book.

  ?book dct:subject ?subject.

  ?subject rdfs:label ?label.
}

As with the previous query the “Pleasure in Reading” series is identified by its URI. As books in the series might share a category the query uses the DISTINCT keyword to filter the results.

Series Recommendation

A series could be considered as a reading list containing useful suggestions of books on particular topics. One way to find a reading list might be to find lists based on subject category, using a variation of the previous query.

Another approach would be to find lists that already contain works by a favourite author. For example the following query finds the URI and the label of all series that contain books by J. R. R. Tolkien:


SELECT DISTINCT ?series ?label WHERE {
  ?book dct:creator ?author.
  ?author foaf:name "J. R. R. Tolkien".

  ?series dct:hasPart ?book;
     rdfs:label ?label.

}

Categories

The rich subject categories in the BNB data provide a number of useful ways to slice and dice the data. For example it is often useful to just fetch a list of books based on their category. The following query finds a list of American Detective and Mystery books:


SELECT ?book ?title ?name WHERE {

   ?book dct:title ?title;
         dct:creator ?author;
         dct:subject <http://bnb.data.bl.uk/id/concept/lcsh/DetectiveandmysterystoriesAmericanFiction>.

  ?author foaf:name ?name.
}
ORDER BY ?name ?title

For common or broad categories these lists can become very large so filtering them down further into more manageable chunks may be necessary.

Serials

Many of the periodicals and newspapers published in the UK have a local or regional focus. This geographical relationship is recorded in the BNB via a “spatial” relationship of the serial resource. This relationship supports finding publications that are relevant to a particular location in the United Kingdom.

The following query finds serials that focus on the City of Bath:


SELECT ?title ?issn WHERE {

   ?serial dct:title ?title;
           bibo:issn ?issn;
           dct:spatial ?place.

   ?place rdfs:label "Bath (England)".
}

The exact name of the location is used in the match. While it would be possible to filter the results based on a regular expression, this can be very slow. The following query shows how to extract a list of locations referenced from the Dublin Core spatial relationship. This list could be used to populate a search form or application navigation to enable efficient filtering by place name:


SELECT DISTINCT ?place ?label WHERE {

   ?serial dct:spatial ?place.
   ?place rdfs:label ?label.
}

Summary

This tutorial has provided an introduction to using SPARQL to extract data from the BNB dataset. When working with a SPARQL endpoint it is often useful to have example queries that can be customised to support particular use cases. The tutorial has included multiple examples and these are all available to download.

The tutorial has covered some useful general approaches for matching resources based on identifiers and relationships. Looking up URIs in a dataset is an important step in mapping from systems that contain non-URI identifiers, e.g. ISSNs or ISBNs. Once a URI has been discovered it can be used to directly access the BNB Linked Data or used as a parameter to drive further queries.

A number of example queries have also been included showing how to ask useful and interesting questions of the dataset. These queries relate to the main types of resources in the BNB and illustrate how to slice and dice the dataset along a number of different dimensions.

While the majority of the sample queries are simple SELECT queries, it is possible to create variants that use CONSTRUCT or DESCRIBE queries to extract data in other ways. Several good SPARQL tutorials have been referenced to provide further background reading for developers interested in digging into this further.

4 thoughts on “Accessing the British National Bibliography Using SPARQL

Comments are closed.