An Introduction to the British National Bibliography

This is the first of a series of posts (1, 2, 3, 4) providing background and tutorial material about the British National Bibliography. The tutorials were written as part of some freelance work I did for the British Library at the end of 2012. The material was used as input to the new documentation for their Linked Data platform but hasn’t otherwise been published. The tutorials are now published here with the permission of the BL.

This tutorial provides an introduction to the British National Bibliography (BNB) for developers interested in understanding how it could be used in their applications. The tutorial provides:

  • A short introduction to the BNB, including its scope and contents
  • A look at how the BNB can be accessed online
  • An introduction to the data model that underpins the BNB Linked Data

What is the British National Bibliography?

The British National Bibliography has been in development for over 60 years with the first record added in 1950. It contains data about virtually every book and journal title published or distributed in the UK and Ireland since that date. In its role as the national library of the United Kingdom the British Library is responsible for preserving a wide variety of works, and the majority of these are catalogued in the BNB. The exclusions largely relate to some official government publications that are catalogued elsewhere, or other locally published or ephemeral material. With an increasing number of works being published electronically, in 2003 the BNB was extended to include records for UK online electronic resources. The 2013 regulations extended the scope further to include non-print materials.

As well as being an historical archive the BNB also includes data about forthcoming publications, in some cases up to 16 weeks in advance of their actual publication dates. There are over 50,000 new titles published each year in the UK and Ireland, which gives an indication of how quickly the database is growing.

Traditionally the BNB has had a role in helping publishers share metadata with libraries, to reduce the costs of cataloguing works and to inform purchasing decisions. But with the publication of the BNB under an open license, it is now available for anyone to use in a variety of ways. For example the database could be used as:

  • A reliable source of book metadata, e.g. to drive reading list or personal cataloguing applications
  • Insight into the publication output of the UK over different years and genres
  • A means of accessing bibliographies of individual authors, ranging over 60 years

Accessing the BNB Open Data

There are several ways in which the BNB can be accessed. While the online search interface provides a handy way to explore the contents of the bibliography to understand its scope, for application development a machine-readable interface is required.

Originally the primary way to access the BNB was via Z39.50: a specialised protocol for searching remote databases that is used in many library systems. However the BNB is now available via several other routes that make it easier to use in other contexts. These include:

  • Bulk downloads of the database in RDF/XML format. This includes the full dataset and is intended to support local indexing and analysis
  • Online access as Linked Data, allowing individual records to be accessed in a variety of formats including XML and JSON. This is a subset that includes books and serials only
  • An API that allows the dataset to be queried using the SPARQL query language. This provides a number of ways of querying and extracting portions of the dataset

These different access methods support a range of use cases, e.g. allowing the BNB to be accessed in its entirety to support bulk processing and analysis, whilst also supporting online access from mobile or web applications.
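As a rough illustration of the SPARQL route, the sketch below builds (but does not send) an HTTP GET request URL for a simple query. The endpoint address is an assumption made for illustration, since this tutorial does not state one, and the function name is hypothetical. The type and title URIs are the ones described later in this tutorial.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint address, for illustration only; check the BNB documentation.
SPARQL_ENDPOINT = "http://bnb.data.bl.uk/sparql"

# Ask for up to ten resources typed as books, together with their titles.
QUERY = """
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?book ?title WHERE {
  ?book a bibo:Book ;
        dct:title ?title .
}
LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(SPARQL_ENDPOINT, QUERY)
```

The resulting URL could then be fetched with any HTTP client. The response format an endpoint returns by default varies, so an application would normally state the format it wants explicitly.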

Most importantly the BNB is available for use under an open license. The British Library have chosen to publish the data under a Creative Commons CC0 License which places the entire database into the public domain. Unlike some online book databases or APIs this means there are no restrictions whatsoever on how the BNB can be used.

The Linked Data version of the BNB is the most structured version of the BNB and is likely to be the best starting point for most applications. The same data and data model is used to power the SPARQL endpoint, so understanding its structure will help developers use the API. The Linked Data has also been cross-referenced with other data sources, offering additional sources of information. The rest of this tutorial therefore looks at this data model in more detail.

The BNB Linked Data

Linked Data is a technique for publishing data on the web. Every resource, e.g. every book, author, organisation or subject category, is assigned a URI which becomes its unique identifier. Accessing that URI in a web browser will result in the delivery of a web page that contains a useful summary of the specific resource including, for example, key attributes such as its name or title, and also relationships to other resources e.g. a link from a book to its author.

The same URI can also be accessed by an application. Instead of an HTML web page, the application can retrieve the data in a variety of machine-readable formats, including XML and JSON. With Linked Data a web site is also an API: information can be accessed by both humans and application code in whatever format is most useful.
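One common way for an application to ask for a particular format is HTTP content negotiation, using the Accept header. The sketch below only prepares such a request without sending it. The resource URI is a placeholder, the function name is hypothetical, and whether a given server honours the header (rather than offering format-specific links) is an assumption.

```python
from urllib.request import Request


def build_data_request(resource_uri: str, media_type: str = "application/json") -> Request:
    """Prepare (but do not send) a request for a machine-readable format."""
    return Request(resource_uri, headers={"Accept": media_type})


# Placeholder URI, used only to show the shape of the request.
request = build_data_request("http://example.org/resource/some-book")
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual fetch.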

A database published as Linked Data can be thought of as containing:

  • Resources, which are identified by a URI
  • Attributes of those resources, e.g. names, titles, labels, dates, etc
  • Relationships between resources, e.g. author, publisher, etc.

One special relationship identifies the type of a resource. Depending on the data model, a resource might have multiple types.
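To make this concrete, here is a minimal in-memory sketch of the same idea. It is not how the BNB is stored: the identifiers under example.org and the sample values are invented, and the four-element tuple layout is just a convenient assumption for separating attributes from relationships.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Invented identifiers, for illustration only.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/1"

# Each statement: (resource, property, value, value_is_a_resource)
statements = [
    (BOOK, RDF_TYPE, "http://purl.org/ontology/bibo/Book", True),
    (BOOK, "http://purl.org/dc/terms/title", "An Example Title", False),
    (BOOK, "http://purl.org/dc/terms/creator", AUTHOR, True),
    (AUTHOR, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person", True),
    (AUTHOR, "http://xmlns.com/foaf/0.1/name", "A. N. Author", False),
]


def describe(resource, statements):
    """Split what is known about a resource into types, attributes and relationships."""
    types, attributes, relationships = [], {}, {}
    for subject, prop, value, is_resource in statements:
        if subject != resource:
            continue
        if prop == RDF_TYPE:
            types.append(value)
        elif is_resource:
            relationships.setdefault(prop, []).append(value)
        else:
            attributes.setdefault(prop, []).append(value)
    return {"types": types, "attributes": attributes, "relationships": relationships}


book = describe(BOOK, statements)
```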

Standard Schemas

The data model, or schema, used to publish a dataset as Linked Data consists of a set of terms. Terms are either properties, e.g. attributes or relationships between resources, or types of things, e.g. Book, Person, etc.

Unique URIs are not only used to identify resources, they’re also used to identify terms in a schema. For example the unique identifier for the “title” attribute in the BNB is http://purl.org/dc/terms/title, whereas the unique identifier for the Person type is http://xmlns.com/foaf/0.1/Person.

By identifying terms with unique URIs it becomes possible to publish and share their definitions on the web. This supports re-use of schemas, allowing datasets from different organisations to be published using the same terms. This encourages convergence on standard ways to describe particular types of data, making it easier for consumers to use and integrate data from a number of different sources.

The BNB dataset makes use of a number of standard schemas. These are summarised in the following table along with their base URI and links to their individual documentation.

| Name | URI | Role |
| --- | --- | --- |
| Bibliographic Ontology | http://purl.org/ontology/bibo/ | A rich vocabulary for describing many types of publication and their related metadata |
| Bio | http://purl.org/vocab/bio/0.1/ | Contains terms for publishing biographical information |
| British Library Terms | http://www.bl.uk/schemas/bibliographic/blterms | A new schema published by the British Library which contains some terms not covered in the other vocabularies |
| Dublin Core | http://purl.org/dc/terms/ | Basic bibliographic metadata terms like title and creator |
| Event Ontology | http://purl.org/NET/c4dm/event.owl# | Properties for describing events and their participants |
| FOAF | http://xmlns.com/foaf/0.1/ | Contains terms for describing people, their names and relationships |
| ISBD | http://iflastandards.info/ns/isbd/elements/ | Terms from the International Standard Bibliographic Description standard |
| Org | http://www.w3.org/ns/org# | Contains terms for describing organisations |
| Web Ontology Language | http://www.w3.org/2002/07/owl# | A standard ontology for describing terms and equivalencies between resources |
| RDF Schema | http://www.w3.org/2000/01/rdf-schema# | The core RDF schema language which is used to publish new terms |
| Resource Description and Access | http://rdvocab.info/ElementsGr2# | Defines some standard library cataloguing terms |
| SKOS | http://www.w3.org/2004/02/skos/core# | Supports publication of subject classifications and taxonomies |
| WGS84 Geo Positioning | http://www.w3.org/2003/01/geo/wgs84_pos# | Geographic points, latitude and longitude |

Returning to our earlier example we can now see that the title attribute in the BNB dataset (http://purl.org/dc/terms/title) is taken from the Dublin Core Schema (http://purl.org/dc/terms/) as they share a common base URI. Similarly the Person type is taken from the FOAF Schema.
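The base URI matching described here is easy to automate. The following sketch covers only a handful of the schemas from the table, and the function name is hypothetical. It picks the schema whose base URI is the longest prefix of a term.

```python
# A subset of the base URIs listed in the table of schemas.
SCHEMA_BASE_URIS = {
    "Bibliographic Ontology": "http://purl.org/ontology/bibo/",
    "Dublin Core": "http://purl.org/dc/terms/",
    "FOAF": "http://xmlns.com/foaf/0.1/",
    "RDF Schema": "http://www.w3.org/2000/01/rdf-schema#",
    "SKOS": "http://www.w3.org/2004/02/skos/core#",
}


def schema_for(term_uri):
    """Return the name of the schema a term belongs to, or None if unknown."""
    matches = [
        (len(base), name)
        for name, base in SCHEMA_BASE_URIS.items()
        if term_uri.startswith(base)
    ]
    # Prefer the longest matching base URI in case one base is a prefix of another.
    return max(matches)[1] if matches else None
```

For example, `schema_for("http://purl.org/dc/terms/title")` identifies the term as coming from Dublin Core.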

It is common practice for datasets published as Linked Data to use terms from multiple schemas. But, while a dataset might mix together several schemas, it is very unlikely that it will use all of the terms from all of the schemas. More commonly only a few terms are taken from each schema.

So, while it is useful to know which schemas have been used in compiling a dataset, it is also important to understand how those schemas have been used to describe specific types of resource. This is covered in more detail in the next section.

The BNB Data Model

There are high level overview diagrams that show the main types of resources and relationships in the BNB dataset. One diagram summarises the data model for books while another summarises the model for serials (e.g. periodicals and newspapers).

The following sections add some useful context to those diagrams, highlighting how the most important types of resources in the dataset are described. The descriptions include a list of the attributes and relationships that are commonly associated with each individual type.

The tables are not meant to be exhaustive documentation, but instead highlight the most common or more important properties. The goal is to help users understand the structure of the dataset and the relationships between resources. Further exploration is encouraged. With this in mind, links to example resources are included throughout.

It is also important to underline that not all resources will have all of the listed properties. The quality or availability of data might vary across different publications. Similarly a resource might have multiple instances of a given attribute or relationship, e.g. a book with multiple authors.

Finally, all resources will have an RDF type property (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) and the values of this property are given in each section. As noted above, a resource may have multiple types.

Books

Unsurprisingly, books are one of the key types of resource in the BNB dataset. The important bibliographic metadata for a book is catalogued, including its title, language, number of pages, unique identifiers such as its ISBN and British National Bibliography (BNB) number, and references to its author, publisher, the event of its publication, and subject classifications.

Books are identified with an RDF type of http://purl.org/ontology/bibo/Book. The following table summarises the properties most commonly associated with those resources:

| Property | URI | Notes |
| --- | --- | --- |
| Abstract | http://purl.org/dc/terms/abstract | A short abstract or summary of the book |
| BNB | http://www.bl.uk/schemas/bibliographic/blterms#bnb | The British National Bibliography (BNB) number for the book |
| Creator | http://purl.org/dc/terms/creator | Reference to the author(s) of the book; a Person resource |
| Contributor | http://purl.org/dc/terms/contributor | Reference to other people involved in the creation of the work, e.g. an editor or illustrator |
| Extent | http://iflastandards.info/ns/isbd/elements/P1053 | The “extent” of the book, i.e. the number of pages it contains |
| ISBN 10/13 | http://purl.org/ontology/bibo/isbn10; http://purl.org/ontology/bibo/isbn13 | 10-digit and 13-digit [ISBN](http://en.wikipedia.org/wiki/International_Standard_Book_Number) of the book |
| Language | http://purl.org/dc/terms/language | The language of the text |
| Publication Event | http://www.bl.uk/schemas/bibliographic/blterms#publication | Reference to an Event resource describing the year and location in which the book was published |
| Subject | http://purl.org/dc/terms/subject | Reference to concept resources that describe the subject category of the book |
| Table of Contents | http://purl.org/dc/terms/tableOfContents | Text from the table of contents page |
| Title | http://purl.org/dc/terms/title | The title of the book |

In some cases there may be multiple instances of these properties. For example a book might have several creators or be associated with multiple subject categories.
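Because properties can be missing or repeated, application code is usually simpler if every property is treated as a list. The sketch below assumes a description has already been loaded into a dictionary mapping property URIs to lists of values. That layout, the helper names and the sample data are all illustrative assumptions.

```python
DCT = "http://purl.org/dc/terms/"

# Invented description: property URI -> list of values.
book = {
    DCT + "title": ["An Example Anthology"],
    DCT + "creator": ["http://example.org/person/1", "http://example.org/person/2"],
}


def values(description, property_uri):
    """All values of a property; an empty list if the property is absent."""
    return list(description.get(property_uri, []))


def single(description, property_uri, default=None):
    """The first value of a property, or a default if there is none."""
    found = values(description, property_uri)
    return found[0] if found else default
```

With these helpers, a missing abstract and a book with two creators are handled by the same code path as the common single-valued case.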

The Hobbit makes a good example. There was an edition of the book published in 1993 by Harper Collins. The edition has an ISBN of 0261102214. If you visit the Linked Data page for the resource you can view a description of the book. To get access to the raw data, choose one of the alternate formats, e.g. JSON or XML.
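Since books carry both 10-digit and 13-digit ISBNs, an application matching its own records against the BNB may need to validate and convert between the two. The sketch below uses the standard ISBN check digit rules, with the ISBN of this edition as input. The function names are hypothetical and nothing here is specific to the BNB platform.

```python
DIGITS = "0123456789"


def is_valid_isbn10(isbn):
    """Check the ISBN-10 check digit (weights 10 down to 1, modulo 11)."""
    if len(isbn) != 10:
        return False
    body, check = isbn[:9], isbn[9].upper()
    if any(c not in DIGITS for c in body) or check not in DIGITS + "X":
        return False
    numbers = [int(c) for c in body] + [10 if check == "X" else int(check)]
    return sum(w * n for w, n in zip(range(10, 0, -1), numbers)) % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    core = "978" + isbn10[:9]
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


hobbit_isbn13 = isbn10_to_isbn13("0261102214")  # "9780261102217"
```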

People

The BNB database includes some basic biographical and bibliographic information about people, e.g. authors, illustrators, etc. These resources all have an RDF type of http://xmlns.com/foaf/0.1/Person. The description of a person will typically include both family and given names and, if available, reference to birth and death events.

A person will also be associated with one or more books or other works in the database. A person is either the creator or a contributor to a work. The creator relationship is used to identify a significant contribution, e.g. the author of a book. The contributor relationship covers other forms of contribution, e.g. editor, illustrator, etc.

The following table identifies the individual properties used to describe people:

| Property | URI | Notes |
| --- | --- | --- |
| Created | http://www.bl.uk/schemas/bibliographic/blterms#hasCreated | Reference to a work which the person created |
| Contributed To | http://www.bl.uk/schemas/bibliographic/blterms#hasContributedTo | Reference to a work to which the Person has contributed |
| Event | http://purl.org/vocab/bio/0.1/event | Reference to an Event resource involving the author. Usually a birth event and/or a death event |
| Family Name | http://xmlns.com/foaf/0.1/familyName | The surname of the author |
| Given Name | http://xmlns.com/foaf/0.1/givenName | The first name of the author |
| Name | http://xmlns.com/foaf/0.1/name | The full name of the author |

C. S. Lewis was a prolific author. Visiting the description of Lewis in the BNB database provides a bibliography for Lewis, listing the various works that he authored or those to which he contributed. There are also references to his birth and death events.

Pauline Baynes was an illustrator who worked with a number of authors, including both Lewis and Tolkien. Baynes’s description in the BNB includes a list of all her contributions (many more than are mentioned on her Wikipedia page). Baynes provides one of many connections between Lewis and Tolkien in the BNB database, via the contributor relationships with their works.
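This kind of connection can be discovered mechanically from the creator and contributor relationships. The sketch below works on a tiny invented set of statements. The work and person identifiers are placeholders rather than real BNB URIs, and the function names are hypothetical.

```python
CREATOR = "http://purl.org/dc/terms/creator"
CONTRIBUTOR = "http://purl.org/dc/terms/contributor"

# Invented (work, property, person) statements.
statements = [
    ("work/a", CREATOR, "person/author-1"),
    ("work/a", CONTRIBUTOR, "person/illustrator"),
    ("work/b", CREATOR, "person/author-2"),
    ("work/b", CONTRIBUTOR, "person/illustrator"),
    ("work/c", CREATOR, "person/author-2"),
    ("work/c", CONTRIBUTOR, "person/editor"),
]


def collaborators(person, statements):
    """Everyone who is attached to at least one work that `person` is attached to."""
    works = {work for work, _, who in statements if who == person}
    return {who for work, _, who in statements if work in works and who != person}


def shared_collaborators(first, second, statements):
    """People who connect two others through the works they each took part in."""
    return collaborators(first, statements) & collaborators(second, statements)


links = shared_collaborators("person/author-1", "person/author-2", statements)
```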

Events, Places and Organizations

An event is something that happens at a particular point in time, involves one or more participants and usually occurs in a specific location. Book publications, births and deaths are all modelled as events in the BNB data. Each type of event has a different RDF type:

  • Publication events have an RDF type of http://www.bl.uk/schemas/bibliographic/blterms#PublicationEvent.
  • Publication start events have an RDF type of http://www.bl.uk/schemas/bibliographic/blterms#PublicationStartEvent. These are associated with both serials and books published over time. The latter also have end events.
  • Birth events have a type of http://purl.org/vocab/bio/0.1/Birth
  • Death events have a type of http://purl.org/vocab/bio/0.1/Death

The publication of an edition of The Hobbit is an example of a Publication Event, whilst Tolkien’s birth and death illustrate the basic biographical detail associated with those events.

The following table summarises the key attributes of an event:

| Property | URI | Notes |
| --- | --- | --- |
| Agent | http://purl.org/NET/c4dm/event.owl#agent | Used to refer to an Organization involved in a Publication Event |
| Date | http://purl.org/vocab/bio/0.1/date | The year in which a birth or death took place. In this property the year is captured as a plain text value. |
| Place | http://purl.org/NET/c4dm/event.owl#place | Reference to a Place resource which describes the location in which a Publication Event took place. This will either be a resource in the BNB or the Geonames dataset (see “Links to Other Datasets”) |
| Time | http://purl.org/NET/c4dm/event.owl#time | The year in which a Publication Event took place. The value is a reference to a resource that describes the year. The URIs are taken from an official UK government dataset. |

As noted above, Publication Events often refer to two other types of resource in the dataset:

  • Publishers are resources with an RDF type of http://xmlns.com/foaf/0.1/Agent. The BNB only contains the names of publishers, as an RDF Schema label property. Examples include Harper Collins, Unwin and Longman

  • Places are resources with an RDF type of http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing. Again, only a label property is available. E.g. London or Glasgow.

While these resources provide minimal extra information in the BNB dataset they have been created to allow other organisations to link their data to the BNB.
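Because publication details hang off an event rather than the book itself, finding the name of a book's publisher takes three hops: book to event, event to agent, agent to label. The sketch below walks those hops over a small invented graph. The `ex:` identifiers, the dictionary layout and the function names are illustrative assumptions.

```python
BLT = "http://www.bl.uk/schemas/bibliographic/blterms#"
EVENT = "http://purl.org/NET/c4dm/event.owl#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Invented graph: resource -> property URI -> list of values.
graph = {
    "ex:book": {BLT + "publication": ["ex:event"]},
    "ex:event": {EVENT + "agent": ["ex:publisher"], EVENT + "place": ["ex:place"]},
    "ex:publisher": {RDFS_LABEL: ["Example Press"]},
    "ex:place": {RDFS_LABEL: ["London"]},
}


def follow(graph, starts, property_uri):
    """Follow one property from each starting resource and collect the values."""
    return [v for s in starts for v in graph.get(s, {}).get(property_uri, [])]


def publisher_names(graph, book):
    """Book -> publication event -> agent -> label."""
    events = follow(graph, [book], BLT + "publication")
    agents = follow(graph, events, EVENT + "agent")
    return follow(graph, agents, RDFS_LABEL)
```

The same `follow` helper, used with the place property instead of the agent property, would return the place of publication.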

Concepts

The different types of bibliographic resource in the BNB can all be associated with subject categories that help to further describe them, e.g. to indicate their subject matter, theme or genre. The Dublin Core subject property (http://purl.org/dc/terms/subject) is used to associate a resource with one or more categories.

Individual subject categories are organised into a “scheme”. A scheme is a set of formally defined categories that has been published by a particular authority. The BNB data uses schemes from the Library of Congress and the Dewey Decimal Classification. These schemes provide a standard way to catalogue works and are in use in many different systems.

The BNB data uses several different RDF types to identify different types of category, e.g. to differentiate between categories used as topics, place labels, etc. The BNB data model diagram illustrates some of the variety of subject resources that can be found in the dataset.

However while the categories may come from different sources or be of different types, the data about each is essentially the same:

| Property | URI | Notes |
| --- | --- | --- |
| Label | http://www.w3.org/2000/01/rdf-schema#label | A label for the category |
| Notation | http://www.w3.org/2004/02/skos/core#notation | A formal identifier for the category, e.g. the Dewey Decimal Number |
| Scheme | http://www.w3.org/2004/02/skos/core#inScheme | A reference to the scheme that the concept is associated with. This may be a resource defined in another dataset |

Examples of different types of category include:

  • Fiction in English and Childrens Stories. These are both taken from the Library of Congress Subject Headings (LCSH).
  • Dewey Decimal Classifications, e.g. 823.91 which is “English fiction–20th century” in the Dewey system.
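Since every category carries the same three properties, subjects from different authorities can be processed uniformly, for instance by grouping a book's categories by scheme. In the sketch below the scheme URIs are placeholders under example.org, the one-value-per-property layout is a simplification, and the function name is hypothetical.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Placeholder scheme identifiers; the real scheme URIs are not given in this tutorial.
DEWEY = "http://example.org/scheme/dewey"
LCSH = "http://example.org/scheme/lcsh"

concepts = [
    {RDFS_LABEL: "English fiction–20th century", SKOS + "notation": "823.91", SKOS + "inScheme": DEWEY},
    {RDFS_LABEL: "Fiction in English", SKOS + "inScheme": LCSH},
    {RDFS_LABEL: "Childrens Stories", SKOS + "inScheme": LCSH},
]


def labels_by_scheme(concepts):
    """Group category labels by the scheme each category belongs to."""
    grouped = defaultdict(list)
    for concept in concepts:
        grouped[concept.get(SKOS + "inScheme")].append(concept.get(RDFS_LABEL))
    return dict(grouped)


grouped = labels_by_scheme(concepts)
```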

Series

Some books in the BNB are not just organised into categories: they are also organised into collections or “series”. Series are collections of books that are usually based around a specific theme or topic. The BNB contains over 190,000 different series covering a wide range of different topics. Series are identified with an RDF type of http://purl.org/ontology/bibo/Series.

Examples include “Men and Machines”, “Science series” and “Pleasure In Reading”. Series provide ready made reading lists on a particular topic that could be used to drive recommendations of books to readers.

Series are essentially just lists of resources and so are described with just a few properties:

| Property | URI | Notes |
| --- | --- | --- |
| Label | http://www.w3.org/2000/01/rdf-schema#label | The name of the series |
| Part | http://purl.org/dc/terms/hasPart | A reference to the bibliographic resource that is included in the collection. A series will have many instances of this property, one for each work in the series |
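Turning a series into a reading list amounts to following the Part property and fetching each title. One way to do this is with a SPARQL query, and the sketch below builds the query text for a given series. The series URI shown is a placeholder, the function name is hypothetical, and the query has not been run against the live dataset.

```python
UNSAFE_IN_URI = set('<>"{}|^`\\ \n\t')


def series_reading_list_query(series_uri: str, limit: int = 50) -> str:
    """Build a SPARQL query listing the titles of the works in a series."""
    # Reject characters that could break out of the <...> URI syntax.
    if any(ch in UNSAFE_IN_URI for ch in series_uri):
        raise ValueError("series_uri contains characters not allowed in a URI")
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?book ?title WHERE {\n"
        f"  <{series_uri}> dct:hasPart ?book .\n"
        "  ?book dct:title ?title .\n"
        "}\n"
        "ORDER BY ?title\n"
        f"LIMIT {int(limit)}"
    )


query = series_reading_list_query("http://example.org/series/1", limit=5)
```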

Periodicals and Newspapers

The coverage of the BNB goes beyond just books. It also has data about a number of serial publications:

  • Newspapers, which have an RDF type of http://purl.org/ontology/bibo/Newspaper
  • Periodicals (e.g. monthly, quarterly or annual publications), which have an RDF type of http://purl.org/ontology/bibo/Periodical

Descriptions of these resources share many similarities with books, e.g. title, BNB numbers, subject classifications, etc. However there are several additional properties that are specific to serial publications. These include notes on where and when they were first published, regional focus, etc.

In total there are approximately 10,000 different periodicals in the BNB data. The periodicals may be related to one another, e.g. one publication can replace another.

The following table lists some of the key properties of periodicals and newspapers. The serial data model diagram further illustrates some of the key relationships.

| Property | URI | Notes |
| --- | --- | --- |
| Alternate Title | http://purl.org/dc/terms/alternative | Alternative title(s) for the publication |
| BNB | http://www.bl.uk/schemas/bibliographic/blterms#bnb | The British National Bibliography (BNB) number for the serial |
| Contributor | http://purl.org/dc/terms/contributor | A relationship to a Person or Organization that contributed to the publication |
| Frequency | http://iflastandards.info/ns/isbd/elements/P1065 | A note on the publication frequency, e.g. “Weekly” |
| ISSN | http://purl.org/ontology/bibo/issn | The official [ISSN](http://en.wikipedia.org/wiki/International_Standard_Serial_Number) for the serial |
| Language | http://purl.org/dc/terms/language | The language of the text |
| Publication Event | http://www.bl.uk/schemas/bibliographic/blterms#publicationStart | Reference to an Event resource describing the year and location in which the serial was first published |
| Replaced By | http://purl.org/dc/terms/isReplacedBy | Reference to a periodical that replaces or supersedes this one |
| Replaces | http://purl.org/dc/terms/replaces | Reference to another periodical that this resource replaces |
| Resource Specific Note | http://iflastandards.info/ns/isbd/elements/P1038 | Typically contains notes on the start (and end) publication dates of the periodical |
| Spatial | http://purl.org/dc/terms/spatial | Reference to Place resources that indicate the geographical focus of a periodical |
| Subject | http://purl.org/dc/terms/subject | Reference to Concept resources that describe the subject category of the serial |
| Title | http://purl.org/dc/terms/title | The title of the serial |

Examples of periodicals include the Whitley Bay News Guardian, the Bath Chronicle and iSight.

The publication Coaching News was replaced by Cycle Coaching, providing an example of direct relationships between publications. As noted in the serial data model diagram other relationships from the Dublin Core vocabulary are used to capture alternate formats and versions of publications.
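An application that wants the current incarnation of a periodical can follow the Replaced By relationship until it runs out. The sketch below assumes, as a simplification, that each periodical has at most one successor. The `ex:` identifiers are placeholders and the function name is hypothetical.

```python
# Placeholder identifiers standing in for the two periodicals mentioned above.
replaced_by = {"ex:coaching-news": "ex:cycle-coaching"}


def latest_successor(resource, replaced_by):
    """Follow 'replaced by' links until a resource with no successor is reached."""
    seen = {resource}
    current = resource
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:
            # Guard against inconsistent data that loops back on itself.
            raise ValueError("cycle detected in replacement chain")
        seen.add(current)
    return current


current_title = latest_successor("ex:coaching-news", replaced_by)
```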

Links to Other Datasets

The final aspect of the BNB dataset to highlight is its relationships with other datasets. When publishing Linked Data it is best practice to include links or references to other datasets. There are two main forms that this cross-referencing can take:

  • Declaring equivalence links to indicate that a resource in the current data is the same as another resource (with a different identifier) in a different dataset. Publishing these equivalences helps to integrate datasets across the web. It is achieved by using the OWL “Same As” property (http://www.w3.org/2002/07/owl#sameAs) to relate the two resources.
  • In other cases resources are simply directly referenced as the value of a relationship. For example references to geographical places or subject categories may be made directly to resources in third-party datasets. This avoids creating new descriptions of the same resource.

In both cases the links between the datasets can be followed, by a user or an application, in order to discover additional useful data. For example BNB includes references to places in the GeoNames dataset. Following those links can help an application discover the latitude and longitude of the location.
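A first step towards following such links is simply spotting which values in a description point outside the current dataset. The sketch below groups outbound links by host name. The hosts, the sample description and the function name are invented for illustration.

```python
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Invented description of a place: property URI -> list of values.
place = {
    RDFS_LABEL: ["London"],
    OWL_SAME_AS: ["http://places.example.org/london"],
}


def links_by_host(description, home_host):
    """Group values that are web URIs on a different host than the home dataset."""
    grouped = {}
    for property_uri, property_values in description.items():
        for value in property_values:
            if not value.startswith(("http://", "https://")):
                continue  # plain text value, not a link
            host = urlsplit(value).netloc
            if host and host != home_host:
                grouped.setdefault(host, []).append((property_uri, value))
    return grouped


outbound = links_by_host(place, "bnb.example.org")
```

Each outbound URI could then be fetched in turn to pull in extra data, such as coordinates for a place.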

The BNB uses both of these forms of cross-referencing to create links to a number of different datasets:

All of these datasets contain additional contextual data that application developers might find useful to explore.

Summary

This tutorial has aimed to provide an introduction to the scope, contents and structure of the British National Bibliography.

It began with introductory material that briefly summarised the history of the dataset, followed by the various means of accessing the data. The dataset contains over 60 years’ worth of data which has been placed into the public domain, making it freely available for developers to use as they see fit. The data can be downloaded for bulk processing or used online as Linked Data or via a SPARQL endpoint.

The main focus of the tutorial has been on providing an overview of the key types of resource in the dataset, complete with a summary of their key attributes and relationships. Example resources that highlight important features have been given throughout, to help provide useful pointers for further exploration.

The British National Bibliography is an important national resource that has the potential to be used as a key piece of infrastructure in a range of different types of application.