The Lego Analogy

I think Lego is a great analogy for understanding the importance of data standards and registers.

Lego have been making plastic toys and bricks since the late 40s. It took them a little while to perfect their designs. But since 1958 they’ve been manufacturing bricks in the same way, to the same basic standard. This means that you can take any brick that’s been manufactured over the last 59 years and they’ll fit together. As a company, they have extremely high standards around how their bricks are manufactured. Only 18 in a million are ever rejected.

A commitment to standards maximises the utility of all of the bricks that the company has ever produced.

Open data standards apply the same principle but to data. By publishing data using common APIs, formats and schemas, we can start to treat data like Lego bricks. Standards help us recombine data in many, many different ways.

There are now many more types and shapes of Lego brick than there used to be. The Lego standard colour palette has also evolved over the years. The types and colours of bricks have changed to reflect the company’s desire to create a wider variety of sets and themes.

If you look across all of the different sets that Lego have produced, you can see that some basic pieces are used very frequently. A number of these pieces are “plates” that help to connect other bricks together. If you ask a Master Lego Builder for a list of their favourite pieces, you’ll discover the same. Elements that help you connect other bricks together in new and interesting ways are the most popular.

Registers are small, simple datasets that play the same role in the data ecosystem. They provide a means for us to connect datasets together. A way to improve the quality and structure of other datasets. They may not be the most excitingly shaped data. Sometimes they’re just simple lists and tables. But they play a very important role in unlocking the value of other data.
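To make the "connecting piece" idea concrete, here is a minimal sketch of a register being used to link two otherwise unrelated datasets. The register, its codes (`LA-001`, `LA-002`), the two datasets and the `link_by_register` helper are all invented for illustration; they are not taken from any real register.

```python
# Hypothetical register: a small, simple list of area codes and names.
REGISTER = {
    "LA-001": "Bath",
    "LA-002": "Bristol",
}

# Two invented datasets that both happen to use the register's codes.
schools = [
    {"school": "Example Primary", "authority": "LA-001"},
    {"school": "Sample Academy", "authority": "LA-002"},
]
air_quality = [
    {"site": "Roadside monitor", "authority": "LA-001", "no2": 38},
    {"site": "Park monitor", "authority": "LA-001", "no2": 17},
]


def link_by_register(register, *datasets, key="authority"):
    """Group rows from several datasets under each register entry."""
    linked = {code: {"name": name, "rows": []} for code, name in register.items()}
    unmatched = []
    for dataset in datasets:
        for row in dataset:
            entry = linked.get(row.get(key))
            if entry is None:
                # Rows with a code the register doesn't know are a quality signal.
                unmatched.append(row)
            else:
                entry["rows"].append(row)
    return linked, unmatched


linked, unmatched = link_by_register(REGISTER, schools, air_quality)
```

Neither dataset needs to know about the other. The shared code from the register is the "plate" that lets them snap together, and any rows left in `unmatched` point to places where a dataset could be improved.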

So there we have it, the Lego analogy for standards and registers.

Mapping wheelchair accessibility, how Google could help

This month Google announced a new campaign to crowd-source information on wheelchair accessibility. It will be asking the Local Guides community of volunteers to begin answering simple questions about the wheelchair accessibility of places that appear on Google Maps. Google already crowd-sources a lot of information from volunteers. For example, it asks them to contribute photos, add reviews and validate the data it’s displaying to users of its mapping products.

It’s great to see Google responding to requests from wheelchair users for better information on accessibility. But I think they can do better.

There are many projects exploring how to improve accessibility information for people with mobility issues, and how to use data to increase mobility. I’ve recently been leading a project in Bath that is using a service called Wheelmap to crowd-source wheelchair accessibility information for the centre of the city. Over two Saturday afternoons we’ve mapped 86% of the city. Crowd-sourcing is a great way to collect this type of information and Google has the reach to really take this to another level.

The problem is that the resulting data is only available to Google. Displaying the data on Google Maps will put it in front of millions of people, but that data could potentially be reused in a variety of other ways.

For example, for the Accessible Bath project we’re now able to explore accessibility information based on the type of location. This may be useful for policy makers to help shape support and investment in local businesses to improve accessibility across the city. Bath is a popular tourist destination so it’s important that we’re accessible to all.

We’re able to do this because Wheelmap stores all of its data in OpenStreetMap. We have access to all of the data our volunteers collect and can use it in combination with the rich metadata already in OpenStreetMap. And we can also start to combine it with other information, e.g. data on the ages of buildings, which may yield more insight.
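As a rough sketch of the kind of analysis this makes possible, the snippet below summarises accessibility by type of location. It assumes records shaped like OpenStreetMap tag sets, using the `wheelchair` and `amenity` tag keys. The sample places are invented, and `accessibility_by_type` is a hypothetical helper rather than part of any Wheelmap or OpenStreetMap tooling.

```python
from collections import Counter, defaultdict

# Invented sample records, shaped like OpenStreetMap tag sets.
places = [
    {"name": "Cafe A", "amenity": "cafe", "wheelchair": "yes"},
    {"name": "Cafe B", "amenity": "cafe", "wheelchair": "no"},
    {"name": "Pub C", "amenity": "pub", "wheelchair": "limited"},
    {"name": "Pub D", "amenity": "pub"},  # not yet surveyed
]


def accessibility_by_type(records, type_key="amenity"):
    """Count wheelchair tag values for each type of place."""
    summary = defaultdict(Counter)
    for tags in records:
        place_type = tags.get(type_key, "other")
        # Places without a wheelchair tag are counted as "unknown".
        status = tags.get("wheelchair", "unknown")
        summary[place_type][status] += 1
    return {place_type: dict(counts) for place_type, counts in summary.items()}


summary = accessibility_by_type(places)
```

Keeping "unknown" as its own category matters: it separates places that are known to be inaccessible from places that nobody has surveyed yet.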

As we learnt in our meetings with local wheelchair users and stroke survivors, mobility and accessibility issues are tricky to address. Road and pavement surfaces and types of dropped kerbs can impact you differently depending on your specific needs. Often you need more data and more context from other sources to provide the necessary support. Like Google we’re starting with wheelchair accessibility because that’s the easiest problem to begin to address.

To improve routing, for example, you might need data on terrain, or be able to identify the locations and sizes of individual disabled parking spaces. Microsoft’s Cities Unlocked are combining accessibility and location data from OpenStreetMap with Wikipedia entries to help blind users navigate a city. They chose OpenStreetMap as their data source because of its flexibility, existing support for accessibility information and rapid updates. This type of innovation requires greater access to raw data, not just data on a map.

By collecting and displaying data only on its own maps, Google is not maximising the value of the contributions made by its Local Guides community. If the data they collected was published under an open licence, it could be used in many other projects. By improving its maps, Google is addressing a specific set of user needs. By opening up the data it could let more people address more user needs.

If Google felt they were unable to publish the data under an open licence, they could at least make the data available to OpenStreetMap contributors to support their mapping events. This type of limited licensing is already being used by Microsoft, Digital Globe and others to make commercial satellite imagery available to the OpenStreetMap community. While restrictive licensing is not ideal, allowing the data to be used to improve open databases, without the need to worry about IP issues, is a useful step forward from keeping the data locked down.

Another form of support that Google could offer is to extend to allow accessibility information to be associated with Places. By incorporating this into Google Maps and then openly publishing or sharing that data, it would encourage more organisations to publish this information about their locations.

But I find it hard to think of good reasons why Google wouldn’t make this data openly available. I think its Local Guides community would agree that they’re contributing in order to help make the world a better place. Ensuring that the data can be used by anyone, for any purpose, is the best way to achieve that goal.

Bath Playbills 1812-1851

This weekend I published scans of over 2000 historical playbills for the Theatre Royal in Bath. Here are some notes on where they come from and how they might be useful.

The scans are all available on Flickr and have been placed into the public domain under a CC0 waiver. You’re free to use them in any way you see fit. The playbills date from 1812 through to 1851. This is the period just before the fire and rebuilding of the theatre in its current location.

The scans are taken from 5 public domain books available digitally from the British Library. All I’ve done in this instance is take the PDF versions of the books, split out the pages into separate images and then upload them to Flickr, into separate collections.

This is a small step, but will hopefully make the contents more discoverable and accessible. The individual playbills are now part of the web, so can be individually referenced and commented on.

For example there are some great images in the later bills. And I learned that in 1840 you could have seen lions, tigers and leopards.


And this playbill includes detail on the plot and scenes from a play called “Susan Hopley” and an intriguing reference to “Punchinello Vampire!”.


As they are all in the public domain, the images will hopefully be of interest to Wikipedians interested in the history of Bath, the theatre or performers such as Joseph Grimaldi. (I did try adding a reference to a playbill myself, but had this reverted because I was “linking to my own social media site”).

There’s a lot of detail to the bills which it might be useful to extract. E.g. the dates of each bill, the plays being performed and details of the performers and sponsors. If anyone is interested in helping to crowd-source that, then let me know!


We can strengthen data infrastructure by analysing open data

Data is infrastructure for our society and businesses. To create stronger, sustainable data infrastructure that supports a variety of users and uses, we need to build it in a principled way.

Over time, as we gain experience with a variety of infrastructures supporting both shared and open data, we can identify the common elements of good data infrastructure. We can use that to help to write a design manual for data infrastructure.

There are a variety of ways to approach that task. We can write case studies on specific projects, and we can map ecosystems to understand how value is created through data. We can also take time to contribute to projects. Experiencing different types of governance, following processes and using tools can provide useful insight.

We can also analyse open data to look for additional insights that might help us improve data infrastructure. I’ve recently been involved in two short projects that have analysed some existing open data.

Exploring open data quality

Working with Experian and colleagues at the ODI, we looked at the quality of some UK government datasets. We used a data quality tool to analyse data from the Land Registry, the NHS and Companies House. We found issues with each of the datasets.

It’s clear that there is still plenty of scope to make basic improvements to how data is published, by providing:

  • better guidance on the structure, content and licensing of data
  • basic data models and machine-readable schemas to help standardise approaches to sharing similar data
  • better tooling to help reconcile data against authoritative registers
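
As an illustration of the last point, here is a minimal sketch of what reconciling values against a register might involve. It is not the tool used in the project. The register contents, the `R1`-style identifiers and the `reconcile` helper are invented, and the fuzzy matching uses `difflib.get_close_matches` from the Python standard library as a stand-in for a more capable matcher.

```python
import difflib

# Hypothetical register mapping normalised names to identifiers.
REGISTER = {"bath": "R1", "bristol": "R2", "wells": "R3"}


def normalise(value):
    """Lower-case a value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def reconcile(value, register, cutoff=0.8):
    """Return (status, identifier) for a free-text value.

    status is "exact", "suggested" (a close match that needs review),
    or "unmatched".
    """
    key = normalise(value)
    if key in register:
        return ("exact", register[key])
    close = difflib.get_close_matches(key, list(register), n=1, cutoff=cutoff)
    if close:
        return ("suggested", register[close[0]])
    return ("unmatched", None)


results = [reconcile(v, REGISTER) for v in ["  Bath ", "Bristl", "London"]]
```

Separating "exact" from "suggested" keeps a person in the loop for uncertain matches, which seems a sensible default when the target is an authoritative register.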

The UK is also still in need of a national open address register.

Open data quality is a current topic in the open data community. The community might benefit from access to an “open data quality index” that provides more detail into these issues. Open data certificates would be an important part of that index. The tools used to generate that index could also be used on shared datasets. The results could be open, even if the datasets themselves might not be.

Exploring the evolution of data

There are currently plans to further improve the data infrastructure that supports academic research by standardising organisation identifiers. I’ve been doing some R&D work for that project to analyse several different shared and open datasets of organisation identifiers. By collecting and indexing the data, we’ve been able to assess how well they can support improving existing data, through automated reconciliation and by creating better data entry tools for users.

Increasingly, when we are building new data infrastructures, we are building on and linking together existing datasets. So it’s important to have a good understanding of the scope, coverage and governance of the source data we are using. Access to regularly published data gives us an opportunity to explore the dynamics around the management of those sources.

For example, I’ve explored the growth of the GRID organisational identifiers.

This type of analysis can help assess the level of investment required to maintain different types of dataset and registers. The type of governance we decide to put around data will have a big impact on the technology and processes that need to be created to maintain it. A collaborative, user maintained register will operate very differently to one that is managed by a single authority.
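
A simple way to start this kind of analysis is to compare successive releases of a dataset. The sketch below assumes each release can be reduced to a set of identifiers. The release labels, the `org.N` identifiers and the `compare_releases` helper are invented for illustration and are not real GRID data.

```python
def compare_releases(previous, current):
    """Summarise what changed between two releases of an identifier list."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "net_growth": len(current) - len(previous),
    }


# Invented snapshots of identifiers from two releases.
releases = {
    "2017-01": {"org.1", "org.2", "org.3"},
    "2017-02": {"org.1", "org.3", "org.4", "org.5"},
}

changes = compare_releases(releases["2017-01"], releases["2017-02"])
```

Tracking removals as well as additions is useful: a register that only ever grows is maintained in a different way to one where entries are regularly merged or withdrawn.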

One final area in which I hope the community can begin to draw together some insight is around how data is used. At present there are no standards to guide the collection and reporting on metrics for the usage of either shared or open data. Publishing open data about how data is used could be extremely useful not just in understanding data infrastructure, but also in providing transparency about when and how data is being used.


Thank you for the data

Here are three anecdotes that show ways in which I’ve shared data with different types of organisation, and how they’ve shared data with me.

Last year we donated some old children’s toys and books to Julian House. When we dropped them off, I signed a Gift Aid declaration to allow the charity to claim additional benefits from our donation. At the end of the tax year they sent me an email, as I requested, to let me know how much they had raised from the donation. It was nice to know that the toys and books had gone to a good home and that the charity had benefited.

A few months ago, we switched energy provider to get a (much!) better and greener deal on our energy bills. The actual process of switching was easy. But we had to jump through a few hoops to actually get a quote. That mostly involved looking at charts and summaries of our current usage, collecting details on our plan and then using that to get a quote from some alternative suppliers. The government are still thinking about whether midata should apply to the energy sector. I don’t think it should because it’s too limited. An open banking model would be much better for consumers.

We decided to go with Octopus as our new supplier. Three months after the switch they sent me a lovely email, a “personal impact report”. It contained some great insights into our energy usage and the impacts on the environment of our greener energy consumption. For example, it told me that 18% of our electricity came from anaerobic digestion. Our biggest renewable supplier of solar energy was Bottreaux Mill Farm in Devon. It made me even happier to have switched, whilst also wishing I’d done it sooner.

Seven years ago I signed up to 23andMe and let them sequence my DNA. I was curious to know what I might learn and whether my data could be useful in medical studies. There are reasons to be wary of sharing this type of personal information, but it’s an informed decision. I understand what it is the company is doing. And I’ve also taken the time to read their privacy policy, which is clearly laid out.

23andMe email me on a regular basis to let me know when my data has contributed towards some published research. Looking at the site I can see I’ve contributed towards 19 published studies, including this one on autoimmune conditions. Our family is definitely interested in supporting any efforts to address autoimmune conditions. Unfortunately I often can’t look at the research because the papers aren’t open access.

I’ve been thinking about these types of exchanges after reading a short paper by Kadija Ferryman. In her paper she suggests that we should think of data as a gift. In the giving of a gift, there is the act of giving (sharing data), the act of receiving (holding that data) and often some form of reciprocation. These three anecdotes illustrate different types of reciprocation. In each case, an organisation has written me a little thank you note to show me how a data gift has been useful to them.

From a data collection point of view, none of the three organisations has had to do more than they would have done anyway. Gift Aid requires some extra book-keeping as part of the policy. Octopus will be keeping detailed records on our energy consumption and their energy purchases. And 23andMe will have a clear view on when and where aggregate data is being shared with researchers.

They’ve just chosen to show that they appreciate my data gifts and, in some cases, have given me a data gift in return. I’m now more likely to donate to Julian House, more likely to stay with Octopus, and have greater trust in continuing to let 23andMe store my DNA profile.

Thinking about data as a gift is another useful analogy that can help us think through the appropriate ways to design data sharing arrangements. I know I’d definitely like to receive more data thank you notes.

“The Rock Thane”, an open data parable

In a time long past, in a land far away, there was once a troubled kingdom. Despite the efforts of the King to offer justice freely to all, many of his subjects were troubled by unscrupulous merchants and greedy landowners. Time and again, the King heard claims of goods not being delivered, or disputes over land.

While the merchants and landowners were able to produce documents and affidavits to their defence, the King grew increasingly troubled. He felt that his subjects were being wronged, and he grew distrustful of the scribes that thronged the hallways of his courts and marketplaces.

One day, three wizards visited the kingdom. The wizards had travelled from the Far East, where as Masters of the Satoshi School, they had developed many curious spells. The three wizards were brothers. Na was the youngest, and was made to work hardest by his elder brothers, Ka and Mo. Mo, the eldest, was versed in many arts still unknown to his brothers.

Their offer to the King was simple: through the use of their magic they would remove all corruption from his lands. In return they would expect to be well paid for their efforts. Keen to be a just and respected ruler, the King agreed to the wizards’ plan. But while their offer was simple, the plan itself was complex.

The wizards explained that, through an obscure art, they could cause words and images to appear within a certain type of rock, or crystal which could be found commonly throughout the land. Once imbued with words, a crystal could no longer be changed even by a powerful wizard. In a masterful show of power, Ka and Mo embedded the King’s favourite poem and then a painting of his mother in a pair of crystals of the highest quality.

The wizards explained that rather than relying on parchment which could be faked or changed through the cunning application of pumice stones, they could use inscribed crystals to create indelible records of trading bills, property sales and other important documents.

The wizards also demonstrated to the King how, by channelling the power of their masters, groups of their acolytes could simultaneously record the same words in crystals all across the land. This meant that not only would there be an indisputable record of a given trade, but that there would immediately be dozens of copies available across the land, for anyone to check. Readily available and verifiable copies of any bill of trade would mean that no merchant could ever falsify a transaction.

In payment, the wizards would receive a gold piece for every crystal inscribed by their acolytes. Each crystal providing a clear proof of their works.

Impressed, the King decreed that henceforth, all across his lands, trading would now be carried out in trading posts staffed by teams of the wizards’ acolytes.

And, for a time, everything was fine.

But the King began to again receive troubling reports about trading disputes. Trust was failing once again. Speaking to his advisers and visiting some of the new trading posts, the King learned the source of the concerns.

When trading bills had been written on parchment, they could be read by anyone. This made them accessible to all. But only the wizards and their acolytes could read the words inscribed in the crystals. And the King’s subjects didn’t trust them.

Demanding an explanation, the King learnt that Na, the youngest wizard, had been tasked with providing the power necessary to inscribe the crystals. Not as versed in the art as his elder brothers, he was only able to inscribe the crystals with a limited number of words and only the haziest of images. Rather than inscribing easily readable bills of trade, Na and the acolytes were making inscriptions in a cryptic language known only to wizards.

Anyone wanting to read a bill had to request an acolyte to interpret it for them. Rumours had been spreading that the acolytes could be paid to interpret the runes in ways that were advantageous to those with sufficient coin.

The middle brother, Ka, attempted to placate the enraged King, proposing an alternative arrangement. He would oversee the inscribing of the crystals in the place of his brother. Skilled in additional spells, Ka proposed that the crystals would no longer be inscribed with runes describing the bills of sale. Instead each crystal would simply hold the number of a page in a magical book. Each Book of Bills would hold an infinite number of pages. And, when a sale was made, one acolyte would write the bill into a fresh page of a Book, whilst another would inscribe the page number into a crystal. As before, across the land, other acolytes would simultaneously inscribe copies of the bills into other crystals and other copies of the Book.

In this way, anyone wanting to read a bill of sale could simply ask a Book of Bills to turn to the page they needed. Anyone could then read from the book. But the crystals themselves would remain the ultimate proof of the trade. While someone might have been able to fake a copy of a Book, no-one could fake one of the crystals.

Grudgingly accepting this even more complex arrangement, the King was briefly satisfied. Until the accident.

One day, the wizard Ka visited the Craggy Valley, to forage for the rare Ipoh herb, which was known to grow in that part of the Kingdom. However, in a sudden fog, the wizard slipped and fell to his doom. And at the moment of his death, all of the wizard’s spells were undone. In a blink of an eye, all of the magical Books of Bills disappeared. Along with every proof of trade.

Enraged once more, the King gave the eldest wizard one more opportunity to deliver. Mo reassured the King that his power was far greater and that he was uniquely able to deliver on his late brother’s promise. Mo explained that through various dark arts he was able to resist death. He demonstrated his skill to the King, recklessly drinking terrible poisons, and throwing himself from a high tower only to land unharmed. Stunned at this show of power, the King agreed that Mo could take up his brother’s task.

For a few months, the turmoil was resolved, until fresh reports of corruption began to spread.

A dismayed King granted an audience to a retinue of merchants who had travelled from all across his kingdom. The merchants claimed to have evidence that discrepancies had begun to appear in the Books of Bills. In different towns and cities the Books showed slightly different numbers. There was also talk of a strange, shadowy figure who had been present at many of the trading posts in which discrepancies had been found.

Troubled, the King sent out soldiers to set watch on the trading posts, giving orders that they should attempt to capture and bring this stranger to the court.

Many weeks of waiting and watching passed. More evidence of corrupted Books of Bills continued to appear. Challenged to explain the allegations, Mo scoffed at the evidence. The wizard suggested that the problem was illiterate merchants, asserting that his acolytes were above suspicion.

But finally the King’s soldiers captured the shadowy stranger, and his identity was revealed.

While Mo was the oldest of the three wizards, he was not the eldest. There was a fourth brother, named To. Much older than his brothers, To had been stripped of his riches and banished for studying certain forbidden arts. It was from their brother that Na, Ka and Mo had learned many of their spells, including the arts of inscribing crystals and books, and the means of channelling their powers through acolytes.

Except To had not taught them everything. He had kept many secrets for himself and was able to corrupt the spells used to inscribe the crystals and Books. He was able to change page numbers to refer to other pages which he had inscribed with different words. He had been selling his skills to unscrupulous merchants in an attempt to grow rich once again.

Sickened of wizards and their complicated schemes, the King banished them from his kingdom, never to return.

The King then turned to the task of once more building trust in commerce across his land. He did this not by trusting in magics and complex schemes, but by addressing the problems with which he was originally faced. He decreed the founding of a guild, to create a cadre of trusted, reliable scribes. He appointed new ombudsmen and magistrates across the land, to help oversee and administer all forms of trade. He founded libraries and reading rooms to increase literacy amongst his subjects, so that more of them could read and write their own bills of trade. And he offered free use of the courts to all, so that none were denied an opportunity to seek justice.

Many years passed before the King and his kingdom worked through their troubles. But in the history books, the King was forever known as “The Rock Thane”.

Read the previous open data parables: The scribe and the djinn’s agreement, and The woodcutter.

Data is infrastructure, so it needs a design manual

Data is like roads. Roads help us navigate to a destination. Data helps us navigate to a decision. I like that metaphor. It helps to highlight the increasingly important role that data plays in modern society and business.

Roads help us travel to work and school. They also support a variety of different business uses. Roads are infrastructure that is created and maintained by society for the benefit of everyone. Open data, and especially open data published by the public sector, has similar characteristics. Like roads, data is infrastructure.

I think “infrastructure” is a fantastic framing for thinking about how we design and build systems that support the collection, use and reuse of data. It encourages us to think not just about the technology but also about the people who might use that data, or be impacted by it. And so we can define some principles that help define good data infrastructure.

Because I like to leave no metaphor un-stretched, I was excited to learn about the Design Manual for Roads and Bridges. It’s a 15 volume collection that helps to provide standards, advice and other guidance relating to the design, assessment and operation of roads. They’re packed with technical guidance that supports our national infrastructure.

And that’s not all. The government has also helpfully provided the Manual for Streets. The manual explains that “Good design is fundamental to achieving high-quality, attractive places that are socially, economically and environmentally sustainable”. I couldn’t agree more.

The summary explains that the manual breaks down the design of streets into processes that range from policy through to implementation. There’s also a hierarchy that prioritises the needs of pedestrians, those that will be most impacted by the infrastructure, over others that might also benefit from it. The manual explains that this helps to ensure that all user needs are met. Just like how we must think about the individual first when building systems that collect and use data.

The Manual for Streets also talks about the importance of standards, of connectivity and assessing quality. It also notes the need to support and encourage multiple uses. All of these have obvious parallels in data infrastructure and open licensing. The manual also highlights the importance of thinking about maintenance and sustainability, which is another important characteristic of data infrastructure that is often overlooked.

I think it might be interesting to think about what a Design Manual for Data Infrastructure would look like. Perhaps we can use the roads metaphor to help scope that?

For example, the first few volumes in the Design Manual for Roads and Bridges focus on general design principles, materials and methods of inspection and maintenance. That’s followed by more specific guidance on things like Road Geometry (data modelling and formats), Traffic Signs and Lighting (metadata, documentation, provenance), Traffic Control (data publishing and API design) and Communications (user engagement). There are also separate volumes that cover assessing environmental impact (data ethics, privacy impact assessments, etc).

We’re at an early stage of understanding how to build good data infrastructure. But there are already projects out there that we could learn from. And we can turn that learning into more detailed guidance and patterns that can be reused across sectors.

Sometimes metaphors can be stretched too far, but I think there’s a bit more mileage in the road metaphor yet. (Sorry, not sorry).