Bath Playbills 1812-1851

This weekend I published scans of over 2000 historical playbills for the Theatre Royal in Bath. Here are some notes on where they come from and how they might be useful.

The scans are all available on Flickr and have been placed into the public domain under a CC0 waiver. You’re free to use them in any way you see fit. The playbills date from 1812 through to 1851. This is the period just before the fire and rebuilding of the theatre in its current location.

The scans are taken from 5 public domain books available digitally from the British Library. All I’ve done in this instance is take the PDF versions of the books, split out the pages into separate images and then upload them to Flickr, into separate collections.

This is a small step, but will hopefully make the contents more discoverable and accessible. The individual playbills are now part of the web, so can be individually referenced and commented on.

For example there are some great images in the later bills. And I learned that in 1840 you could have seen lions, tigers and leopards.

1831-1840_418

And this playbill includes detail on the plot and scenes from a play called “Susan Hopley” and an intriguing reference to “Punchinello Vampire!”.

1841-1851_327

As they are all in the public domain, the images will hopefully be of interest to Wikipedians interested in the history of Bath, the theatre or performers such as Joseph Grimaldi. (I did try adding a reference to a playbill myself, but had this reverted because I was “linking to my own social media site”).

There’s a lot of detail to the bills which it might be useful to extract. E.g. the dates of each bill, the plays being performed and details of the performers and sponsors. If anyone is interested in helping to crowd-source that, then let me know!
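As a rough sketch of what a crowd-sourced transcription might capture, here is one possible record structure for a single bill. The class name, field names and JSON layout are all assumptions for illustration, not an agreed format, and the date in the example is a placeholder rather than a transcribed value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List
import json


@dataclass
class PlaybillRecord:
    """One transcribed playbill. All field names are illustrative."""
    image_id: str                       # title of the scan, e.g. "1841-1851_327"
    performance_date: date              # date printed on the bill
    plays: List[str] = field(default_factory=list)
    performers: List[str] = field(default_factory=list)
    sponsors: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record so it can be shared as open data."""
        data = {
            "image_id": self.image_id,
            "performance_date": self.performance_date.isoformat(),
            "plays": self.plays,
            "performers": self.performers,
            "sponsors": self.sponsors,
        }
        return json.dumps(data, sort_keys=True)


# Placeholder values only: the date below is made up, not read from the bill.
example = PlaybillRecord(
    image_id="1841-1851_327",
    performance_date=date(1841, 1, 1),
    plays=["Susan Hopley"],
)
print(example.to_json())
```

Keeping the scan title in each record means a transcription can always be checked against the original image.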

 

We can strengthen data infrastructure by analysing open data

Data is infrastructure for our society and businesses. To create stronger, sustainable data infrastructure that supports a variety of users and uses, we need to build it in a principled way.

Over time, as we gain experience with a variety of infrastructures supporting both shared and open data, we can identify the common elements of good data infrastructure. We can use that to help to write a design manual for data infrastructure.

There are a variety of ways to approach that task. We can write case studies on specific projects, and we can map ecosystems to understand how value is created through data. We can also take time to contribute to projects. Experiencing different types of governance, following processes and using tools can provide useful insight.

We can also analyse open data to look for additional insights that might help us improve data infrastructure. I’ve recently been involved in two short projects that have analysed some existing open data.

Exploring open data quality

Working with Experian and colleagues at the ODI, we looked at the quality of some UK government datasets. We used a data quality tool to analyse data from the Land Registry, the NHS and Companies House. We found issues with each of the datasets.

It’s clear that there is still plenty of scope to make basic improvements to how data is published, by providing:

  • better guidance on the structure, content and licensing of data
  • basic data models and machine-readable schemas to help standardise approaches to sharing similar data
  • better tooling to help reconcile data against authoritative registers

The UK is also still in need of a national open address register.

Open data quality is a current topic in the open data community. The community might benefit from access to an “open data quality index” that provides more detail on these issues. Open data certificates would be an important part of that index. The tools used to generate that index could also be used on shared datasets. The results could be open, even if the datasets themselves might not be.

Exploring the evolution of data

There are currently plans to further improve the data infrastructure that supports academic research by standardising organisation identifiers. I’ve been doing some R&D work for that project to analyse several different shared and open datasets of organisation identifiers. By collecting and indexing the data, we’ve been able to assess how well they can support improving existing data, through automated reconciliation and by creating better data entry tools for users.
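To give a flavour of what automated reconciliation involves, here is a minimal sketch that matches a free-text organisation name against a small register using fuzzy string comparison. It is not the method used in the project: the function names, the 0.85 threshold and the register contents are all assumptions made up for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.lower().split())


def reconcile(
    name: str, register: Dict[str, str], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return (identifier, score) for the best match at or above threshold, else None."""
    target = normalise(name)
    best_id, best_score = None, 0.0
    for identifier, registered_name in register.items():
        score = SequenceMatcher(None, target, normalise(registered_name)).ratio()
        if score > best_score:
            best_id, best_score = identifier, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


# Hypothetical register: these identifiers and names are invented.
register = {
    "org-001": "University of Example",
    "org-002": "Example Institute of Technology",
}

print(reconcile("university of  example", register))
print(reconcile("Unrelated Bakery Ltd", register))
```

A real service would need to handle aliases, abbreviations and translated names, which is where the scope and coverage of the source datasets start to matter.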

Increasingly, when we are building new data infrastructures, we are building on and linking together existing datasets. So it’s important to have a good understanding of the scope, coverage and governance of the source data we are using. Access to regularly published data gives us an opportunity to explore the dynamics around the management of those sources.

For example, I’ve explored the growth of the GRID organisational identifiers.

This type of analysis can help assess the level of investment required to maintain different types of dataset and registers. The type of governance we decide to put around data will have a big impact on the technology and processes that need to be created to maintain it. A collaborative, user maintained register will operate very differently to one that is managed by a single authority.

One final area in which I hope the community can begin to draw together some insight is around how data is used. At present there are no standards to guide the collection and reporting on metrics for the usage of either shared or open data. Publishing open data about how data is used could be extremely useful not just in understanding data infrastructure, but also in providing transparency about when and how data is being used.

 

Thank you for the data

Here are three anecdotes that show ways in which I’ve shared data with different types of organisation, and how they’ve shared data with me.

Last year we donated some old children’s toys and books to Julian House. When we dropped them off, I signed a Gift Aid declaration to allow the charity to claim additional benefits from our donation. At the end of the tax year they sent me an email, as I requested, to let me know how much they had raised from the donation. It was nice to know that the toys and books had gone to a good home and that the charity had benefited.

A few months ago, we switched energy provider to get a (much!) better and greener deal on our energy bills. The actual process of switching was easy. But we had to jump through a few hoops to actually get a quote. That mostly involved looking at charts and summaries of our current usage, collecting details on our plan and then using that to get a quote from some alternative suppliers. The government are still thinking about whether midata should apply to the energy sector. I don’t think it should because it’s too limited. An open banking model would be much better for consumers.

We decided to go with Octopus as our new supplier. Three months after the switch they sent me a lovely email, a “personal impact report”. It contained some great insights into our energy usage and the impacts on the environment of our greener energy consumption. For example, it told me that 18% of our electricity came from anaerobic digestion. Our biggest renewable supplier of solar energy was Bottreaux Mill Farm in Devon. It made me even happier to have switched, whilst also wishing I’d done it sooner.

Seven years ago I signed up to 23andMe and let them sequence my DNA. I was curious to know what I might learn and whether my data could be useful in medical studies. There are reasons to be wary of sharing this type of personal information, but it’s an informed decision. I understand what it is the company is doing. And I’ve also taken the time to read their privacy policy, which is clearly laid out.

23andMe email me on a regular basis to let me know when my data has contributed towards some published research. Looking at the site I can see I’ve contributed towards 19 published studies, including this one on autoimmune conditions. Our family is definitely interested in supporting any efforts to address autoimmune conditions. Unfortunately I often can’t look at the research because the papers aren’t open access.

I’ve been thinking about these types of exchanges after reading a short paper by Kadija Ferryman. In her paper she suggests that we should think of data as a gift. In the giving of a gift, there is the act of giving (sharing data), the act of receiving (holding that data) and often some form of reciprocation. These three anecdotes illustrate different types of reciprocation. In each case, an organisation has written me a little thank you note to show me how a data gift has been useful to them.

From a data collection point of view, none of the three organisations has had to do more than they would have done anyway. Gift Aid requires some extra book-keeping as part of the policy, and Octopus will be keeping detailed records on our energy consumption and their energy purchases. And 23andMe will have a clear view on when and where aggregate data is being shared with researchers.

They’ve just chosen to show that they appreciate my data gifts and, in some cases, have given me a data gift in return. I’m now more likely to donate to Julian House, more likely to stay with Octopus, and have greater trust in continuing to let 23andMe store my DNA profile.

Thinking about data as a gift is another useful analogy that can help us think through the appropriate ways to design data sharing arrangements. I know I’d definitely like to receive more data thank you notes.

“The Rock Thane”, an open data parable

In a time long past, in a land far away, there was once a troubled kingdom. Despite the efforts of the King to offer justice freely to all, many of his subjects were troubled by unscrupulous merchants and greedy landowners. Time and again, the King heard claims of goods not being delivered, or disputes over land.

While the merchants and landowners were able to produce documents and affidavits to their defence, the King grew increasingly troubled. He felt that his subjects were being wronged, and he grew distrustful of the scribes that thronged the hallways of his courts and marketplaces.

One day, three wizards visited the kingdom. The wizards had travelled from the Far East, where as Masters of the Satoshi School, they had developed many curious spells. The three wizards were brothers. Na was the youngest, and was made to work hardest by his elder brothers, Ka and Mo. Mo, the eldest, was versed in many arts still unknown to his brothers.

Their offer to the King was simple: through the use of their magic they would remove all corruption from his lands. In return they would expect to be well paid for their efforts. Keen to be a just and respected ruler, the King agreed to the wizards’ plan. But while their offer was simple, the plan itself was complex.

The wizards explained that, through an obscure art, they could cause words and images to appear within a certain type of rock, or crystal which could be found commonly throughout the land. Once imbued with words, a crystal could no longer be changed even by a powerful wizard. In a masterful show of power, Ka and Mo embedded the King’s favourite poem and then a painting of his mother in a pair of crystals of the highest quality.

The wizards explained that rather than relying on parchment which could be faked or changed through the cunning application of pumice stones, they could use inscribed crystals to create indelible records of trading bills, property sales and other important documents.

The wizards also demonstrated to the King how, by channelling the power of their masters, groups of their acolytes could simultaneously record the same words in crystals all across the land. This meant that not only would there be an indisputable record of a given trade, but that there would immediately be dozens of copies available across the land, for anyone to check. Readily available and verifiable copies of any bill of trade would mean that no merchant could ever falsify a transaction.

In payment, the wizards would receive a gold piece for every crystal inscribed by their acolytes. Each crystal providing a clear proof of their works.

Impressed, the King decreed that henceforth, all across his lands, trading would now be carried out in trading posts staffed by teams of the wizards’ acolytes.

And, for a time, everything was fine.

But the King began to again receive troubling reports about trading disputes. Trust was failing once again. Speaking to his advisers and visiting some of the new trading posts, the King learned the source of the concerns.

When trading bills had been written on parchment, they could be read by anyone. This made them accessible to all. But only the wizards and their acolytes could read the words inscribed in the crystals. And the King’s subjects didn’t trust them.

Demanding an explanation, the King learnt that Na, the youngest wizard, had been tasked with providing the power necessary to inscribe the crystals. Not as versed in the art as his elder brothers, he was only able to inscribe the crystals with a limited number of words and only the haziest of images. Rather than inscribing easily readable bills of trade, Na and the acolytes were making inscriptions in a cryptic language known only to wizards.

Anyone wanting to read a bill had to request an acolyte to interpret it for them. Rumours had been spreading that the acolytes could be paid to interpret the runes in ways that were advantageous to those with sufficient coin.

The middle brother, Ka, attempted to placate the enraged King, proposing an alternative arrangement. He would oversee the inscribing of the crystals in the place of his brother. Skilled in additional spells, Ka’s proposal was that the crystals would no longer be inscribed with runes describing the bills of sale. Instead each crystal would simply hold the number of a page in a magical book. Each Book of Bills would hold an infinite number of pages. And, when a sale was made, one acolyte would write the bill into a fresh page of a Book, whilst another would inscribe the page number into a crystal. As before, across the land, other acolytes would simultaneously inscribe copies of the bills into other crystals and other copies of the Book.

In this way, anyone wanting to read a bill of sale could simply ask a Book of Bills to turn to the page they needed. Anyone could then read from the book. But the crystals themselves would remain the ultimate proof of the trade. While someone might have been able to fake a copy of a Book, no-one could fake one of the crystals.

Grudgingly accepting this even more complex arrangement, the King was briefly satisfied. Until the accident.

One day, the wizard Ka visited the Craggy Valley, to forage for the rare Ipoh herb, which was known to grow in that part of the Kingdom. However, in a sudden fog, the wizard slipped and fell to his doom. And at the moment of his death, all of the wizard’s spells were undone. In a blink of an eye, all of the magical Books of Bills disappeared. Along with every proof of trade.

Enraged once more, the King gave the eldest wizard one more opportunity to deliver. Mo reassured the King that his power was far greater and that he was uniquely able to deliver on his late brother’s promise. Mo explained that through various dark arts he was able to resist death. He demonstrated his skill to the King, recklessly drinking terrible poisons, and throwing himself from a high tower only to land unharmed. Stunned at this show of power, the King agreed that Mo could take up his brother’s task.

For a few months, the turmoil was resolved, until fresh reports of corruption began to spread.

A dismayed King granted an audience to a retinue of merchants who had travelled from all across his kingdom. The merchants claimed to have evidence that discrepancies had begun to appear in the Books of Bills. In different towns and cities the Books showed slightly different numbers. There was also talk of a strange, shadowy figure who had been present at many of the trading posts in which discrepancies had been found.

Troubled, the King sent out soldiers to set watch on the trading posts, giving orders that they should attempt to capture and bring this stranger to the court.

Many weeks of waiting and watching passed. More evidence of corrupted Books of Bills continued to appear. Challenged to explain the allegations, Mo scoffed at the evidence. The wizard suggested that the problem was illiterate merchants, asserting that his acolytes were above suspicion.

But finally the king’s soldiers captured the shadowy stranger, and his identity was revealed.

While Mo was the oldest of the three wizards, he was not the eldest. There was a fourth brother, named To. Much older than his brothers, To had been stripped of his riches and banished for studying certain forbidden arts. It was from their brother that Na, Ka and Mo had learned many of their spells, including the arts of inscribing crystals and books, and the means of channelling their powers through acolytes.

Except To had not taught them everything. He had kept many secrets for himself and was able to corrupt the spells used to inscribe the crystals and Books. He was able to change page numbers to refer to other pages which he had inscribed with different words. He had been selling his skills to unscrupulous merchants in an attempt to grow rich once again.

Sickened of wizards and their complicated schemes, the King banished them from his kingdom, never to return.

The King then turned to the task of once more building trust in commerce across his land. He did this not by trusting in magics and complex schemes, but by addressing the problems with which he was originally faced. He decreed the founding of a guild, to create a cadre of trusted, reliable scribes. He appointed new ombudsmen and magistrates across the land, to help oversee and administer all forms of trade. He founded libraries and reading rooms to increase literacy amongst his subjects, so that more of them could read and write their own bills of trade. And he offered free use of the courts to all, so that none were denied an opportunity to seek justice.

Many years passed before the King and his kingdom worked through their troubles. But in the history books, the King was forever known as “The Rock Thane”.


Read the previous open data parables: The scribe and the djinn’s agreement, and The woodcutter.

Data is infrastructure, so it needs a design manual

Data is like roads. Roads help us navigate to a destination. Data helps us navigate to a decision. I like that metaphor. It helps to highlight the increasingly important role that data plays in modern society and business.

Roads help us travel to work and school. They also support a variety of different business uses. Roads are infrastructure that is created and maintained by society for the benefit of everyone. Open data, and especially open data published by the public sector, has similar characteristics. Like roads, data is infrastructure.

I think “infrastructure” is a fantastic framing for thinking about how we design and build systems that support the collection, use and reuse of data. It encourages us to think not just about the technology but also about the people who might use that data, or be impacted by it. And so we can define some principles that help define good data infrastructure.

Because I like to leave no metaphor un-stretched, I was excited to learn about the Design Manual for Roads and Bridges. It’s a 15 volume collection that helps to provide standards, advice and other guidance relating to the design, assessment and operation of roads. The volumes are packed with technical guidance that supports our national infrastructure.

And that’s not all. The government has also helpfully provided the Manual for Streets. The manual explains that “Good design is fundamental to achieving high-quality, attractive places that are socially, economically and environmentally sustainable”. I couldn’t agree more.

The summary explains that the manual breaks down the design of streets into processes that range from policy through to implementation. There’s also a hierarchy that prioritises the needs of pedestrians, those that will be most impacted by the infrastructure, over others that might also benefit from it. The manual explains that this helps to ensure that all user needs are met. Just like how we must think about the individual first when building systems that collect and use data.

The Manual for Streets also talks about the importance of standards, of connectivity and assessing quality. It also notes the need to support and encourage multiple uses. All of these have obvious parallels in data infrastructure and open licensing. The manual also highlights the importance of thinking about maintenance and sustainability, another important characteristic of data infrastructure that is often overlooked.

I think it might be interesting to think about what a Design Manual for Data Infrastructure would look like. Perhaps we can use the roads metaphor to help scope that?

For example, the first few volumes in the Design Manual for Roads and Bridges focus on general design principles, materials and methods of inspection and maintenance. That’s followed by more specific guidance on things like Road Geometry (data modelling and formats), Traffic Signs and Lighting (metadata, documentation, provenance), Traffic Control (data publishing and API design) and Communications (user engagement). There are also separate volumes that cover assessing environmental impact (data ethics, privacy impact assessments, etc).

We’re at an early stage of understanding how to build good data infrastructure. But there are already projects out there that we could learn from. And we can turn that learning into more detailed guidance and patterns that can be reused across sectors.

Sometimes metaphors can be stretched too far, but I think there’s a bit more mileage in the road metaphor yet. (Sorry, not sorry).

Lunchtime Lecture: “How you (yes, you) can contribute to open data”

The following is a written version of the lunchtime lecture I gave today at the Open Data Institute. I’ll put in a link to the video when it comes online. It’s not a transcript, I’m just writing down what I had planned to say.

Hello!

I’m going to talk today about some of the projects that first got me excited about data on the web and open data specifically. I’m hopefully going to get you excited about them too. And show some ways in which you can individually get involved in creating some open data.

Open data is not (just) open government data

I’ve been reflecting recently about the shape of the open data community and ecosystem, to try and understand common issues and areas for useful work.

For example, we spend a lot of time focusing on Open Government Data. And so we talk about how open data can drive economic growth, create transparency, and be used to help tackle social issues.

But open data isn’t just government data. It’s a broader church that includes many different communities and organisations who are publishing and using open data for different purposes.

Open data is not (just) organisational data

More recently, as a community, we’ve focused some of our activism on encouraging commercial organisations to not just use open data (which many have been doing for years), but also to publish open data.

And so we talk about how open data can be supported by different business models and the need for organisational change to create more open cultures. And we collect evidence of impact to encourage more organisations to also become more open.

But open data isn’t just about data from organisations. Open data can be created and published by individuals and communities for their own needs and purposes.

Open data can (also) be a creative activity

Open data can also be a creative activity. A means for communities to collaborate around sharing what they know about a topic that is important or meaningful to them. Simply because they want to do it. I think sometimes we overlook these projects in the drive to encourage governments and other organisations to publish open data.

So I’m going to talk through eight (you said six in the talk, idiot! – Ed) different example projects. Some you will have definitely heard about before, but I suspect there will be a few that you haven’t. In most cases the primary goal of these projects is to create an openly licensed dataset. So when you contribute to the project, you’re directly helping to create more open data.

Of course, there are other ways in which we each contribute to open data. But these are often indirect contributions. For example where our personal data that is held in various services is aggregated, anonymised and openly published. But today I want to focus on more direct contributions.

For each of the examples I’ve collected a few figures that indicate the date the project started, the number of contributors, and an indication of the size of the dataset. Hopefully this will help paint a picture of the level of effort that is already going into maintaining these resources. (Psst, see the slides for the figures – Ed)

Wikipedia

The first example is Wikipedia. Everyone knows that anyone can edit Wikipedia. But you might not be aware that Wikipedia can be turned into structured data and used in applications. There are lots of projects that do it, e.g. DBpedia, which brings Wikipedia into the web of data.

The bits that are turned into structured data are the “infoboxes” that give you the facts and figures about the person (for example) that you’re reading about. So if you add to Wikipedia and specifically add to the infoboxes, then you’re adding to an openly licensed dataset.
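To show roughly why infoboxes lend themselves to this, here is a minimal sketch that pulls key and value pairs out of a simple infobox written in wiki markup. It assumes a flat infobox with one field per line; real infoboxes contain nested templates and links, so actual projects use far more robust extraction. The function name and the sample markup are invented for the example.

```python
from typing import Dict


def parse_infobox(wikitext: str) -> Dict[str, str]:
    """Extract 'key = value' pairs from a flat, one-field-per-line infobox."""
    fields: Dict[str, str] = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template header, the closing braces and blank lines
        key, _, value = line[1:].partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Invented sample markup: the values are placeholders, not real facts.
sample = """{{Infobox writer
| name        = Example Author
| occupation  = Novelist
| nationality = British
}}"""

print(parse_infobox(sample))
```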

The most obvious example of where this data is used is in Google search results. The infoboxes you see on search results whenever you google for a person, place or thing are partly powered by Wikipedia data.

A few years ago I added a Wikipedia page for Gordon Boshell, the author of some children’s books I loved as a kid. There wasn’t a great deal of information about him on the web, so I pulled whatever I could find together and created a page for him. Now when anyone searches for Gordon Boshell they can see some information about him right on Google. And they now link out to the books that he wrote. It’s nice to think that I’ve helped raise his profile.

There’s also a related project from the Wikimedia Foundation called Wikidata. Again, anyone can edit it, but it’s a database of facts and figures rather than an encyclopedia.

OpenStreetMap

The second example is OpenStreetMap. You’ll definitely have already heard about its goal to create a crowd-sourced map of the world. OpenStreetMap is fascinating because it’s grown this incredible ecosystem of tools and projects that make it easier to contribute to the database.

I’ve recently been getting involved with contributing to OpenStreetMap. My initial impression was that I was probably going to have to get a commercial GPS and go out and do complicated surveying. But it’s not like that at all. It’s really easy to add points to the map, and to use their tools to trace buildings from satellite imagery. They provide great tutorials to help you get started.

It’s surprisingly therapeutic. I’ve spent a few evenings drinking a couple of beers and tracing buildings. It’s a bit like an adult colouring book, except you’re creating a better map of the world. Neat!

There are a variety of other tools that let you contribute to OpenStreetMap. For example Wheelmap allows you to add wheelchair accessibility ratings to locations on the map. We’ve been using this in the AccessibleBath project to help crowd-source data about wheelchair accessibility in Bath. One afternoon we got a group of around 25 volunteers together for a couple of hours and mapped 30% of the city centre.

There’s a lot of humanitarian mapping that happens using OpenStreetMap. If there’s been a disaster or a disease outbreak then aid workers often need better maps to help reach the local population and target their efforts. Missing Maps lets you take part in that. They have a really nice workflow that lets you contribute towards improving the map by tracing satellite imagery.

There’s a related project called MapSwipe. It’s a mobile application that presents you with a grid of satellite images. All you have to do is click the tiles which contain a building and then swipe left. Behind the scenes this data is used to direct Missing Maps volunteers towards the areas where more detailed mapping would be most useful. This focuses contributors’ attention where it’s most needed and so is really respectful of people’s time.

MapSwipe can also be used offline. So you can download a work package to do when you’re on your daily commute. Easy!

Zooniverse

You’ve probably also heard of Zooniverse, which is my third example. It’s a platform for citizen science projects. That just means using crowd-sourcing to create scientific datasets.

Their most famous project is probably GalaxyZoo which asked people to help classify objects in astronomical imagery. But there are many other projects. If you’re interested in biology then perhaps you’d like to help catalogue specimens held in the archives of the Natural History Museum?

Or there’s Old Weather, which I might get involved with. In that project you can help to build a picture of our historical climate by transcribing the weather reports that whaling ship captains wrote in their logs. By collecting that information we can build a dataset that tells us more about our climate.

I think it’s a really innovative way to use historical documents.

MusicBrainz

This is my fourth and favourite example. MusicBrainz is a database of music metadata: information about artists, albums, and tracks. It was created in direct response to commercial music databases that were asking people to contribute to their dataset, but then were taking all of the profits and not returning any value to the community. MusicBrainz created a free, open alternative.

I think MusicBrainz is the first open dataset I got involved with. I wrote a client library to help developers use the data. (14 years ago, and you’re still talking about it – Ed)

MusicBrainz has also grown a commercial ecosystem around it, which has helped it be sustainable. There are a number of projects that use the dataset, including Spotify. And it’s been powering the BBC Music website for about ten years.

Discogs

My fifth example, Discogs, is also a music dataset. But it’s a dataset about vinyl releases. So it focuses more on the releases, labels, etc. Discogs is a little different because it started as, and still is, a commercial service. At its core it’s a marketplace for record collectors. But to power that marketplace you need a dataset of vinyl releases. So they created tools to help the community build it. And, over time, it’s become progressively more open.

Today all of the data is in the public domain.

OpenPlaques

My sixth example is OpenPlaques. It’s a database of the commemorative plaques that you can see dotted around on buildings and streets. The plaques mark that an important event happened in that building, or that someone famous was born or lived there. Volunteers take photos of the plaques and share them with the service, along with the text and names of anyone who might be mentioned in the plaque.

It provides a really interesting way to explore the historical information in the context of cities and buildings. All of the information is linked to Wikipedia so you can find out more information.

Rebrickable

My seventh example is Rebrickable which you’re unlikely to have heard about. I’m cheating a little here as its a service and not strictly a dataset. But its Lego, so I had to include it.

Rebrickable has a big database of all the official Lego sets and what parts they contain. If you’re a fan of Lego (they’re called AFOLS – Ed) and you design and create your own custom Lego models (they’re known as MOCS – Ed) then you can upload the design and instructions to the service in machine-readable LEGO CAD formats.

Rebrickable exposes all of the information via an API under a liberal licence. So people can build useful tools. For example using the service you can find out which other official and custom sets you can build with bricks you already own.
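To give a feel for the "what can I build with the bricks I already own?" question, here is a minimal sketch in Python. It doesn't call the Rebrickable API. The part identifiers, set names and quantities are all made up for illustration, and a real tool would load the inventories from the service instead. The idea is just a multiset comparison: a set is buildable if you own at least as many of each part as it needs.

```python
from collections import Counter


def can_build(owned: Counter, required: Counter) -> bool:
    """True if the owned parts cover every part the set requires."""
    # Counter returns 0 for parts we don't own, so missing parts fail the check.
    return all(owned[part] >= qty for part, qty in required.items())


def buildable_sets(owned: Counter, inventories: dict) -> list:
    """Names of the sets that can be built from the owned parts, sorted."""
    return sorted(
        name for name, parts in inventories.items() if can_build(owned, parts)
    )


# Hypothetical data: the part ids and set names are invented for this example.
owned = Counter({"brick-2x4-red": 4, "brick-2x2-blue": 2, "plate-2x4-grey": 1})
inventories = {
    "tiny-house": Counter({"brick-2x4-red": 4, "plate-2x4-grey": 1}),
    "small-car": Counter({"brick-2x2-blue": 2, "plate-1x2-black": 4}),
    "tower": Counter({"brick-2x4-red": 2, "brick-2x2-blue": 2}),
}

print(buildable_sets(owned, inventories))
```

Each set is checked on its own here, so this tells you what you could build one at a time, not everything you could build at once.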

Grand Comics Database

My eighth and final example is the Grand Comics Database. It’s also the oldest project as it was started in 1994. The original creators started with desktop tools before bringing it to the web.

It’s a big database of 1.3m comics. It contains everything from The Dandy and The Beano through to Marvel and DC releases. It’s not just data on the comics, but also story arcs, artists, authors, etc. If you love comics you’ll love GCD. I checked and this week’s 2000AD (published 2 days ago – Ed) is in there already.

So those are my examples of places where you could contribute to open data.

Open data is an enabler

The interesting thing about them all is that open data is an enabler. Open data isn’t creating economic growth, or being used as a business model. Open licensing is being applied as a tool.

It creates a level playing field that means that everyone who contributes has an equal stake in the results. If you and I both contribute then we can both use the end result for any purpose. A commercial organisation is not extracting that value from us.

Open licensing can help to encourage people to share what they know, which is the reason the web exists.

Working with data

The projects are also great examples of ways of working with data on the web. They’re all highly distributed projects, accepting submissions from people internationally who will have very different skill sets and experience. That creates a challenge that can only be dealt with by having good collaboration tools and by having really strong community engagement.

Understanding how and why people contribute to your open database is important, because often those reasons will change over time. When OpenStreetMap had just started, contributors had the thrill of filling in a blank map with data about their local area. But now contributions are different. It’s more about maintaining data and adding depth.

Collaborative maintenance

In the open data community we often talk about making things open to make them better. It’s the tenth GDS design principle. And making data open does make it better in the sense that more people can use it. And perhaps more eyes can help spot flaws.

But if you really want to let people help make something better, then you need to put your data into a collaborative environment. Then data can get better at the pace of the community and not at the pace of your ability to accept feedback.

It’s not work if you love it

Hopefully the examples give you an indication of the size of these communities and how much has been created. It struck me that many of them have been around since the early 2000s. I’ve not really found any good recent examples (Maybe people can suggest some – Ed). I wonder why that is?

Most of the examples were born around the Web 2.0 era (Mate. That phrase dates you. – Ed) when we were all excitedly contributing different types of content to different services. Bookmarks and photos and playlists. But now we mostly share things on social media. It feels like we’ve lost something. So it’s worth revisiting these services to see that they still exist and that we can still contribute.

While these fan communities are quietly hard at work, maybe we in the open data community can do more to support them?

There are a lot of examples of “open” datasets that I didn’t use because they’re not actually open. The licences are restrictive. Or the community has decided not to think about it. Perhaps we can help them understand why being a bit more open might be better?

There are also examples of openly licensed content that could be turned into more data. Take Wikia for example. It contains 360,000 wikis, all with openly licensed content. They get 190m views a month and the system contains 43 million pages. That’s about the same size as the English version of Wikipedia is currently. They’re all full of infoboxes that are crying out to be turned into structured data.
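As a rough illustration of what "turning infoboxes into structured data" could look like, here is a small Python sketch that pulls the key/value pairs out of a simple infobox written in wiki markup. The sample infobox and its field names are invented, and the parser assumes one `| key = value` pair per line. Real infoboxes often have nested templates, links and multi-line values, so a proper wikitext parser would be needed for anything serious.

```python
def parse_infobox(wikitext: str) -> dict:
    """Extract simple '| key = value' fields from the first infobox found."""
    fields = {}
    inside = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if not inside and stripped.lower().startswith("{{infobox"):
            # Remember which template was used, e.g. "Infobox character".
            inside = True
            fields["_template"] = stripped[2:].strip()
            continue
        if inside and stripped == "}}":
            break  # end of the infobox
        if inside and stripped.startswith("|") and "=" in stripped:
            key, _, value = stripped[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical page content: the character and fields are made up.
sample = """
Some introductory text.
{{Infobox character
| name = Example Hero
| species = Robot
| first_appearance = Issue 1
}}
More text after the infobox.
"""

record = parse_infobox(sample)
print(record["name"], "/", record["species"])
```

Once each page is reduced to a dictionary like this, the records can be written out as CSV or JSON and published alongside the wiki content under the same open licence.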

I think it’d be great to make all this fan-produced data a proper part of the open data commons, sitting alongside the government and organisational datasets that are being published.

Thank you (yes, you!)

That’s the end of my talk. I hope I’ve piqued your interest in looking at one or more of these projects in more detail. Hopefully there’s a project that will help you express your inner data geek.

Photo Attributions

Lego Spaceman, Edwin Andrade, Jamie Street, Olu Elet, Aaron Burden, Volkan Olmez, Alvaro Serrano, RawPixel.com, Jordan Whitfield, Anthony DELANOIX

 

Where can you contribute to open data? Yes, you!

This is just a quick post to gather together some pointers and links that were shared in answer to a question I asked on twitter yesterday:

I want to try out a bunch of different services to explore how easy it is for people to contribute to open data projects. Because I’m interested in how we can contribute as individuals, I’m ruling out things like government open data portals. They’re not typically places where mere punters like you or me can contribute.

I’m also interested in sites that generate open data. Not public data. There needs to be an open licence on the results. Or, at the very least, a note along the lines of: “do whatever you want with this”.

I’m thinking more of places where we can collaborate around creating open data.

The short list

Here’s a quick list of the suggestions, along with a few I’d already turned up. I’m sure there are a lot more. Please leave a comment or ping me on twitter if you have suggestions. And yes, I’ll turn this into data at some point.

  1. OpenStreetMap was the starter for ten. I’ve already written about a number of ways you can contribute to the effort
  2. Discogs, contribute to their public domain database
  3. Wikipedia, content in infoboxes is presented as data via dbpedia and wikidata
  4. You can also contribute directly to Wikidata
  5. MusicBrainz, is completely crowd-sourced
  6. You can contribute company information to OpenCorporates
  7. Questions you answer on Stackoverflow end up as open data
  8. DemocracyClub are doing an awesome job of co-ordinating crowd-sourced data collection that the UK government should just be doing itself
  9. The product data you add to OpenFoodFacts is open
  10. It looks like you can contribute Creative Commons licensed content and data to the Encyclopedia of Life
  11. OpenPlaques is open to contributions
  12. The Quick, Draw with Google data is actually open. Google seem to be opening up more of their research data
  13. ESRI are building some crowdsourcing apps, which generate open data
  14. If you’re in Germany and have some sensor data, you can feed it into OpenSenseMap. Their data dumps are in the public domain

What else should be on this list?

Disqualifications

There were also a number of sites that were suggested, or which I considered, but had to be rejected. Mostly because they’re not actually publishing open data. They either have restrictions on usage, or the licensing is very unclear. If you can clarify any of these then let me know.

Clearly there are hundreds of non-open databases, but do let me know if I’m wrong about any of the above, and I’ll amend the article accordingly.