Open Data for (Big) Kids?

This afternoon Emma Mulqueeny asked on Twitter if anyone had any ideas about fun, exciting datasets to inspire kids new to Open Data hacking. I asked whether she was interested in downloadable datasets or just APIs, or both. The answer was both.

So below you’ll find a few suggestions from me about datasets that kids might find fun and interesting. It’s the kind of stuff my kids are interested in, anyway. It’s also the kind of Open Data that excites me, so even if I’m off the mark there may be something in here for you big kids too.

Disclaimer: I make a few references to Kasabi here, which is a service that I’m involved in developing. There’s a getting started guide here.

While I think Kasabi is a great service, I reference it here because I’ve already put some of these datasets online there. I’m not trying to promote it, and I’ve referenced the original sources where available. For educational purposes I think it’d be good for kids to try working with the data directly, to understand how to work with files as well as with APIs, and indeed with services like ScraperWiki.

Presumably the kids will be given some support & guidance on working with the data and API, so I’ll focus this posting on just pointers.

Lego

My first and best suggestion is Lego data! Bricklink hosts a community-maintained dataset about Lego parts, sets and their inventories. The data isn’t clearly licensed, but having checked with them, they ask only that you attribute your sources. Attribution is another good thing to teach kids about Open Data.

The Bricklink data files can be downloaded as XML files. You can get individual files with details of all sets, parts, etc. There are also files with inventory information.
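Once you have one of those XML files, a few lines of standard-library Python will turn it into something you can search. This is a minimal sketch that assumes each download has a CATALOG root with one ITEM element per set or part, holding ITEMID and ITEMNAME children. That layout is an assumption and the sample rows are placeholders, so open a real file and check the element names first.

```python
import xml.etree.ElementTree as ET

# Placeholder sample in the layout we *assume* the Bricklink catalog files use:
# a <CATALOG> root with one <ITEM> per entry. Verify against a real download.
SAMPLE_XML = """<CATALOG>
  <ITEM>
    <ITEMTYPE>S</ITEMTYPE>
    <ITEMID>1234-1</ITEMID>
    <ITEMNAME>Example Castle Set</ITEMNAME>
  </ITEM>
  <ITEM>
    <ITEMTYPE>S</ITEMTYPE>
    <ITEMID>5678-1</ITEMID>
    <ITEMNAME>Example Space Set</ITEMNAME>
  </ITEM>
</CATALOG>"""


def parse_items(xml_text: str) -> dict:
    """Map each item ID to its name."""
    root = ET.fromstring(xml_text)
    return {
        item.findtext("ITEMID"): item.findtext("ITEMNAME")
        for item in root.iter("ITEM")
    }


items = parse_items(SAMPLE_XML)
for item_id, name in sorted(items.items()):
    print(item_id, name)
```

For a real file, `ET.parse("path/to/file.xml").getroot()` replaces `ET.fromstring`; the path is whatever you saved the download as.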

I’ve taken a version of that data, turned it into Linked Data (RDF) and published it on Kasabi, so it’s accessible via a few APIs there.

The code to download all of the data and convert it to RDF is available on GitHub. If you just want to cache the files locally, run: rake download. That will save some clicking in the UI.
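If kids are writing their own tools rather than using the rake task, the same "download once, then reuse" idea is worth teaching. This is a hypothetical Python sketch of that idea, not the actual code in the GitHub project, and the URL in the usage comment is a placeholder. It writes to a temporary ".part" file first so an interrupted download is never mistaken for a cached one.

```python
import shutil
import urllib.request
from pathlib import Path


def cache_file(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless a cached copy already exists.

    Returns True if the file was downloaded, False if the cache was reused.
    """
    if dest.exists():
        return False  # already cached: skip the network entirely
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Stream the response to disk rather than holding it all in memory.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # only a complete download gets the real name
    return True


# Hypothetical usage; substitute a real download URL:
# cache_file("https://example.org/catalog/sets.xml", Path("data/sets.xml"))
```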

Space

Another dataset I published on Kasabi is the NASA launch data. This is a conversion of the NSSDC Master Catalog which contains data about all satellite launches dating back to the 1950s.

Pokemon

There’s an online Pokedex containing data about Pokemon on the Veekun website. The code for the Pokedex and the source data are both up on GitHub.

There are instructions on how to build a local database for the code, which relies on Python and PostgreSQL. However, the data is all in the GitHub project as CSV files, so it can easily be downloaded and processed using whatever tools you like.
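Reading those CSV files needs nothing more than Python’s built-in csv module. This sketch assumes a pokemon.csv with at least id, identifier, height and weight columns (the real file has more, and we assume height and weight are stored as whole numbers in small units). The three rows are a trimmed illustration, so swap in the real file once you’ve found it in the repository.

```python
import csv
import io

# Trimmed rows in the shape we assume for the project's pokemon.csv.
# To use the real file, pass open("pokemon.csv", newline="") instead.
SAMPLE_CSV = """id,identifier,height,weight
1,bulbasaur,7,69
4,charmander,6,85
25,pikachu,4,60
"""


def load_pokemon(csv_file):
    """Read rows into dicts, converting the numeric columns to ints."""
    return [
        {
            "id": int(row["id"]),
            "name": row["identifier"],
            "height": int(row["height"]),
            "weight": int(row["weight"]),
        }
        for row in csv.DictReader(csv_file)
    ]


pokemon = load_pokemon(io.StringIO(SAMPLE_CSV))
heaviest = max(pokemon, key=lambda p: p["weight"])
print(heaviest["name"])  # prints "charmander"
```

Questions like "which is the heaviest?" or "which is the tallest?" make good first exercises because the answer is easy to check by hand.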

Related to this is Bulbapedia, a Pokemon wiki. It runs on MediaWiki, which offers an API, so there may be scope to get data that way, or to mash up data across these two sources. I’ve not heard good things about how user-friendly the MediaWiki API is, though.
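For a taste of the MediaWiki API, here is a sketch that builds a standard query for the current wikitext of a page and fetches it as JSON. The action, prop, rvprop, titles and format parameters are part of the MediaWiki Action API; the Bulbapedia endpoint path and the User-Agent string are assumptions, so check the wiki before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Bulbapedia exposes MediaWiki's api.php at this path.
API_ENDPOINT = "https://bulbapedia.bulbagarden.net/w/api.php"


def page_content_url(title: str) -> str:
    """Build a MediaWiki API query for the latest wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",   # ask for revision data...
        "rvprop": "content",   # ...specifically the page content
        "titles": title,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    # Wikis generally ask API clients to send a descriptive User-Agent;
    # this one is a hypothetical placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "open-data-kids-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(page_content_url("Pikachu (Pokémon)"))
```

The page text comes back nested under query and pages, and the exact key names vary between MediaWiki versions, so print the whole response once and explore it before writing code that digs into it. The wikitext itself is markup, not tidy data, which is probably why the API has a reputation for being awkward.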

Finally, the Ultimate Pokemon Centre also offers a Pokedex. This one is oriented towards the video games and includes references to sound files of Pokemon cries.

As this post describes, there are ways to export the data from the Pokedex as text files. For example, you can download the names of all Pokemon in Pokemon Red. To build an export, start at this page and select your options.

Video Games

Video game data tends to be locked down to review sites, but there are a few places you can get some good data for non-commercial uses.

I don’t have any personal experience of any of those APIs, so I can’t offer any guidance. It would be good to teach the kids about the limitations of particular data and API licensing terms here: e.g. how might they be limited by upstream providers?

Board games more your thing? What about the XML API and database export offered by BoardGameGeek?
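XML APIs like BoardGameGeek’s are a nice next step after static files. This sketch parses a trimmed, made-up response in the shape we assume the API returns for a single game: an items root, one item per game, and child elements that keep their data in a value attribute. Both the layout and the sample values are assumptions, so fetch a real response and compare before building on it.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical response in the shape we assume the
# BoardGameGeek XML API returns for one game.
SAMPLE_RESPONSE = """<items>
  <item type="boardgame" id="1">
    <name type="primary" value="Example Game"/>
    <name type="alternate" value="Example Game: Junior Edition"/>
    <yearpublished value="1995"/>
    <minplayers value="2"/>
    <maxplayers value="4"/>
  </item>
</items>"""


def attr_int(item, tag):
    """Read the integer stored in a child element's `value` attribute."""
    return int(item.find(tag).get("value"))


def summarise_games(xml_text):
    games = []
    for item in ET.fromstring(xml_text).iter("item"):
        # A game can have many names; keep the one marked primary.
        primary = item.find("name[@type='primary']")
        games.append({
            "id": item.get("id"),
            "name": primary.get("value") if primary is not None else None,
            "year": attr_int(item, "yearpublished"),
            "players": (attr_int(item, "minplayers"), attr_int(item, "maxplayers")),
        })
    return games


for game in summarise_games(SAMPLE_RESPONSE):
    print(game["name"], game["year"], game["players"])
```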

TV and Film

TV and Film sources tend to be similarly hampered by licensing terms, but TVdb offers an API to grab XML extracts of its database, e.g. details on TV series, episodes, etc.

BBC Wildlife

The BBC Wildlife site is a Linked Data site, so you can grab RDF data from any of its URLs. The data includes descriptions of the animals, references to images, and links to BBC content and clips. For example, here’s a page about Tigers and here’s that same page as RDF.
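One common Linked Data convention for getting the RDF version of a page is HTTP content negotiation: you ask for the same URL but say you’d prefer RDF. This is a sketch assuming the server honours an Accept header of application/rdf+xml; if it doesn’t, request the RDF link for the page directly instead. The URL in the usage comment is a placeholder, not the real page address.

```python
import urllib.request

# Assumption: the server supports content negotiation for RDF/XML.


def rdf_request(page_url: str) -> urllib.request.Request:
    """Build a request for `page_url` that asks for RDF/XML instead of HTML."""
    return urllib.request.Request(
        page_url, headers={"Accept": "application/rdf+xml"}
    )


def fetch_rdf(page_url: str) -> bytes:
    with urllib.request.urlopen(rdf_request(page_url)) as response:
        return response.read()


# Hypothetical usage with a placeholder URL:
# rdf = fetch_rdf("https://example.org/nature/life/Tiger")
```

Comparing the HTML and RDF responses for the same URL is a good way to show kids that one resource can have several representations.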

I crawled the site last year to gather up the data. That’s available in Kasabi but is clearly out of date now. (No dinosaurs!)

Hopefully that’s enough to get some enthusiastic kids started on open data hacking.

Dr Who

The Guardian Datablog is a trove of interesting data snippets, two of which are about Dr Who.

The first is a spreadsheet that includes every time-travel event made by the Doctors. The data includes the episode, the Doctor, when/where the travel occurred, etc. It’s available as a Google Spreadsheet, which can also be downloaded.

There’s also another spreadsheet, which contains the name of every Dr Who villain since the 1960s. Again it includes the name of the Doctor and the episode, so the two datasets are ripe for mashing up.
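Because both spreadsheets share an episode column, the simplest mashup is a join on episode. This sketch assumes both sheets have been downloaded as CSV with columns named episode, doctor, destination and villain. Those names are hypothetical and the rows are placeholders, so check the real headers after downloading.

```python
import csv
import io
from collections import defaultdict

# Placeholder extracts of the two spreadsheets exported as CSV.
# Column names are assumptions; rename to match the real headers.
TRAVEL_CSV = """episode,doctor,destination
Episode A,First,Location X
Episode A,First,Location Y
Episode B,Second,Location Z
"""
VILLAINS_CSV = """episode,doctor,villain
Episode A,First,Villain P
Episode B,Second,Villain Q
"""


def read_rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def join_on_episode(travel_text, villain_text):
    """Attach the list of villains from the same episode to each travel event."""
    villains = defaultdict(list)
    for row in read_rows(villain_text):
        villains[row["episode"]].append(row["villain"])
    # .get() avoids inserting empty lists for episodes with no villain rows.
    return [
        {**row, "villains": villains.get(row["episode"], [])}
        for row in read_rows(travel_text)
    ]


for event in join_on_episode(TRAVEL_CSV, VILLAINS_CSV):
    print(event["destination"], event["villains"])
```

Episode titles typed by different people rarely match exactly, so in practice some tidying of that column is likely before the join finds everything.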

There are numerous Dr Who fan sites and wikis so there is likely to be some scope for linking out to various websites for images, reviews, etc.
