This afternoon Emma Mulqueeny asked on Twitter if anyone had any ideas about fun, exciting datasets to inspire kids new to Open Data hacking. I asked whether she was interested in downloadable datasets, APIs, or both. The answer was both.
So below you’ll find a few suggestions of datasets that kids might find fun and interesting. It’s the kind of stuff my kids are interested in, anyway. It’s also the kind of Open Data that excites me, so even if I’m off the mark there may be something in here for you big kids too.
While I think Kasabi is a great service, I reference it here only because I’ve already put some of these datasets online there. I’m not trying to promote it, and I’ve referenced the original sources where available. For educational purposes I think it’d be good for kids to try working with the data files directly, so they learn to handle files as well as APIs, and indeed services like ScraperWiki.
Presumably the kids will be given some support and guidance on working with the data and APIs, so I’ll keep this post to pointers.
My first and best suggestion is Lego data! Bricklink hosts a community-maintained dataset about Lego parts, sets and their inventories. The data isn’t clearly licensed, but when I checked with them, they said they simply ask that you attribute your sources. Attribution is another good thing to teach kids about Open Data.
The Bricklink data files can be downloaded as XML files. You can get individual files with details of all sets, parts, etc. There are also files with inventory information.
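If you want to poke at the XML directly, here is a minimal Python sketch using the standard library’s ElementTree. The element names (CATALOG, ITEM, ITEMID, ITEMNAME) and the sample records are assumptions for illustration, so check them against a real Bricklink download before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names and records are assumptions,
# not copied from a real Bricklink catalog file.
SAMPLE_XML = """<CATALOG>
  <ITEM>
    <ITEMTYPE>S</ITEMTYPE>
    <ITEMID>1000-1</ITEMID>
    <ITEMNAME>Example Set A</ITEMNAME>
  </ITEM>
  <ITEM>
    <ITEMTYPE>S</ITEMTYPE>
    <ITEMID>1001-1</ITEMID>
    <ITEMNAME>Example Set B</ITEMNAME>
  </ITEM>
</CATALOG>"""


def parse_items(xml_text):
    """Return (item_id, name) tuples for every ITEM element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("ITEM"):
        # findtext returns the default if the child element is missing
        item_id = item.findtext("ITEMID", default="").strip()
        name = item.findtext("ITEMNAME", default="").strip()
        items.append((item_id, name))
    return items


if __name__ == "__main__":
    # For a downloaded file, pass its text in place of SAMPLE_XML
    for item_id, name in parse_items(SAMPLE_XML):
        print(item_id, name)
```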
I’ve taken a version of that data, turned it into Linked Data (RDF) and published it on Kasabi, so it’s accessible via a few APIs there.
The code to download all of the data and convert it to RDF is available on GitHub. If you just want to cache the files locally, run:
rake download
That will save some clicking in the UI.
There are instructions on how to build a local database for the code, which relies on Python and PostgreSQL. However, the data is all in the GitHub project as CSV files, so it can easily be downloaded and processed using whatever tools you like.
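As a taste of working with the CSV files, this sketch counts sets per year with Python’s csv module. The column names (set_id, name, year) and the rows are made up; the real files in the project will have their own headers.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for one of the project's CSV files;
# the column names are assumptions, not the project's real headers.
SAMPLE_CSV = """set_id,name,year
1000-1,Example Set A,1999
1001-1,Example Set B,1999
1002-1,Example Set C,2001
"""


def sets_per_year(csv_file):
    """Count how many rows (sets) there are for each year value."""
    reader = csv.DictReader(csv_file)
    return Counter(row["year"] for row in reader)


if __name__ == "__main__":
    # With a real file: open("sets.csv", newline="", encoding="utf-8")
    counts = sets_per_year(io.StringIO(SAMPLE_CSV))
    for year, count in sorted(counts.items()):
        print(year, count)
```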
Related to this is Bulbapedia, a Pokemon wiki. It runs on MediaWiki, which offers an API, so there may be scope to get data that way, or to mash up data across these two sources. I’ve not heard good things about the usability of the MediaWiki API, though.
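For the MediaWiki route, a query against the api.php endpoint might look like the sketch below. The Bulbapedia endpoint URL and the category name are assumptions; action=query with list=categorymembers is a standard MediaWiki API module, but the sample response here is made up rather than captured from the live site.

```python
import json
from urllib.parse import urlencode

# Assumption: Bulbapedia's MediaWiki API is served from this address.
API_ENDPOINT = "https://bulbapedia.bulbagarden.net/w/api.php"


def category_members_url(category, limit=50):
    """Build a MediaWiki API URL listing the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_from_response(body):
    """Extract page titles from a categorymembers JSON response."""
    data = json.loads(body)
    return [page["title"] for page in data["query"]["categorymembers"]]


# Made-up response in the shape the categorymembers module returns
SAMPLE_RESPONSE = (
    '{"query": {"categorymembers": ['
    '{"pageid": 1, "ns": 0, "title": "Example page A"}, '
    '{"pageid": 2, "ns": 0, "title": "Example page B"}]}}'
)

if __name__ == "__main__":
    # To fetch for real: urllib.request.urlopen(url).read().decode("utf-8");
    # some wikis may also expect a descriptive User-Agent header.
    print(category_members_url("Example category"))
    print(titles_from_response(SAMPLE_RESPONSE))
```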
Finally, the Ultimate Pokemon Centre also offers a Pokedex. It is oriented towards the video games and includes references to sound files of Pokemon cries.
As this post describes, there are ways to export the data from the Pokedex as text files. For example, you can download the names of all the Pokemon in Pokemon Red. To build an export, start at this page and select your options.
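Mashing up a Pokedex export with other sources means matching names that may be written slightly differently, e.g. with or without the accent in Pokémon. A minimal sketch, assuming the export is plain text with one name per line (I haven’t checked the actual format, and the sample names are made up):

```python
import unicodedata

# Made-up export: one name per line, with a blank line and a duplicate
SAMPLE_EXPORT = "Examplemon\n\nSamplemon\nexamplemon\n"


def normalise(name):
    """Case-fold a name and strip accents so 'Pokémon' matches 'Pokemon'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold().strip()


def load_names(text):
    """Parse a one-name-per-line export, skipping blanks and duplicates."""
    seen = set()
    names = []
    for line in text.splitlines():
        name = line.strip()
        key = normalise(name)
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names


if __name__ == "__main__":
    print(load_names(SAMPLE_EXPORT))
```

Keeping the first spelling seen while comparing on the normalised key means the output still looks like the source data, but near-duplicates collapse together.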
Video game data tends to be locked away in review sites, but there are a few places where you can get good data for non-commercial use.
- The Games DB — a database of nearly 10,000 video games which offers an API
- GiantBomb — offers an API for Games and Comics
I don’t have any personal experience of these APIs, so I can’t offer any guidance. It would be good to teach the kids about the limitations of particular data and API licensing terms here, e.g. how might they be limited by upstream providers?
The BBC Wildlife site is a Linked Data site, so you can grab RDF data from any of its URLs. The data includes descriptions of the animals, references to images, and links to BBC content and clips. For example, here’s a page about Tigers and here’s that same page as RDF.
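A rough way to pull labels out of RDF/XML with only the standard library is sketched below. The sample document and its example.org URI are hypothetical, not taken from the BBC site, and because RDF/XML can express the same triples in several different shapes, a proper RDF library such as rdflib is the more robust choice for real pages.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical RDF/XML document, not real BBC data
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/species/tiger">
    <rdfs:label>Tiger</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def labels(rdf_xml):
    """Map each rdf:Description's rdf:about URI to its rdfs:label text."""
    root = ET.fromstring(rdf_xml)
    results = {}
    # ElementTree uses {namespace}localname notation for tags and attributes
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        label = desc.findtext(f"{{{RDFS}}}label")
        if about and label:
            results[about] = label.strip()
    return results


if __name__ == "__main__":
    print(labels(SAMPLE_RDF))
```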
I crawled the site last year to gather up the data. That’s available on Kasabi but is clearly out of date now. (No dinosaurs!)
Hopefully that’s enough to get some enthusiastic kids started on Open Data hacking. One last suggestion, though, for the Dr Who fans.
The Guardian Datablog is a trove of interesting data snippets, two of which are about Dr Who.
The first is a spreadsheet that includes every time-travel event made by the Doctors. The data includes the episode, the Doctor, when and where the travel occurred, and so on. It’s available as a Google Spreadsheet, which can also be downloaded.
There’s also a second spreadsheet containing the name of every Dr Who villain since the 1960s. Again it includes the Doctor and the episode, so the two datasets are ripe for mashing up.
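The mashup could start as a simple join on Doctor and episode. This sketch assumes both spreadsheets have been downloaded as CSV with doctor and episode columns; those column names and all the rows are placeholders, not the Guardian’s actual headers or data.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows; the real spreadsheets will have different headers
TRAVEL_CSV = """doctor,episode,destination
Doctor A,Episode 1,Place P
Doctor A,Episode 2,Place Q
Doctor B,Episode 3,Place R
"""

VILLAINS_CSV = """doctor,episode,villain
Doctor A,Episode 1,Villain X
Doctor B,Episode 3,Villain Y
Doctor B,Episode 3,Villain Z
"""


def join_on_episode(travel_file, villain_file):
    """Attach the villains from the same (doctor, episode) to each travel row."""
    villains = defaultdict(list)
    for row in csv.DictReader(villain_file):
        villains[(row["doctor"], row["episode"])].append(row["villain"])
    joined = []
    for row in csv.DictReader(travel_file):
        key = (row["doctor"], row["episode"])
        # .get does not insert missing keys, unlike indexing a defaultdict
        joined.append({**row, "villains": villains.get(key, [])})
    return joined


if __name__ == "__main__":
    for row in join_on_episode(io.StringIO(TRAVEL_CSV), io.StringIO(VILLAINS_CSV)):
        print(row["episode"], row["destination"], row["villains"])
```

It behaves like a left join: travel events with no matching villain keep an empty list rather than being dropped, which makes gaps between the two spreadsheets easy to spot.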
There are numerous Dr Who fan sites and wikis so there is likely to be some scope for linking out to various websites for images, reviews, etc.