Reflecting on 2020

It’s been a year, eh?

I didn’t have a lot of plans for 2020. And those that I did have were pretty simple. That was deliberate as I tend to beat myself up for not achieving everything. But it turned out to be for the best anyway.

I’m not really expecting 2021 to be any different to be honest. But I’ll write another post about plans for the next 12 months.

For now, I just want to write a diary entry with a few reflections and notes on the year. Largely because I want to capture some of the positives and lessons learned.

Working

Coming into 2020 I’d decided it was probably time for a change. I’ve loved working at the Open Data Institute, but the effort and expense of long-distance commuting was starting to get a bit much.

Three trips a week, with increasingly longer days, on top of mounting work pressures were affecting my mood and my health. I was on a health kick in Q1 which helped, but The Lockdown really demonstrated for me how much that commute was wiping me out. And reminded me of what work-life balance looked like.

I was also ready to do something different. I feel like my career has had regular cycles where I’ve been doing research and consultancy, interleaved with periods of building and delivering things. I decided it was time to get back to the latter.

So, after helping the team weather the shit-storm of COVID-19, I resigned. Next year is going to look a lot different one way or another!

I’ve already written some thoughts on things I’ve enjoyed working on.

Running

I set out to lose some weight and be healthier at the start of the year. And I succeeded in doing that. I’ve been feeling so much better because of that.

I also took up running. I did a kind of self-directed Couch to 5K. I read up on the system and gradually increased distance and periods of running over time as recommended. I’ve previously tried using an app but without much success. I also prefer running without headphones on.

The hardest part has been learning to breathe properly. I suffer from allergic asthma. It used to be really bad when I was a kid. Like, not being able to properly walk up stairs bad. And not being allowed out during school play times bad.

It’s gotten way better and rarely kicks in now unless the pollen is particularly bad. But I still get this rising panic when I’m badly out of breath. I’ve mostly dealt with it now and found that running early in the mornings avoids issues with pollen.

While it’s never easy, it turns out running can actually be enjoyable. As someone who closely identifies with the sloth, this is a revelation.

It’s also helped to work out nervous tension and stress during the year. So it’s great to have found a new way to handle that.

Listening

My other loose goal for 2020 was to listen to more music. I’d fallen into the habit of only listening to music while commuting, working or cooking. While that was plenty of opportunity, I felt like I was in a rut, listening to the same mixes and playlists because they helped me tune out others and concentrate on writing.

I did several things to achieve that goal. I started regularly listening to my Spotify Discover and Release Radar playlists. And dug into the back catalogues from the artists I found there.

I listened to more radio to break out of my recommendation bubble and used the #music channel on the ODI slack to do the same. I also started following some labels on YouTube and via weekly playlists on Spotify.

While I’ve griped about the BBC Sounds app, and while it’s still flaky, I have to admit it’s really changed how I engage with the BBC’s radio output. The links from track lists to Spotify are a killer feature for me.

Building in listening to the BBC Unclassified show with Elizabeth Alker, on my Saturday mornings, has been one of the best decisions I’ve made this year.

Another great decision was to keep a dedicated playlist of “tracks that I loved on first listen, which were released in 2020”. It’s helped me be intentional about recording music that I like, so I can dig deeper. Here’s a link to the playlist which has 247 tracks on it.

According to my year in review, Spotify tells me I listened to 630 new artists this year, across 219 new genres. We all know Spotify genres are kind of bullshit, but I’m pleased with that artist count.

Cooking

I generally cook on a Saturday night. I try out a new recipe. We drink cocktails and listen to the Craig Charles Funk and Soul Show.

I’ve been tweeting what I’ve been cooking this year to keep a record of what I made. And I bookmark recipes here.

I was most proud of the burger buns, bao buns and gyoza.

We also started a routine of Wednesday Stir Fries, where I cooked whilst Debs was taking Martha to her ice-skating lesson. Like all routines this year, that fell away in April.

But, I’ve added Doubanjiang (fermented broad bean chilli paste) to my list of favourite ingredients. It makes a really tasty base for a quick stir fry with a bit of garlic, ginger and whatever veg is to hand.

Gardening

I’ve already published a blog post with my Gardening Retro for 2020.

Reading

Like last year I wanted to read more again this year. As always I’ve been tweeting what I’ve read. I do this for the same reason I tweet things that I cook: it helps me track what I’ve been doing. But it also sometimes prompts interesting chats and other recommendations. Which is why I use social media after all.

I’ve fallen into a good pattern of having one fiction book, one non-fiction book and one graphic novel on the go at any one time. This gives me a choice of things to dip into based on how much time, energy and focus I have. That’s been useful this year.

I’ve read fewer papers and articles (I track those here). This is in large part because my habit was to do this during my commute. But again, that routine has fallen away.

If I’m honest it’s also because I’ve not really felt like it this year. I’ve read what I needed to, but have otherwise retreated into comfort reading.

The other thing I’ve been doing this year is actively muting words, phrases and hashtags on twitter. It helps me manage what I’m seeing and reading, even if I can’t kick the scrolling habit. I feel vaguely guilty about that. But how else to manage the fire hose of other people’s thoughts, attentions and fears?

Here are some picks. These weren’t all published this year. It’s just when I consumed them:

Comics

I also read the entire run of Locke and Key, finished up the Alan Moore Swamp Thing collected editions and started in on Monstress. All great.

I also read a lot of Black Panther singles this year. Around 100-150 I think. Which led to my second most popular tweet this year (40,481 impressions).

Non-fiction

Fiction

I enjoyed but was disappointed by William Gibson’s Agency. Felt like half a novel.

Writing

I started a monthly blogging thread this year. I did that for two reasons. The first was to track what I was writing. I wanted to write more this year and to write differently.

The second was as another low key way to promote posts so that they might find readers. I mostly write for myself, but it’s good to know that things get read. Again, prompting discussions is why I do this in the open rather than in a diary.

In the end I’ve written more this year than last. Which is good. Not writing at all some months was also fine.

I managed to write a bit of fiction and a few silly posts among the thousand word opinion pieces on obscure data topics. My plan to write more summaries of research papers failed, because I wasn’t reading that many.

My post looking at the statistic about data scientists spending 80% of their time cleaning data was the most read of what I wrote this year (4,379 views). But my most read post of all time remains this one on derived data (25,499 views). I should do a better version.

The posts I’m most pleased with are the one about dataset recipes and the two pieces of speculative fiction.

I carry around stuff in my head, sometimes for weeks or months. Writing it down helps me not just organise those thoughts but also move on to other things. This too is a useful coping mechanism.

Coding

Didn’t really do any this year. All things considered, I’m fine with that. But this will change next year.

Gaming

This year has been about those games that I can quickly pick up and put down again.

I played, loved, but didn’t finish Death Stranding. I need to immerse myself in it and haven’t been in the mood. I dipped back into The Long Dark, which is fantastically well designed, but the survival elements were making me anxious. So I watch other people play it instead.

Things that have worked better: Darkest Dungeon. XCOM: Chimera Squad. Wilmot’s Warehouse. Townscaper. Ancient Enemy. I’ve also been replaying XCOM 2.

These all have relatively short game loops and mission structures that have made them easy to dip into when I’ve been in the mood. Chimera Squad is my game of the year, but Darkest Dungeon is now one of my favourite games ever.

There Is No Game made me laugh. And Townscaper prompted some creativity which I wrote about previously.

That whole exercise led to my most popular tweet this year (54,025 impressions). People like being creative. Nice to have been responsible for a tiny spark of fun this year.

This is the first year in ages when I’ve not ended up with a new big title that I’m excited to dip into. Tried and failed to get a PS5. Nothing else is really grabbing my interest. I only want a PS5 so I can play the Demon’s Souls remake.

Watching

For the most part I’ve watched all of the things everyone else seems to have watched.

Absolutely loved the Queen’s Gambit. Enjoyed Soul, the Umbrella Academy and The Boys. Thought Kingdom was brilliant (I was late to that one) and #Alive was fun. Korea clearly knows how to make zombie movies and so I’m looking forward to Peninsula.

The Mandalorian was so great it’s really astounding that no-one thought to make any kind of film or TV follow-up to the original Star Wars trilogy until now. Glad they finally did and managed to mostly avoid making it about the same characters.

But Star Trek: Discovery unfortunately seems to have lost its way. I love the diverse characters and the new setting has so much potential. The plot is just chaotic though. His Dark Materials seems to be a weekly episode of exposition. Yawn.

If I’m being honest though, then my top picks for 2020 are the things I’ve been able to relax into for hours at a time:

  • The Finnish guy streaming strategy and survival games like The Long Dark and XCOM 2
  • The Dutch guy playing classic and community designed Doom 2 levels
  • And the guy doing traditional Japanese woodblock carvings

I’m only slightly exaggerating to say these were the only things I watched in that difficult March-May period.

Everything else

I could write loads more about 2020 and what it was like. But I won’t. I’ve felt all of the things. Had all of the fears, experienced all of the anger, disbelief and loss.

The lesson is to keep moving forward. And to turn to books, music, games, walking, running, cooking to help keep us sane.

A short list of some of the things I’ve worked on which I’ve particularly enjoyed

Part of planning for whatever comes next for me in my career involved reflecting on the things I’ve enjoyed doing. I’m pleased to say that there’s quite a lot.

I thought I’d write some of them down to help me gather my thoughts around what I’d like to do more of in the future. And, well, it never hurts to share your experience when you’re looking for work. Right?

The list below focuses on projects and activities which I’ve contributed to or had a hand in leading.

There’s obviously more to a career and work than that. For example, I’ve enjoyed building a team and supporting them in their work and development. I’ve enjoyed pitching for and winning work and funding.

I’ve also very much enjoyed working with a talented group of people who have brought a whole range of different skills and experiences to projects we’ve collaborated on together. But this post isn’t about those things.

Some of the things I’ve enjoyed working on at the ODI

  • Writing this paper on the value of open identifiers, which was co-authored with a team at Thomson Reuters. It was a great opportunity to distil a number of insights around the benefits of open, linked data. I think the recommendations stand up well. It’s a topic I keep coming back to.
  • Developing the open data maturity model and supporting tool. The model was used by Defra to assess all its arm’s-length bodies during their big push to release open data. It was adopted by a number of government agencies in Australia, and helped to structure a number of projects that the ODI delivered to some big private sector organisations. Today we’d scope the model around data in general, not just open data. And it needs a stronger emphasis on diversity, inclusion, equity and ethics. But I think the framework is still sound.
  • Working with the Met Office on a paper looking at the state of weather data infrastructure. This turned into a whole series of papers looking at different sectors. I particularly enjoyed this first one as it was a nice opportunity to look at data infrastructure through a number of different lenses in an area that was relatively new to me. The insight that an economic downturn in Russia led to issues with US agriculture because of data gaps in weather forecasting might be my favourite example of how everything is intertwingled. I later used what I learned in that paper to write this primer on data infrastructure.
  • Leading research and development of the open standards for data guidebook. Standards being another of my favourite topics, it was great to have space to explore this area in more detail. And I got to work with Edafe which was ace.
  • Leading development of the OpenActive standards. Standards development is tiring work. But I’m pleased with the overall direction that we took and what we’ve achieved. I learned a lot. And I had the chance to iterate on what we were doing based on what we learned from developing the standards guidebook, before handing it over to others to lead. I’m pleased that we were able to align the standards with Schema.org and SKOS. I’m less pleased that it resulted in lots of video of me on YouTube leading discussions in the open.
  • Developing a methodology for doing data ecosystem mapping. The ODI now has a whole tool and methodology for mapping data ecosystems. It’s used in a lot of projects. While I wouldn’t claim to have invented the idea of doing this type of exercise, the ODI’s approach directly builds on the session I ran at Open Data Camp #4. I plan to continue to work on this as there’s much more to explore.
  • Leading development of the collaborative maintenance guidebook. Patterns provide a great way to synthesise and share insight. So it was fantastic to be able to apply that approach to capturing some of the lessons learned in projects like OpenStreetMap, Wikidata and other projects. There’s a lot that can be applied in this guidebook to help shape many different data projects and platforms. The future of data management is more, not less collaborative.
  • Researching the sustainable institutions report. One of the reasons I (re-)joined the ODI about 4 years ago was to work on data institutions. Although we weren’t using that label at that point. I wanted to help to set up organisations like CrossRef, OpenStreetMap and others that are managing data for a community. So it was great to be involved in this background research. I still want to do that type of work, but want to be working in that type of organisation, rather than advising them.

There’s a whole bunch of other things I did during my time at the ODI.

For example, I’ve designed and delivered a training course on API design, evaluated a number of open data platforms, written code for a bunch of openly available tools, provided advice to a bunch of different organisations around the world, and written guidance that still regularly gets used and referenced by people. I get a warm glow from having done all those things.

Things I’ve enjoyed working on elsewhere

I’ve also done a bunch of stuff outside the ODI that I’ve also thoroughly enjoyed. For example:

  • I’ve helped to launch two new data-enabled products. Some years ago, I worked with the founders of Growkudos to design and build the first version of their platform, then helped them hire a technical team to take it forward. I also helped to launch EnergySparks, which is now used by schools around the country. I’m now a trustee of the charity.
  • I’ve worked with the ONS Digital team. After working on this prototype for Matt Jukes and co at the ODI, it was great to spend a few months freelancing with Andy Dudfield and the team working on their data principles and standards to put stats on the web. Publishing statistics is good, solid data infrastructure work.
  • Through Bath: Hacked, I’ve led a community mapping activity to map wheelchair accessibility in the centre of Bath. It was superb to have people from the local community, from all walks of life, contributing to the project. Not ashamed to admit that I had a little cry when I learned that one of the mappers hadn’t been into the centre of Bath for years, because they’d felt excluded by their disability. But was motivated to be part of the project. That single outcome made it all worthwhile for me.

What do I want to do more of in the future? I’ve spent quite a bit of the last few years doing research and advising people about how they might go about their projects. But it’s time to get back into doing more hands-on practical work to deliver some data projects or initiatives. More doing, less advising.

So, I’m currently looking for work. If you’re looking for a “Leigh shaped” person in your organisation, where “Leigh shaped” means “able to do the above kinds of things”, then do get in touch.

The Saybox

I’ve been in a reflective mood over the past few weeks as I wrap up my time at the Open Data Institute. One of the little rituals I will miss is the “Saybox”. I thought I’d briefly write it up and explain why I like it.

I can’t remember who originally introduced the idea. It’s been around long enough that I think I was still only working part-time as an associate, so wasn’t always at every team event. But I have a suspicion it was Briony. Maybe someone can correct me on that? (Update: it was Briony 🙂 )

It’s also possible that the idea is well-known and documented elsewhere, but I couldn’t find a good reference. So again, if someone has a pointer, then let me know and I’ll update this post.

Originally, the Saybox was just a decorated shoebox. It had strong school craft project vibes. I’m sad that I can’t find a picture of it.

The idea is that anyone in the team can drop an anonymous post-it into the box with a bit of appreciation for another member of the team, questions for the leadership team, a joke or a “did you know”. At our regular team meetings we open the box, pass it around and we all read out a few of the post-its.

I’ll admit that it took me a while to warm to the idea. But it didn’t take me long to be won over.

The Saybox has become part of the team culture. A regular source of recognition for individual team members, warm welcomes for new hires and, at times, a safe way to surface difficult questions. The team have used it to troll each other whilst on holiday and it became a source of running gags. For a time, no Saybox session was complete without a reminder that Simon Bullmore ran a marathon.

As I took on leadership positions in the team, I came to appreciate it for other reasons. It was more than just a means of providing and encouraging feedback across the team. It became a source of prompts for where more clarity on plans or strategy were needed. And, in a very busy setting, it also helped to reinforce how delivery really is a team sport.

There’s nothing like hearing an outpouring of appreciation for an individual or small team to be constantly reminded of the important role they play.

Like any aspect of team culture, the Saybox has evolved over time.

There’s a bit less trolling and fewer running gags now. But the appreciation is still strong.

The shoebox was also eventually replaced by a tidy wooden box. This was never quite the same for me. The shoebox had more of a scruffy, team-owned vibe about it.

As we’ve moved to remote working we’ve adapted the idea. We now use post-it notes on a Jamboard, and take turns reading them over the team zooms. Dr Dave likes to tick them off as we go, helping to orchestrate the reading.

The move to online unfortunately means there isn’t the same constant reminder to provide feedback, in the way that a physical box presents. You don’t just walk past a Jamboard on your way to or from a meeting. This means that the Saybox jamboard is now typically “filled” just before or during the team meetings, which can change the nature of the feedback it contains.

It’s obviously difficult to adapt team practices to virtual settings. But I’m glad the ODI has kept it going.

I’ll end this post with a brief confession. It might help reinforce why rituals like this are so important.

In a Saybox session, when we used to do them in person with actual paper, we handed the note over to whoever it was about. So sometimes you could leave a team meeting with one or more notes of appreciation from the team. That’s a lovely feeling.

I got into the habit of dropping them into my bag or sticking them into my notebook. As I tidied up my bag or had a clearout of old paperwork, I started collecting the notes into an envelope.

The other day I found that envelope in a drawer. As someone who is wired to always look for the negatives in any feedback, having these hand-written notes is lovely.

There’s nothing like reading unprompted bits of positive feedback, collected over about 5 years or so, to help you reflect on your strengths.

Thanks everyone.

A poem about standards

To help me wrap up my time at the ODI I asked the team for suggestions for things I could add to my list of handover documentation.

Amongst the suggestions that came back was: “Maybe also a poem about why standards are the best thing on Earth?”

So, with a nod to the meme, and with apologies to William Carlos Williams, I wrote this:

I have tidied
the data
in your
spreadsheet

those numbers
you were
planning
to share

Forgive me
they were so messy
but now standard
and FAIR

Close enough I think 🙂

Consulting Spreadsheet Detective, Season 1

I was very pleased to announce my new TV series today, loosely based on real events. More details here in the official press release.

FOR IMMEDIATE RELEASE

Coming to all major streaming services in 2021 will be the exciting new series: “Turning the Tables”.

Exploring the murky corporate world of poorly formatted spreadsheets and nefarious macros each episode of this new series will explore another unique mystery.

When the cells lie empty, who can help the CSV:PI team pivot their investigation?

When things don’t add up, who can you turn to but an experienced solver?

Who else but Leigh Dodds, Consulting Spreadsheet Detective?

This smart, exciting and funny new show throws deductive reasoner Dodds into the mix with Detectives Rose Cortana and Colm Bing, part of the crack new CSV:PI squad.

Rose: the gifted hacker. Quick to fire up an IDE, but slow to validate new friends.

Colm: the user researcher. Strong on empathy but with an enigmatic past that hints at time in the cells.

What can we expect from Season 1?

Episode 1: #VALUE!

In his first case, Dodds has to demonstrate his worth to a skeptical Rose and Colm, by fixing a corrupt formula in a startup valuation.

Episode 2: #NAME?

A personal data breach leaves the team in a race against time to protect the innocent. A mysterious informant known as VLOOKUP leaves Dodds a note.

Episode 3: #REF!

A light-hearted episode where Dodds is called in to resolve a mishap with a 5-a-side football team matchmaking spreadsheet. Does he stay between the lines?

Episode 4: #NUM!

A misparsed gene name leads a researcher into recommending the wrong vaccine. It’s up to Dodds to fix the formula.

Episode 5: #NULL!

Sometimes it’s not the spreadsheet that’s broken. Rose and Colm have to educate a researcher on the issue of data bias, while Dodds follows up references to the mysterious Macro corporation.

Episode 6: #DIV/0!

Chasing down an internationalisation issue Dodds, Rose and Colm race around the globe following a trail of error messages. As Dodds gets unexpectedly separated from the CSV:PI team, Rose and Colm unmask the hidden cell containing the mysterious VLOOKUP.

In addition to the six episodes in season one, a special feature length episode will air on National Spreadsheet Day 2021:

Feature Episode: #####

Colm’s past resurfaces. Can he grow enough to let the team see the problem, and help him validate his role in the team?

Having previously only anchored documentaries, like “Around the World with 80,000 Apps” and “Great Data Journeys”, taking on the eponymous role will be Dodds’ first foray into fiction. We’re sure he’ll have enough pizazz to wow even the harshest critics.

“Turning the Tables” will feature music composed by Dan Barrett.

Tip for improving standards documentation

I love a good standard. I’ve written about them a lot here.

As it’s #WorldStandardsDay I thought I’d write a quick post to share something that I’ve learned from leading and supporting some standards work.

I’ve already shared this with a number of people who have asked for advice on standards work, and in some recent user research interviews I’ve participated in. So it makes sense to write it down.

In the ODIHQ standards guide, we explained that at the end of your initial activity to develop a standard, you should plan to produce a range of outputs. This includes a variety of tools and guidance that help people use the standard. You will need much more than just a technical specification.

To plan for the different types of documentation that you may need I recommend applying this “Grand Unified Theory of Documentation”.

That framework highlights how four different types of documentation are intended to be used by different audiences to address different needs. The content designers and writers out there reading this will be rolling their eyes at this obvious insight.

Here’s how I’ve been trying to apply it to standards documentation:

Reference

This is your primary technical specification. It’ll have all the detail about the standard, the background concepts, the conformance criteria, etc.

It’s the document of record that captures all of the hard work you’ve invested in building consensus around the standard. It fills a valuable role as the document you can point back to when you need to clarify or confirm what was agreed.

But, unless it’s a very simple standard, it’s going to have a limited audience. A developer looking to implement a conformant tool, API or library may need to read and digest all of the detail. But most people want something else.

Put the effort into ensuring it’s clear, precise and well-structured. But plan to also produce three additional categories of documentation.

Explainers

Many people just want an overview of what it is designed to do. What value will it provide? What use cases was it designed to support? Why was it developed? Who is developing it?

These are higher-level introductory questions. The type of questions that business stakeholders want to answer to sign-off on implementing a standard, so it goes onto a product roadmap.

Explainers also provide useful background for a developer ahead of taking a deeper dive. If there are some key concepts that are important to understanding the design and implementation of a standard, then write an explainer.

Tutorials

A simple, end-to-end description of how to apply the standard. E.g. how to publish a dataset that conforms to the standard, or export data from an existing system.

A tutorial will walk you through using a specific set of tools, frameworks or programming languages. The end result being a basic implementation of the standard. Or a simple dataset that passes some basic validation checks. A tutorial won’t cover all of the detail, it’s enough to get you started.

You may need several tutorials to support different types of users. Or different languages and frameworks.

If you’ve produced a tool, like a validator or a template spreadsheet to support data publication, you’ll probably need a tutorial for each of them unless they are very simple to use.

Tutorials are gold for a developer who has been told: “please implement this standard, but you only have 2 days to do it”.

How-Tos

Short, task oriented documentation focused on helping someone apply the standard. E.g. “How to produce a CSV file from Excel”, “Importing GeoJSON data in QGIS”, “Describing a bus stop”. Make them short and digestible.

How-Tos can help developers build from a tutorial, to a more complete implementation. Or help a non-technical user quickly apply a standard or benefit from it.

You’ll probably end up with lots of these over time. Drive creating them from the types of questions or support requests you’re getting. Been asked how to do something three times? Write a How-To.

There’s lots more that can be said about standards documentation. For example you could add Case Studies to this list. And it’s important to think about whether written documentation is the right format. Maybe your Explainers and How-Tos can be videos?

But I’ve found the framework to be a useful planning tool. Have a look at the documentation for more tips.

Producing extra documentation to support the launch of a standard, and then investing in improving and expanding it over time will always be time well-spent.

Lunchtime Lecture: “How you (yes, you) can contribute to open data”

The following is a written version of the lunchtime lecture I gave today at the Open Data Institute. I’ll put in a link to the video when it comes online. It’s not a transcript, I’m just writing down what I had planned to say.

Hello!

I’m going to talk today about some of the projects that first got me excited about data on the web and open data specifically. I’m hopefully going to get you excited about them too. And show some ways in which you can individually get involved in creating some open data.

Open data is not (just) open government data

I’ve been reflecting recently about the shape of the open data community and ecosystem, to try and understand common issues and areas for useful work.

For example, we spend a lot of time focusing on Open Government Data. And so we talk about how open data can drive economic growth, create transparency, and be used to help tackle social issues.

But open data isn’t just government data. It’s a broader church that includes many different communities and organisations who are publishing and using open data for different purposes.

Open data is not (just) organisational data

More recently, as a community, we’ve focused some of our activism on encouraging commercial organisations to not just use open data (which many have been doing for years), but also to publish open data.

And so we talk about how open data can be supported by different business models and the need for organisational change to create more open cultures. And we collect evidence of impact to encourage more organisations to also become more open.

But open data isn’t just about data from organisations. Open data can be created and published by individuals and communities for their own needs and purposes.

Open data can (also) be a creative activity

Open data can also be a creative activity. A means for communities to collaborate around sharing what they know about a topic that is important or meaningful to them. Simply because they want to do it. I think sometimes we overlook these projects in the drive to encourage governments and other organisations to publish open data.

So I’m going to talk through eight (you said six in the talk, idiot! – Ed) different example projects. Some you will have definitely heard about before, but I suspect there will be a few that you haven’t. In most cases the primary goals of these projects are to create an openly licensed dataset. So when you contribute to the project, you’re directly helping to create more open data.

Of course, there are other ways in which we each contribute to open data. But these are often indirect contributions. For example where our personal data that is held in various services is aggregated, anonymised and openly published. But today I want to focus on more direct contributions.

For each of the examples I’ve collected a few figures that indicate the date the project started, the number of contributors, and an indication of the size of the dataset. Hopefully this will help paint a picture of the level of effort that is already going into maintaining these resources. (Psst, see the slides for the figures – Ed)

Wikipedia

The first example is Wikipedia. Everyone knows that anyone can edit Wikipedia. But you might not be aware that Wikipedia can be turned into structured data and used in applications. There are lots of projects that do this, e.g. DBpedia, which brings Wikipedia into the web of data.

The bits that are turned into structured data are the “infoboxes” that give you the facts and figures about the person (for example) that you’re reading about. So if you add to Wikipedia, and specifically add to the infoboxes, then you’re adding to an openly licensed dataset.
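As a rough illustration of how those infobox facts surface as data: DBpedia republishes each Wikipedia page’s facts as JSON at a predictable URL. Here’s a minimal Python sketch. The sample record is invented for illustration (real DBpedia responses are far larger, and the property URIs and values shown are assumptions, not real data):

```python
from urllib.parse import quote

def dbpedia_data_url(wikipedia_title):
    """URL of the machine-readable DBpedia record for a Wikipedia page title."""
    return "https://dbpedia.org/data/" + quote(wikipedia_title.replace(" ", "_")) + ".json"

# A tiny, invented stand-in for the kind of record DBpedia returns.
# Keys are RDF property URIs derived from infobox fields.
sample = {
    "http://dbpedia.org/resource/Gordon_Boshell": {
        "http://dbpedia.org/ontology/birthYear": [
            {"type": "literal", "value": "1908"}
        ]
    }
}

def first_literal(record, resource, prop):
    """Pull the first literal value for a property, or None if absent."""
    values = record.get(resource, {}).get(prop, [])
    return values[0]["value"] if values else None

print(dbpedia_data_url("Gordon Boshell"))
print(first_literal(sample,
                    "http://dbpedia.org/resource/Gordon_Boshell",
                    "http://dbpedia.org/ontology/birthYear"))
```

So every infobox field you fill in on Wikipedia can end up as a queryable fact like this.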

The most obvious example of where this data is used is in Google search results. The infoboxes you see on search results whenever you google for a person, place or thing are partly powered by Wikipedia data.

A few years ago I added a Wikipedia page for Gordon Boshell, the author of some children’s books I loved as a kid. There wasn’t a great deal of information about him on the web, so I pulled whatever I could find together and created a page for him. Now when anyone searches for Gordon Boshell they can see some information about him right on Google. And the results link out to the books that he wrote. It’s nice to think that I’ve helped raise his profile.

There’s also a related project from the Wikimedia Foundation called Wikidata. Again, anyone can edit it, but it’s a database of facts and figures rather than an encyclopedia.

OpenStreetMap

The second example is OpenStreetMap. You’ll definitely have already heard about its goal to create a crowd-sourced map of the world. OpenStreetMap is fascinating because it’s grown this incredible ecosystem of tools and projects that make it easier to contribute to the database.

I’ve recently been getting involved with contributing to OpenStreetMap. My initial impression was that I was probably going to have to get a commercial GPS and go out and do complicated surveying. But it’s not like that at all. It’s really easy to add points to the map, and to use their tools to trace buildings from satellite imagery. They provide great tutorials to help you get started.

It’s surprisingly therapeutic. I’ve spent a few evenings drinking a couple of beers and tracing buildings. It’s a bit like an adult colouring book, except you’re creating a better map of the world. Neat!

There are a variety of other tools that let you contribute to OpenStreetMap. For example Wheelmap allows you to add wheelchair accessibility ratings to locations on the map. We’ve been using this in the AccessibleBath project to help crowd-source data about wheelchair accessibility in Bath. One afternoon we got a group of around 25 volunteers together for a couple of hours and mapped 30% of the city centre.

There’s a lot of humanitarian mapping that happens using OpenStreetMap. If there’s been a disaster or a disease outbreak then aid workers often need better maps to help reach the local population and target their efforts. Missing Maps lets you take part in that. They have a really nice workflow that lets you contribute towards improving the map by tracing satellite imagery.

There’s a related project called MapSwipe. It’s a mobile application that presents you with a grid of satellite images. All you have to do is tap the tiles which contain a building and then swipe left. Behind the scenes this data is used to direct Missing Maps volunteers towards the areas where more detailed mapping would be most useful. This focuses contributors’ attention where it’s most needed and so is really respectful of people’s time.

MapSwipe can also be used offline. So you can download a work package to do when you’re on your daily commute. Easy!

Zooniverse

You’ve probably also heard of Zooniverse, which is my third example. It’s a platform for citizen science projects. That just means using crowd-sourcing to create scientific datasets.

Their most famous project is probably GalaxyZoo which asked people to help classify objects in astronomical imagery. But there are many other projects. If you’re interested in biology then perhaps you’d like to help catalogue specimens held in the archives of the Natural History Museum?

Or there’s Old Weather, which I might get involved with. In that project you can help to build a picture of our historical climate by transcribing the weather reports that whaling ship captains wrote in their logs. By collecting that information we can build a dataset that tells us more about our climate.

I think it’s a really innovative way to use historical documents.

MusicBrainz

This is my fourth and favourite example. MusicBrainz is a database of music metadata: information about artists, albums, and tracks. It was created in direct response to commercial music databases that were asking people to contribute to their dataset, but were then taking all of the profits and not returning any value to the community. MusicBrainz created a free, open alternative.

I think MusicBrainz was the first open dataset I got involved with. I wrote a client library to help developers use the data. (14 years ago, and you’re still talking about it – Ed)

MusicBrainz has also grown a commercial ecosystem around it, which has helped it be sustainable. There are a number of projects that use the dataset, including Spotify. And it’s been powering the BBC Music website for about ten years.
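The data is still easy to get at through the MusicBrainz web service. Here’s a hedged Python sketch of building a search request and picking the best match. The sample response below is heavily trimmed and partly invented; check the MusicBrainz API documentation for the real response shape before relying on it:

```python
from urllib.parse import urlencode

MB_ROOT = "https://musicbrainz.org/ws/2"

def artist_search_url(name):
    """Build a MusicBrainz artist search URL asking for JSON output."""
    return MB_ROOT + "/artist?" + urlencode({"query": name, "fmt": "json"})

# A trimmed, partly invented sketch of a search response;
# real responses carry many more fields per artist.
sample_response = {
    "artists": [
        {"id": "a74b1b7f-71a5-4011-9441-d0b5e4122711", "name": "Radiohead", "score": 100},
        {"id": "00000000-0000-0000-0000-000000000000", "name": "Radiohead Tribute", "score": 60},
    ]
}

def best_match(response):
    """Return the highest-scoring artist from a search response, if any."""
    artists = response.get("artists", [])
    return max(artists, key=lambda a: a.get("score", 0)) if artists else None

print(artist_search_url("Radiohead"))
print(best_match(sample_response)["name"])
```

This is roughly the kind of plumbing a client library wraps up for you.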

Discogs

My fifth example, Discogs, is also a music dataset, but one about vinyl releases. So it focuses more on the releases, labels, etc. Discogs is a little different because it started as, and still is, a commercial service. At its core it’s a marketplace for record collectors. But to power that marketplace you need a dataset of vinyl releases. So they created tools to help the community build it. And, over time, it’s become progressively more open.

Today all of the data is in the public domain.

OpenPlaques

My sixth example is OpenPlaques. It’s a database of the commemorative plaques that you can see dotted around on buildings and streets. The plaques mark that an important event happened in that building, or that someone famous was born or lived there. Volunteers take photos of the plaques and share them with the service, along with the text and names of anyone who might be mentioned in the plaque.

It provides a really interesting way to explore the historical information in the context of cities and buildings. All of the information is linked to Wikipedia so you can find out more information.

Rebrickable

My seventh example is Rebrickable, which you’re unlikely to have heard about. I’m cheating a little here as it’s a service and not strictly a dataset. But it’s Lego, so I had to include it.

Rebrickable has a big database of all the official Lego sets and what parts they contain. If you’re a fan of Lego (they’re called AFOLs – Ed) who designs and creates your own custom Lego models (they’re known as MOCs – Ed), then you can upload the designs and instructions to the service in machine-readable LEGO CAD formats.

Rebrickable exposes all of the information via an API under a liberal licence. So people can build useful tools. For example using the service you can find out which other official and custom sets you can build with bricks you already own.
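The “which sets can I build?” idea reduces to a simple inventory comparison once you have the part lists. A minimal Python sketch, with some assumptions flagged: the `/lego/sets/{set}/parts/` endpoint path is my reading of the v3 API (check the docs before relying on it), and the part numbers and inventories below are made up for illustration:

```python
from urllib.parse import quote

API_ROOT = "https://rebrickable.com/api/v3"

def set_parts_url(set_num, api_key):
    """URL listing the parts in an official set.
    The endpoint path here is an assumption based on the v3 API docs."""
    return f"{API_ROOT}/lego/sets/{quote(set_num)}/parts/?key={api_key}"

def buildable(owned, needed):
    """True if the parts we own cover every (part, quantity) a set needs."""
    return all(owned.get(part, 0) >= qty for part, qty in needed.items())

# Hypothetical inventories: part numbers and counts are invented.
my_bricks = {"3001": 10, "3003": 4}
small_set = {"3001": 6, "3003": 2}
big_set = {"3001": 12}

print(set_parts_url("8110-1", "MY_KEY"))
print(buildable(my_bricks, small_set), buildable(my_bricks, big_set))
```

Fetch the real part lists from the API, and this one-liner check is all the “what can I build?” tools fundamentally do.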

Grand Comics Database

My eighth and final example is the Grand Comics Database. It’s also the oldest project as it was started in 1994. The original creators started with desktop tools before bringing it to the web.

It’s a big database of 1.3m comics. It contains everything from The Dandy and The Beano through to Marvel and DC releases. It’s not just data on the comics, but also story arcs, artists, authors, etc. If you love comics you’ll love GCD. I checked, and this week’s 2000AD (published 2 days ago – Ed) is in there already.

So those are my examples of places where you could contribute to open data.

Open data is an enabler

The interesting thing about them all is that open data is an enabler. Open data isn’t creating economic growth, or being used as a business model. Open licensing is being applied as a tool.

It creates a level playing field that means that everyone who contributes has an equal stake in the results. If you and I both contribute then we can both use the end result for any purpose. A commercial organisation is not extracting that value from us.

Open licensing can help to encourage people to share what they know, which is the reason the web exists.

Working with data

The projects are also great examples of ways of working with data on the web. They’re all highly distributed projects, accepting submissions from people internationally who will have very different skill sets and experience. That creates a challenge that can only be dealt with by having good collaboration tools and by having really strong community engagement.

Understanding how and why people contribute to your open database is important, because often those reasons will change over time. When OpenStreetMap had just started, contributors had the thrill of filling in a blank map with data about their local area. But now contributions are different. It’s more about maintaining data and adding depth.

Collaborative maintenance

In the open data community we often talk about making things open to make them better. It’s the tenth GDS design principle. And making data open does make it better, in the sense that more people can use it. And perhaps more eyes can help spot flaws.

But if you really want to let people help make something better, then you need to put your data into a collaborative environment. Then data can get better at the pace of the community and not your ability to accept feedback.

It’s not work if you love it

Hopefully the examples give you an indication of the size of these communities and how much has been created. It struck me that many of them have been around since the early 2000s. I’ve not really found any good recent examples (Maybe people can suggest some – Ed). I wonder why that is?

Most of the examples were born around the Web 2.0 era (Mate. That phrase dates you. – Ed) when we were all excitedly contributing different types of content to different services. Bookmarks and photos and playlists. But now we mostly share things on social media. It feels like we’ve lost something. So it’s worth revisiting these services to see that they still exist and that we can still contribute.

While these fan communities are quietly hard at work, maybe we in the open data community can do more to support them?

There are a lot of examples of “open” datasets that I didn’t use because they’re not actually open. The licences are restrictive. Or the community has decided not to think about it. Perhaps we can help them understand why being a bit more open might be better?

There are also examples of openly licensed content that could be turned into more data. Take Wikia for example. It contains 360,000 wikis, all with openly licensed content. They get 190m views a month, and the system contains 43 million pages, which is about the same size as the English version of Wikipedia is currently. They’re all full of infoboxes that are crying out to be turned into structured data.

I think it’d be great to make all this fan-produced data a proper part of the open data commons, sitting alongside the government and organisational datasets that are being published.

Thank you (yes, you!)

That’s the end of my talk. I hope I’ve piqued your interest in looking at one or more of these projects in more detail. Hopefully there’s a project that will help you express your inner data geek.

Photo Attributions

Lego Spaceman, Edwin Andrade, Jamie Street, Olu Elet, Aaron Burden, Volkan Olmez, Alvaro Serrano, RawPixel.com, Jordan Whitfield, Anthony DELANOIX

 

“Open”

For the purposes of having something to point to in future, here’s a list of different meanings of “open” that I’ve encountered.

XYZ is “open” because:

  • It’s on the web
  • It’s free to use
  • It’s published under an open licence
  • It’s published under a custom licence, which limits some types of use (usually commercial, often everything except personal)
  • It’s published under an open licence, but we’ve not checked too deeply into whether we can do that
  • It’s free to use, so long as you do so within our app or application
  • There’s a restricted/limited access free version
  • There’s documentation on how it works
  • It was (or is) being made in public, with equal participation by anyone
  • It was (or is) being made in public, led by a consortium or group that has limitations on membership (even if just fees)
  • It was (or is) being made privately, but the results are then being made available publicly for you to use

I gather that at IODC “open washing” was a frequently referenced topic. It’s not surprising given the variety of ways in which the word “open” is used, many of which are not open at all. And the list I’ve given above is hardly comprehensive. This is why the Open Definition is such an important reference, even if it may have its faults.

Depending on your needs, any or all of those definitions might be fine. But “open” for you, may not be “open” for everyone. So let’s not lose sight of the goal and keep checking that we’re using that word correctly.

And, importantly, if we’re really making things open to make them better, then we might need to be more open to collaboration. Open isn’t entirely about licensing either.

 

Building best practices for sharing public-sector data

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/building-best-practices-for-sharing-public-sector-data

At the ODI we’re big fans of capturing best practices and simple design patterns to help guide people towards the most effective ways to publish data.

By breaking down complex technical and organisational challenges into smaller steps, we can identify common problems across sectors and begin cataloguing common solutions. This is the common thread that ties together our research and technical projects, and it’s this experience that we bring to our advisory projects.

We’ve been contributing to the Share-PSI project, which has been documenting a range of best practices that relate to publishing public-sector data. Some of the best practices address specific technical questions relating to the web of data, and these form part of the W3C’s ‘Data on the web best practices’ guidance.

But some of the best practices address higher-level issues, such as the importance of creating an open data strategy and a release plan to support it. Or the creation of change by supporting startups and enabling ecosystems. Each best practice sets out the underlying challenge, a recommended solution, and provides pointers to further reading.

Our guidance, white papers and reports help to add depth to these best practices by linking them to evidence of their successful adoption, both here in the UK and internationally. This helps to ground the best practices in concrete guidance that draws on the experience of the wider community.

The best practices also provide a useful way to explore the elements of existing open data programmes.

For example, it’s possible to see how a large public-sector initiative like #OpenDefra has been successful through its adoption of so many of these discrete best practices. These include the creation of a common open data strategy across its network, use of a release process that allowed for more rapid publication of data while managing risks, benchmarking practice using a maturity model, moving to an open by default licensing model, and its efforts to engage users and stimulate the wider ecosystem.

The best practices are a useful resource for anyone leading or contributing to an open data initiative. We’re looking forward to adding further to this body of evidence.

We’ve also begun to think about capturing common patterns that illustrate how open and shared data can be successfully used to deliver specific types of government policies. We are looking for feedback on this draft catalogue of strategic government interventions – you can either add comments in the document or email policy@theodi.org.

How to open your data in six easy steps

Originally published on the Open Data Institute blog. Original URL: https://theodi.org/blog/how-to-open-your-data-in-six-easy-steps

1. Scope out the job at hand

Before taking the plunge and jumping straight into publishing, there are a few things to think through first. Take time to consider what data you’re going to release, what it contains and what the business case is for releasing it in the first place.

Consider what licence you’re going to put on the data for others to use. There’s a selection to choose from, depending on how you want others to use it; see our guidance here.

Here are some other key things to consider at this stage:

  • Where will it be published?
  • Will I need documentation around it?
  • What level of support is needed?
  • How frequently will I release the data?

2. Get prepared

Your data is only really useful to others if it’s well structured and has clear metadata (or a data description) to give it context and explain what it’s about and where it comes from.

Start your prep with a technical review using sample data, and identify suitable formats for release and the level of detail and metadata required. Also consider whether it’ll be most useful to the user as an API or a download. Data can be more useful when linked to other datasets, so keep an eye out for opportunities.

Consider your capabilities in-house and whether you need any training in order to release the data, whether technical or around certification. Some ODI courses can help with this.

Finally, think about what metadata you’re going to add to your data to describe what it is or how to use it.
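To make that concrete, here’s a minimal, illustrative sketch of a metadata record and a sanity check that the basics are present. The field names loosely follow common catalogue vocabularies (DCAT-style); this is not a formal schema, and the example values are invented:

```python
# An invented, illustrative metadata record for a published dataset.
metadata = {
    "title": "City centre wheelchair accessibility ratings",
    "description": "Crowd-sourced accessibility ratings for venues in Bath.",
    "licence": "https://creativecommons.org/licenses/by/4.0/",
    "publisher": "AccessibleBath",
    "issued": "2016-06-01",
    "format": "CSV",
}

# The minimum a reuser needs to understand and legally use your data.
REQUIRED = {"title", "description", "licence", "publisher"}

def missing_fields(record):
    """Which required descriptive fields are absent or empty?"""
    return {f for f in REQUIRED if not record.get(f)}

print(missing_fields(metadata))
```

An empty result means the basics are covered; anything listed is a gap to fill before release.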

3. Test your data

Before you release your data, you might want to think about doing a preview with some of your potential users to get some detailed feedback. This isn’t necessarily required for smaller datasets, but for larger releases this user testing can be really useful.

Don’t forget to get an Open Data Certificate to verify that your data is being published properly.

4. Release your data

Now for the exciting bit: releasing your data, the metadata and the documentation to go with it.

The key thing here is to release your data where your users will be. Otherwise, what’s the point? Where you should release it depends on who you are, but in general you should publish it on your own website, ensuring it’s also listed on relevant portals. For example, public sector organisations should add their data to data.gov.uk. Some sectors have their own portals – in science it’s the norm to publish in an institutional repository or a scientific data repository.

Basically, do your research into how your community shares data, and make sure it’s located in a place you have control over or where you’re confident the data can be consistently available.

When applying the Open Data Certificate, we’ll ask for evidence that the dataset is being listed in one or more portals to ensure it’s accessible.

5. Get engagement and promotion

It’s easy to relax after spending so much time and effort in preparing and releasing your dataset, but don’t just ‘fire and forget’. Make sure you have follow-up activities to let people know the data exists and be responsive to questions they might have. You can engage people in multiple ways (depending on your target audience), for example through blogs or social media. Encourage users to tell you how they’re using the data, so you can promote success stories around it too.

6. Reflect and improve

Now your dataset is out there in the big wide world, take some time to reflect on it. Listen to feedback, and decide what changes you could make or what you’d do differently next time.

If you want to measure your improvement, consider taking a maturity assessment using our Open Data Pathway tool.