Do data scientists spend 80% of their time cleaning data? Turns out, no?

It's hard to read an article about data science or really anything that involves creating something useful from data these days without tripping over this factoid, or some variant of it: Data scientists spend 80% of their time cleaning data rather than creating insights. Or Data scientists only spend 20% of their time creating insights, …

[Paper Review] The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work

This blog post is a quick review of, and notes on, a research paper called The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work (PDF available here). It's part of my new research notebook to help me collect and share notes on research papers and reports. Brief summary: This paper explores the …

How do data publishing choices shape data ecosystems?

This is the latest in a series of posts in which I explore some basic questions about data. In our work at the ODI we have often been asked for advice about how best to publish data. When trying to give helpful advice, one thing I'm always mindful of is how the decisions about …

How can we describe different types of dataset? Ten dataset archetypes

As a community, when we are discussing recommendations and best practices for how data should be published and governed, there is a natural tendency for people to focus on the types of data they are most familiar with. This leads to suggestions that every dataset should have an API, for example. Or that …

Thinking about the governance of data

I find "governance" to be a tricky word, particularly when we're talking about the governance of data. For example, I've experienced conversations with people from a public policy background and people with a background in data management, where it's clear that there are different perspectives. From a policy perspective, governance of data could be described …

Creating better checklists, a short review of the Checklist Manifesto

I've just finished reading The Checklist Manifesto by Atul Gawande (Cancer Research UK affiliate link). It's been on my reading list for a while. In my work I've written quite a few checklists to help capture best practice or to provide advice. So I was curious about whether I could learn something about creating better checklists. I …