Highlighting harms when writing design patterns

I enjoy writing design patterns. I find them a useful way to clarify my thinking around different solutions to problems across a whole range of areas. A well-named pattern can also help to clarify and focus discussion. I've written a whole book of Linked Data patterns and lead a team that produced a set of … Continue reading Highlighting harms when writing design patterns →

Improving access to non-domestic energy consumption data

I recently wrote a post describing the data ecosystem for non-domestic energy consumption data in the UK. In that post I summarised my current understanding of the different actors involved in that data ecosystem, and some of the challenges of trying to access data from the perspective of a third-party service provider. In an earlier … Continue reading Improving access to non-domestic energy consumption data →

How could watermarking AI help build trust?

I've been reading about different approaches to watermarking AI and the datasets used to train them. This seems to be an active area of research within the machine learning community. But, of the papers I've looked at so far, there hasn't been much discussion of how these techniques might be applied and what groundwork needs … Continue reading How could watermarking AI help build trust? →

24 different tabular formats for half-hourly energy data

A couple of months ago I wrote a post that provided some background on the data we use in Energy Sparks. The largest data source comes from gas and electricity meters (consumption) and solar panels (generation). While we're integrating with APIs that allow us to access data from smart meters, for the foreseeable future most … Continue reading 24 different tabular formats for half-hourly energy data →

Building data validators

This is a post about building tools to validate data. I wanted to share a few reflections based on helping to design and build a few different public and private tools, as well as my experience as a user. I like using data validators to check my homework. I've been using a few different recently … Continue reading Building data validators →

12 ways to improve the GDS guidance on reference data publishing

GDS have published some guidance about publishing reference data for reuse across government. I've had a read and it contains a good set of recommendations. But some of them could be clearer. And I feel like some important areas aren't covered. So I thought I'd write this post to capture my feedback. Like the original … Continue reading 12 ways to improve the GDS guidance on reference data publishing →

Brief review of revisions and corrections policies for official statistics

In my earlier post on the importance of tracking updates to datasets I noted that the UK Statistics Authority Code of Practice includes a requirement that publishers of official statistics must publish a policy that describes their approach to revisions and corrections. See 3.9 in T3: Orderly Release, which states: "Scheduled revisions or unscheduled corrections to … Continue reading Brief review of revisions and corrections policies for official statistics →

The importance of tracking dataset retractions and updates

There are lots of recent examples of researchers collecting and releasing datasets which end up raising serious ethical and legal concerns. The IBM facial recognition dataset being just one example that springs to mind. I read an interesting post exploring how facial recognition datasets are being widely used despite being taken down due to ethical … Continue reading The importance of tracking dataset retractions and updates →

A letter from the future about numbers

It's an odd now looking at early 21st century content in the Internet Archive. So little nuance. It feels a little like watching those old black and white movies. All that colour which was just right there. But now lost. Easy to imagine that life was just monochrome. Harder to imagine the richer colours. Or … Continue reading A letter from the future about numbers →

Four types of innovation around data

Vaughn Tan's The Uncertainty Mindset is one of the most fascinating books I've read this year. It's an exploration of how to build R&D teams drawing on lessons learned in high-end kitchens around the world. I love cooking and I'm interested in creative R&D and what makes high-performing teams work well. I'd strongly recommend it … Continue reading Four types of innovation around data →