I’ve found my new favourite example of a well-documented, tiny slice of data infrastructure.
I’m going to hazard a guess that it’s probably the simplest dataset that is designated as national statistics. If you can think of one simpler, then let me know.
It’s the weekly road fuel prices data on gov.uk.
This data has been updated every week, without fail, since September 2013.
The CSV has got seven columns in it. You can download it in XML if you want. Or you can grab the Excel which comes with some fancy clipart of petrol pumps.
This is national statistics, so of course there’s a document that describes the methodology for how it is compiled. I’ve read it. At four pages, it’s short, clear and to the point.
There’s a bit more to it, but basically, every Monday someone at BEIS emails six companies and asks them for their prices. By the end of the day they put the responses in the spreadsheet and then it’s published on the Tuesday.
Someone, or more likely some team, at BEIS has been doing that for at least nine years. No one has bothered to automate it away, probably because it’s not much effort to keep updating the spreadsheet.
If we were designing this from scratch, we’d probably immediately start thinking about services and APIs and data formats. But none of that is really needed.
It just needed a spreadsheet and a commitment to keep publishing the data.
That’s what makes it data infrastructure. The commitment, not the technology.