At times I see quite a bit of debate within the open data community around how best to publish data. For example, should data be made available in bulk or via an API? Which option is “best”? Depending on where you sit in the open data community, you’re going to have very different responses to that question.
But I find that in the ensuing debate we often overlook that open data is intended to be used by anyone, for any purpose. And that means that maybe we need to think about more than just the immediate needs of developers and the open data community.
While the community has rightly focused on ensuring that data is machine-readable, so it can be used by developers, we mustn’t forget that data needs to be human-readable too. Otherwise we end up with critiques of what I consider to be fairly reasonable and much-needed guidance on structuring spreadsheets, and suggestions of alternatives that are well meaning but feel a little under-baked.
I feel that there are several different and inter-related viewpoints being expressed:
- That the citizen or user is the focus and we need to understand their needs and build services that support them. Here data tends to be a secondary concern, perhaps focused on transactional statistics about the performance of those services rather than the raw data.
- That open data is not meant for mere mortals and that its primary audience is developers, who analyse the data and present it to users. The emphasis here is on providing the raw data as rapidly as possible.
- A variant of the above that emphasises delivery of data via an API to web and mobile developers, allowing them to deliver value more rapidly. Here we see cases being made about the importance of platforms, infrastructure, and API programs.
- That citizens want to engage with data and need tools to explore it. In this case we see arguments for online tools to explore and visualise data, or reasonable suggestions to simply publish data in spreadsheets, as this is a format with which many, many people are comfortable.
Of course all of these are correct, although how prominent each one is varies widely with the type of data, the application, and so on. Depending on where you sit in the open data value network, your needs are going to be quite different.
It would be useful to map out the different roles of consumers, aggregators, intermediaries, etc. to understand what value exchanges are taking place, as I think this would help highlight the value that each role brings to the ecosystem. But until then both consumers and publishers need to be mindful of potentially competing interests. In an ideal world, publishers would serve every reuser need equally.
My advice is simple: publish for machines, but don’t forget the humans. All of the humans. Publish data with context that helps anyone – developers and the interested reader – properly understand the data. Ensure there is at least a human-readable summary or view of the data as well as more developer-oriented bulk downloads. If you can get APIs “out of the box” with your portal, then invest the effort you would otherwise spend on preparing machine-readable data in providing more human-readable documentation and reports.
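To make the "human-readable summary" point a little more concrete, here is a minimal sketch in Python of how a publisher might generate a plain-text overview to sit alongside a bulk CSV download. Everything in it is an assumption for illustration: the `summarise_csv` function, the column names, and the sample figures are made up and are not drawn from any real dataset or portal.

```python
import csv
import io


def summarise_csv(csv_text: str, title: str = "Dataset") -> str:
    """Return a short plain-text summary of a CSV file's contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # fieldnames is None when the input has no header row at all
    columns = list(reader.fieldnames or [])

    lines = [f"{title}: {len(rows)} rows, {len(columns)} columns"]
    for name in columns:
        # Skip blank cells and cells missing from short rows
        values = [row[name] for row in rows if row[name] not in (None, "")]
        line = f"- {name}: {len(values)} non-empty values"
        try:
            numbers = [float(value) for value in values]
        except ValueError:
            numbers = []  # treat the column as text, not numbers
        if numbers:
            line += f", range {min(numbers):g} to {max(numbers):g}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical sample data, invented for this example
SAMPLE = """library,visits_2013
Central,120500
Northgate,
Riverside,48210
"""

print(summarise_csv(SAMPLE, title="Library visits (made-up sample)"))
```

A summary like this is no substitute for proper documentation, but it gives the interested reader a quick sense of what a file contains, including where values are missing, before they decide whether to download it.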
Our ambition should be to build an open data commons that is accessible and useful for as many people as possible.