There’s lots to love about the “Value of Data” report. Like the fantastic infographic on page 9. I’ll wait while you go and check it out.
Great, isn’t it?
My favourite part of the paper is that it’s taught me a few terms that economists use, but which I hadn’t heard before. Like “incomplete contracts”: the uncertainty about how people will behave because of ambiguity in norms, regulations, licensing or other rules. Finally, a name to put to my repeated gripes about licensing!
But it’s the term “option value” that I’ve been mulling over for the last few days. Option value is a measure of our willingness to pay for something even though we’re not currently using it. Data has a large option value, because it’s hard to predict how its value might change in future.
Organisations continue to keep data because of its potential future uses. I’ve written before about data as stored potential.
The report notes that the value of a dataset can change because we might be able to apply new technologies to it. Or think of new questions to ask of it. Or, and this is the interesting part, because we acquire new data that might impact its value.
So, how does increasing access to one dataset affect the value of other datasets?
Moving data along the data spectrum means that more and more people will have access to it. That means it can be used by more people, potentially in very different ways than you might expect. Applying Joy’s Law, we might expect some interesting, innovative or just unanticipated uses. (See also: everyone loves a laser.)
But more people using the same data is just extracting additional value from that single dataset. It’s not directly impacting the value of other datasets.
To do that we need to use the new data in some specific ways. So far I’ve come up with seven ways that new data can change the value of existing data.
- Comparison. If we have two or more datasets then we can compare them. That will allow us to identify differences, look for similarities, or find correlations. New data can help us discover insights that aren’t otherwise apparent.
- Enrichment. New data can enrich an existing dataset by adding new information. It gives us context that we didn’t have access to before, unlocking further uses.
- Validation. New data can help us identify and correct errors in existing data.
- Linking. A new dataset might help us to merge existing datasets, allowing us to analyse them in new ways. The new dataset acts like a missing piece in a jigsaw puzzle.
- Scaffolding. A new dataset can help us to organise other data. It might also help us collect new data.
- Improve Coverage. Adding more data, of the same type, into an existing pool can help us create a larger, aggregated dataset. We end up with a more complete dataset, which opens up more uses. The combined dataset might have a better spatial or temporal coverage, be less biased or capture more of the world we want to analyse.
- Increase Confidence. If the new data measures something we’ve already recorded, then the repeated measurements can help us to be more confident about the quality of our existing data and analyses. For example, we might pool sensor readings about the weather from multiple weather stations in the same area. Or perform a meta-analysis across multiple scientific studies.
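Three of these ways — linking, enrichment and validation — are easiest to see with a tiny sketch. Here’s a minimal, entirely hypothetical example in Python: the datasets, field names and identifiers are all invented, and a real pipeline would use proper tooling rather than list comprehensions, but it shows how one new dataset (a company register) can change what two existing datasets are worth.

```python
# Two hypothetical existing datasets, sharing a company identifier
sales = [
    {"company_id": "C1", "revenue": 120},
    {"company_id": "C2", "revenue": 75},
]
emissions = [
    {"company_id": "C1", "co2_tonnes": 40},
    {"company_id": "C9", "co2_tonnes": 12},  # an id we can't account for
]

# The new dataset: an authoritative register of companies
register = {
    "C1": {"name": "Acme Ltd", "sector": "manufacturing"},
    "C2": {"name": "Bolt plc", "sector": "logistics"},
}

# Linking: shared identifiers let us merge the two existing datasets
linked = [
    {**s, **e}
    for s in sales
    for e in emissions
    if s["company_id"] == e["company_id"]
]

# Enrichment: the register adds context (name, sector) that
# neither existing dataset held on its own
enriched = [{**row, **register[row["company_id"]]} for row in linked]

# Validation: identifiers missing from the register flag likely errors
invalid = [e["company_id"] for e in emissions
           if e["company_id"] not in register]

print(enriched)  # one linked, enriched record for C1
print(invalid)   # the unexplained id, C9
```

The register itself never gets “used” directly in the analysis; its value lies in what it does to the value of the other two datasets.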
I don’t think this is exhaustive, but it was a useful thought experiment.
A while ago, I outlined ten dataset archetypes. It’s interesting to see how these align with the above uses:
- A meta-analysis to increase confidence will draw on multiple studies
- Combining sensor feeds can also help us increase confidence in our observations of the world
- A register can help us with linking or scaffolding datasets. It can also be used to support validation.
- Pooling together multiple descriptions or personal records can help us create a database that has improved coverage for a specific application
- A social graph is often used as scaffolding for other datasets
What would you add to my list of ways in which new data improves the value of existing data? What did I miss?