The importance of tracking dataset retractions and updates

There are lots of recent examples of researchers collecting and releasing datasets which end up raising serious ethical and legal concerns. The IBM facial recognition dataset is just one example that springs to mind. I read an interesting post exploring how facial recognition datasets are being widely used despite being taken down due to ethical …

Increasing inclusion around open standards for data

I read an interesting article this week by Ana Brandusescu, Michael Canares and Silvana Fumega. Called "Open data standards design behind closed doors?" it explores issues of inclusion and equity around the development of "open data standards" (which I'm reading as "open standards for data"). Ana, Michael and Silvana rightly highlight that standards development is …

What kinds of data is it useful to include in a register?

Registers are useful lists of information. A register might be a list of countries, companies, or registered doctors. Or addresses. At the ODI we did a whole report on registers. It looks at different types of registers and how they're governed. And GDS built a whole infrastructure to support them being published and used across …

Why is change discovery important for open data?

Change discovery is the process of identifying changes to a resource. For example, that a document has been updated. Or, in the case of a dataset, whether some part of the data has been amended, e.g. to add data, fill in missing values, or correct existing data. If we can identify that changes have been …

How can publishing more data decrease the value of existing data?

Last month I wrote a post looking at how publishing new data might increase the value of existing data. I ended up listing seven different ways including things like improving validation, increasing coverage, supporting the ability to link together datasets, etc. But that post only looked at half of the issue. What about the opposite? …

Exploring registration agencies as data institutions

A key focus for our research and delivery work at the ODI at the moment is exploring how to design sustainable and trustworthy data institutions. Data institutions are organisations that steward data on behalf of a community. They have a variety of legal forms, roles and purposes. Yesterday I wrote (again!) about identifiers and specifically, …

How do different communities create unique identifiers?

Identifiers are part of data infrastructure. They play an important role, helping to publish, structure and link together data. Identifiers are boundary objects that cross communities. That means they need to be well-documented in order to be most useful. Understanding how identifiers are created, assigned and governed can help us think through how to strengthen …

How can publishing more data increase the value of existing data?

There's lots to love about the "Value of Data" report. Like the fantastic infographic on page 9. I'll wait while you go and check it out. Great, isn't it? My favourite part about the paper is that it's taught me a few terms that economists use, but which I hadn't heard before. Like "Incomplete contracts" …