Wonky fruit and data standards

There was an article in the Guardian this week about supermarkets selling “wonky” fruit during the current drought.

It got me thinking about how our perceptions of what is good, normal or acceptable get shaped by decisions made by others. This led me down a bit of a rabbit-hole reading about cosmetic standards for fruit and vegetables.

This article by Oddbox suggests that EU guidelines and regulations, such as this 2011 marketing standard for apples, started to shape what was considered to be acceptable fruit and vegetables. But, while these have been relaxed, supermarkets have continued to enforce cosmetic standards on their suppliers.

This article from the University of British Columbia discusses how cosmetic standards contribute significantly to food waste.

Standards that were intended to improve safety have ended up:

  • shaping our expectations of what a normal piece of fruit or vegetable looks like
  • increasing waste, as produce that doesn’t fit the standard is often thrown away
  • reducing choice, as many varieties (e.g. heritage apples?) just don’t fit the standards
  • and (my assumption) potentially impacting biodiversity by encouraging producers to grow only what fits the standard

Embracing wonky veg is a tiny part of adapting to climate change, but perceptions are hard to change.

I think this is a good illustration of how a standard brought in to achieve a positive goal can have unexpected side-effects and harms over the longer term.

“Wonky” data

I inevitably started thinking about these types of standards in the context of data.

Now, I don’t think I have anything particularly new to offer here. Susan Leigh Star is the go-to source for insights on the problems of standards and classifications. And Jeni has already explored what we can learn about the regulation of data from the regulation of food (see her talk here).

But, I like using this type of metaphor to encourage some reflection. So…

Standards for data are generally intended to increase data quality, making data easier to access, use and share.

But standards are, by design, intended to create conformity. And conformity means leaving diversity behind. Depending on the standard, that can be harmful.

Standards and restrictive data models can erase lives. This fantastic paper looks at how standardising health records impacts anyone whose sex/gender does not fit the “norm”.

By discarding the data that does not fit, we end up shaping our perceptions of the things described or measured by that data.
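As a toy illustration of that mechanism, here is a minimal sketch of how a restrictive schema silently discards records that don’t fit it. The field names and the allowed values are entirely hypothetical, not taken from any real standard; the point is only that whatever the schema can’t represent simply disappears from downstream analysis.

```python
# Hypothetical sketch: a schema with a restrictive "norm" baked in.
# Records that don't conform are silently dropped, and with them
# the people or things those records described.

ALLOWED_SEX_VALUES = {"M", "F"}  # an illustrative, deliberately narrow value set

def conforms(record):
    """Return True if the record fits the schema's model of valid data."""
    return record.get("sex") in ALLOWED_SEX_VALUES

records = [
    {"id": 1, "sex": "F"},
    {"id": 2, "sex": "M"},
    {"id": 3, "sex": "X"},    # does not fit the standard
    {"id": 4, "sex": None},   # missing -- also lost
]

kept = [r for r in records if conforms(r)]
lost = [r for r in records if not conforms(r)]

print(f"kept {len(kept)} of {len(records)}; lost {len(lost)} record(s)")
```

Unless we deliberately retain something like the `lost` list, the discarded records vanish, and every future analysis only ever sees data that already fits the norm.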

How do we keep track of what is lost, so that we might recover or reuse it? Or choose to revise our notion of the “norm”?

By standardising data we encourage the production of more data that can be made to fit that standard.

But this may end up limiting the breadth of sources we use in our analysis. How do we keep track of, and draw insights from, a broader range of data sets to avoid narrowing our perspective?

By standardising data we also shape the technical ecosystems that rely on them. But as a result we might also limit our ability to quickly adapt to new conditions or insights.

How do we design those standards and systems to adapt? How do we make their design assumptions and biases clearer?

I think these types of questions are particularly important at a point in time when we are rapidly training AI/ML models on data that has been shaped by standards designed many years ago.