What are Data Unions?

I’ve been doing some research around different types of data intermediary recently and thought I’d share some things I’ve learned about “Data Unions”.

Like a lot of the terms being applied to new approaches to data governance, there’s no clear definition of what constitutes a data union.

A vision of collective action

For example, this Mozilla review of different governance models suggests that a Data Union allows individuals to pool their data to support collective bargaining. There’s a comparison made to a Credit Union, in which people pool their finances for their collective benefit.

Others have described a Data Union as similar to a labour union or a farmers’ cooperative.

What Credit Unions, Labour Unions and Cooperatives all have in common is that they are set up and run by a community, for its benefit. Whether that’s to unlock finances, scale and support their businesses or collectively bargain with employers (with the threat of withholding labour), it’s that collective approach which underpins them all.

This really interests me as a model. These collective approaches to data governance are sometimes described as “bottom-up data institutions”. That’s a term I dislike for a variety of reasons.

But I’ve not yet been able to find a single Data Union that actually fits that model. If you find one, let me know.

There are projects exploring collective approaches. But they would, I think, describe themselves as Data Cooperatives not Unions.

Data Unions in practice

The term Data Union now seems to be more closely associated with several Web3 platforms like Streamr and the still-in-development Pool Data.

Both startups aim to build platforms for creating Data Unions. They provide the infrastructure for collecting data from individuals, aggregating and streaming that data to consumers, handling payments, etc.

The largest current Data Union run on those platforms is Swash, which I covered in depth in a previous post.

The broad model is:

  • A “Data Union Manager” sets up the Data Union, deciding what data will be collected and its financial terms, and then recruits users
  • “Data Union Members” contribute data, by directly adding data about themselves, filling in surveys, importing data from other services, or by just interacting with an application or service provided by the Manager
  • Data Consumers purchase access to the stream of data provided by the Members, with the Data Union Manager setting the terms of pricing and revenue share with Members
  • Platforms like Streamr and Pool Data provide the technology that underpins all this, presumably taking a revenue cut
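
To make the flow of money in that model concrete, here is a minimal sketch of how a single payment from a Data Consumer might be divided. Everything in it is an assumption for illustration: the split_revenue function, the percentages, the equal split between Members, and the platform taking its cut first. None of it is taken from how Streamr or Pool Data actually work.

```python
from dataclasses import dataclass


@dataclass
class Payout:
    platform_cents: int    # hypothetical cut kept by the platform
    manager_cents: int     # share kept by the Data Union Manager
    per_member_cents: int  # equal share paid to each Member
    remainder_cents: int   # left over after rounding down per-member shares


def split_revenue(payment_cents: int, platform_fee_pct: int,
                  manager_share_pct: int, member_count: int) -> Payout:
    """Split one consumer payment between platform, Manager and Members.

    Assumed order: the platform fee comes off the top, the Manager keeps a
    percentage of what is left, and Members share the rest equally.
    """
    if member_count <= 0:
        raise ValueError("a Data Union needs at least one Member")
    for pct in (platform_fee_pct, manager_share_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")

    platform = payment_cents * platform_fee_pct // 100
    after_fee = payment_cents - platform
    manager = after_fee * manager_share_pct // 100
    member_pool = after_fee - manager
    per_member = member_pool // member_count
    remainder = member_pool - per_member * member_count
    return Payout(platform, manager, per_member, remainder)


if __name__ == "__main__":
    # Made-up figures: a 1,000.00 payment, 5% platform fee,
    # 30% Manager share, 1,000 Members.
    print(split_revenue(100_000, 5, 30, 1_000))
```

With those made-up figures each Member's share comes to 66 cents, which shows why the size of the Union and the terms set by the Manager matter so much to what any individual actually receives.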

While this could all be run using a collective or collaborative approach, that’s not something that the platforms directly support: the goal is to sell access to those data streams, not to use them to support the Members.

In some cases, it looks like users might be able to decide to sell or licence their data directly, but this seems to be a secondary use case. The goal is to build a large data stream that provides data about a large group of people.

Of the examples I’ve looked at so far, Data Unions seem to fall into one of two categories:

  • A new product or service has been set up with the explicit goal of becoming a data broker. The Data Union Manager (which is typically a startup) has identified a revenue generation opportunity if it can build a critical mass of personal data. Examples: Swash, Unbanx, DIMO, GeoGB. This is the prevailing model.
  • An existing product or service has been updated to provide a stream of its existing user data to consumers. Example: MAT.

Neither approach seems to involve consultation or engagement with users beyond recruiting them to the Union with the promise of financial rewards if, or when, the Union is large enough to attract subscribers.

Neither approach is new. What is new is the specific technology or platforms that are being used.

An iteration not an evolution

Personally I would put Data Unions, at least in their current form, within the broader category of applications that offer people a passive income by taking surveys, offering reviews and watching ads.

They are more an iteration of the current marketplace for data than an evolution or revolution. They are being marketed differently, but in terms of what is being offered to both members and data consumers, it’s very similar to existing “passive income” apps.

The variety, volume and detail of data that is collected is aimed at serving the needs of existing data consumers, not the members of the union.

They do offer guarantees around user consent for data sharing, but existing services will offer the same.

They offer financial incentives and other rewards, but so do other existing services that don’t describe themselves as Data Unions.

Members will have better transparency than when they use apps that quietly aggregate and licence data, at least in the sense of knowing that data is being sold. Swash tells users that they need to look at transactions on the Ethereum network if they want to figure out where that data is actually going.

The real innovation is in the emerging technical platforms making it quicker to set up these new data brokers. So we might expect many more of them to spring up over the next few years. And they will need to compete on the quality, detail and depth of data they provide. Just like existing data brokers.

But whether they can scale enough to achieve their promise remains to be seen. At present there seem to be very few examples and those have, so far, had limited success.