This is another in a very occasional series of blog posts where I look at different data initiatives, institutions or infrastructure in order to understand a bit more about how they work. And then have opinions about them.
Previously I wrote about Common Voice. This time I’m looking at Swash, which describes itself as “reimagining data ownership”.
What is Swash?
The website says that “Swash’s decentralised data ecosystem redefines how data is collected and valued by providing a simple way to passively earn, innovate, and create for a fairer world and better, more equitable internet.”
The system enables the creation of “new realities of data ownership and value creation through new incentivisation streams, innovative data monetisation mechanisms, and a collaborative development framework”. It is, somewhat inevitably, built around “Web 3”.
They claim to be the world’s first “Data Union” and to have been recognised by the ODI as “a prime example of a ‘bottom-up’ data institution”.
If we set aside the buzzwords, grand vision and marketing, what are they actually doing?
It’s quite simple: you give them permission to collect data from you as you browse the web. This is aggregated and then sold. One data product is accessed via the Streamr data marketplace, another is a business intelligence tool. As a user you then get a share in that revenue. Paid in tokens, obviously.
There’s also a platform angle, of course. Other developers can build around their framework to analyse and sell datasets, or offer additional services to Swash users.
But the core business model is selling a marketplace intelligence product.
How is data accessed, used and shared?
The data collection process involves:
- A user signs up to the service and installs a browser extension. The extension is given permission to monitor what the user is doing, including in private browsing sessions if they choose
- The extension has a number of different modules and data collection functions. These are triggered by web requests, by content in the browser, and via APIs exposed within the applications the user is interacting with. So, for example, it can record every web page you visit, which music video you liked on YouTube, or what search results you were served when you were shopping on Amazon
- This stream of events is anonymised and submitted to Swash, where it is processed within their data fabric (see their technical paper, page 14)
- The event data is available as a “raw” data product, but pre-analysed and aggregated datasets are also offered
- The data is sold via Streamr or other marketplaces, with smart contracts providing the usual magical pixie dust on top.
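To make the pipeline above concrete, here’s a minimal sketch of a collector module emitting an anonymised event. All names and the hashing scheme are my own invention for illustration; this is not Swash’s actual code, just the general shape of module-based collection followed by anonymisation and submission.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CollectedEvent:
    """A single browsing event emitted by a collector module."""
    module: str   # e.g. "shopping", "search", "surfing"
    site: str     # e.g. "amazon.com"
    action: str   # e.g. "search", "page_visit", "add_to_cart"
    payload: dict # module-specific details (query, prices, URL, ...)

def anonymise(event: CollectedEvent, user_secret: str) -> dict:
    """Replace the user's identity with an opaque identifier before
    submission. Hashing a per-user secret gives a stable but
    non-reversible id; this mirrors the general approach, not
    Swash's actual scheme."""
    anon_id = hashlib.sha256(user_secret.encode()).hexdigest()[:16]
    return {
        "id": anon_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "module": event.module,
        "site": event.site,
        "action": event.action,
        "payload": event.payload,
    }

# A shopping collector records a search and the prices served:
event = CollectedEvent(
    module="shopping",
    site="amazon.com",
    action="search",
    payload={"query": "headphones", "result_prices": [29.99, 54.00]},
)
submission = json.dumps(anonymise(event, user_secret="local-secret"))
```

The interesting design question is where the anonymisation happens: here it runs client-side before anything leaves the browser, which is also where user-configured filtering would have to sit for it to mean anything.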
There’s an application and marketplace layer that I think is fairly nascent, but this will provide options for collecting other data or generating other revenue streams in the future.
Interestingly, it’s the packaged data product that Swash refers to as the “Data Union”. The Data Union concept seems to originate from various Web3 initiatives and pundits, e.g. “Data is Labor”.
I’d originally thought that it might imply some collective action or participatory stewardship, but I can’t see any evidence of that at all.
Swash make it clear that: “WE ARE NOT RESPONSIBLE FOR ANY MISUSE OF DATA PROVIDED TO THE BUYERS”.
So their stewardship consists of the data engineering required to collect, manage, aggregate and sell the data, but nothing beyond that. There is no assurance or monitoring of downstream uses of the data on behalf of their users.
What data is actually collected?
For a system that has transparency as a design principle, it took me quite a while to actually track down what data is being collected.
- Shopping, e.g. what you searched for on Amazon and eBay, what you added to your cart or wishlist, etc
- Beauty and fashion data collectors, similar to the above, but for sites like Sephora and Bath & Body Works
- Social media collectors, e.g. Facebook and Twitter. Again, it collects your searches, results listings and clicks, along with what you liked, who you followed, what you posted, etc
- Search engine collectors, e.g. what you searched for, what results you were served with on Google and other search engines, including DuckDuckGo(!)
- News collectors, so that it can collect what you’re reading on the BBC and CNN websites
- A generic surfing collector which records every website you’ve visited
If you want a detailed breakdown of which sites and what data points are collected, then you will have to read the source code. There’s some detail in there that isn’t in the summary. For example, they collect the prices you’re offered when shopping online.
What controls do users have?
Your options are that:
- You can enable/disable the extension to opt in/out of data collection. It looks like you might also be able to disable specific modules (e.g. no social data) or specific collectors (e.g. no Twitter data)
- You can delay sending data so you can review and cancel it
- You can configure keywords and text that will be filtered from the collected data, e.g. your name or phone number
- You can configure different privacy levels which impacts how data is reported e.g. only the domain of the website you’re browsing, or the full path.
- You can configure different levels of anonymity. E.g. are you happy with all data, from all sites being linked to a single identifier? Or would you prefer that a random identifier is assigned to each event?
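The last three controls above — keyword filtering, privacy levels, and anonymity modes — can be sketched as a single function over a page-visit event. Again, the names, level values and logic here are hypothetical, chosen only to illustrate how these settings change what gets reported:

```python
import uuid
from urllib.parse import urlparse

def apply_privacy(url: str, keywords: list[str], level: str,
                  anonymity: str, stable_id: str) -> dict:
    """Apply user-configured controls to a page-visit event.

    level:      "domain" reports only the host, "full" reports host + path
    anonymity:  "linked" uses one stable id across all events,
                "unlinked" assigns a fresh random id per event
    keywords:   strings masked out of the reported URL
    """
    parsed = urlparse(url)
    reported = parsed.netloc if level == "domain" else parsed.netloc + parsed.path
    for kw in keywords:
        reported = reported.replace(kw, "*" * len(kw))
    ident = stable_id if anonymity == "linked" else uuid.uuid4().hex
    return {"id": ident, "url": reported}

# Domain-only reporting with a fresh identifier per event:
e1 = apply_privacy("https://example.com/profile/alice", ["alice"],
                   level="domain", anonymity="unlinked", stable_id="u1")
# e1["url"] == "example.com"; e1["id"] differs on every call
```

The trade-off the anonymity modes encode is between linkability and value: a single stable identifier makes the data far more useful to buyers (and far more re-identifiable), while per-event identifiers break the cross-site trail at the cost of a less saleable product.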
Within the context of what Swash is doing, which is obviously pretty detailed monitoring of your browsing, these seem to offer a good degree of flexibility. And the different modules seem to have different default settings based on potential sensitivity of the data.
This seems pretty good.
What you can’t do is:
- Decide what websites data will be collected from. Swash decides what the market is interested in and then adds or updates modules accordingly. You have no real individual or collective control over what data will be collected other than opting out
- Decide who gets to use the data. The FAQ says that you could look on the Ethereum network and try to trace a transaction back to its source address, but notes that you won’t necessarily know who is using the data.
Having dug through all of this, I’m left thinking that there’s really nothing new here. Clickstream data about people’s searching and spending habits has been a valuable dataset for some time. And there are a number of sources of that data, including other services that operate very similarly to Swash:
- Nielsen have an app you can install on your phone or computer that collects similar data, to provide insights to industry and offer you rewards
- Surfe.be has a browser extension that you can install that will monitor your surfing and pay you for doing polls, watching ads, etc. Swash’s vision for its platform of additional apps includes exactly the same features
- Upvoice asks to monitor your social feeds
- Solipay seems to do exactly what Swash does, but pays you actual cash rather than tokens. Interestingly, they seem to give you less control over what is collected, but you can decide who accesses the data
- Mozilla Rally is a browser extension that lets you opt-in to research projects that will have access to your data.
…and there are lots of other services in the “passive income” category that also involve collection of your browsing data, completing surveys, etc.
Ignoring the technical detail, Swash doesn’t really stand out from the crowd here. They do use language and a framing for their services that aligns them with the current movement to explore alternative approaches to data governance, participatory stewardship, etc. This might explain why they’ve ended up being highlighted in reports and blog posts from the ODI, Aapti and (just today) GPAI.
I don’t think we’d argue that Nielsen is a bottom-up data institution offering delegated decision making, and yet they have essentially the same business model. The only choice you’re making is whether you want to be paid to play a small part in improving some anonymous organisation’s ad spend.
Swash is not really giving users better control over their data. It’s just incentivising them to agree for more of it to be collected. It’s about acquiring users not empowering them. A valid outcome would be for this data to not be collected at all.
When we focus on debating which sub-category of data institution different services fit within, we risk overlooking whether they are really doing anything distinctly different. Are they doing something new, or just rebadging existing practices?
To do that we need to move beyond comparative analyses and understand these services within the context of their data ecosystems.