In the open data community we tend to focus a lot on Publishers and Consumers.
Publishers have the data we want. We must lobby or convince them that publishing the data would be beneficial. And we need to educate them about licensing and how best to publish data. And we get frustrated when they don’t do those things.
Consumers are doing the work to extract value from data. Publishers want to encourage Consumers to do things with their data, but are often worried that Consumers aren’t doing the right things or that they themselves can’t track when that value is being created.
While these two roles clearly exist I’m increasingly of the opinion that the framing isn’t a helpful one. There are several reasons why I think that’s the case.
Firstly, as Jeni has already noted, it ties us in knots. By identifying ourselves with one or other role we create a divide. This inevitably leads us to focus on our own perspective and our own needs, and sets expectations of what others must or should do before we can act. And yet we know that:
- organisations are often publishers and consumers of their own data. Doing it better helps them, not just others.
- to solve big, complex challenges we need to collaborate. Collaboration doesn’t happen when a team is divided and we don’t have shared goals.
Secondly, I worry that by framing discussions in terms of Publisher and Consumer we are overlooking opportunities for more collaborative activities. Focusing on Publisher and Consumer leads us to think in narrow perspectives of what an open data infrastructure might look like. I’ve previously highlighted some of the differences between the current state of open source and open data.
I’d suggest that people in the “open source community” think of themselves as contributors, not as “publishers of open source software” and “consumers of open source software”. See also Open Street Map and other community-led projects. Members of the community may tend to fall into specific roles but there’s a shared goal.
Thirdly, it’s just not representative of the current open data market, let alone what it might look like in future. There are already a number of different types of actors in the landscape. We should make more of an effort to recognise those roles and map out the value they add to the network.
I think it could also help us clarify a number of conversations that relate to “ownership” of data, particularly personal data and how it’s collected and reused.
So, as ground work for an exercise I’d like to do in mapping out an open data value network, here’s a proposed re-framing that recognises a number of different roles.
In each case it might be an individual or an organisation that’s fulfilling the role. And, in any given interaction, it’s possible that the same person or organisation might be fulfilling multiple roles.
This is far from polished, but I thought I’d share an early draft. Add a comment to let me know what you think.
- Steward – has responsibility for managing and ensuring access to a dataset. Covers at least the infrastructure supporting the ongoing collection and access to the data. The Steward role could also include responsibilities for managing contributions, e.g. to ensure data quality.
  - Examples: ORCID, Bath: Hacked, “Data Controller” role as defined by ICO
- Registrar – a specific type of Steward who is responsible for assigning and managing key identifiers and reference data used in other datasets
  - Examples: CrossRef
- Contributor – responsible for adding, updating or curating data in a dataset, using the tools and infrastructure provided by the Steward.
  - Examples: OSM editors, MusicBrainz contributors, Waze contributors, scholarly publishers adding to CrossRef
- Reuser – makes use of one or more open datasets to create applications, analysis, etc.
  - Examples: data journalists, City Mapper, “Data Processor” role as defined by ICO
- Intermediary – provides value added services that wrap, host or enrich a dataset. E.g. visualisation tools, APIs, etc.
  - Examples: Socrata, Data Press, Transport API
- Aggregator – a specific form of Intermediary that packages together datasets from other sources
  - Examples: ESD toolkit Aggregator, Eurostat
- Beneficiary – benefits from the activity of reusers, e.g. by consuming packaged analyses or other applications
  - Examples: City Mapper users, UK citizen
- Subject – a person or organisation who is the subject of a dataset or data item. E.g. the person contributing to a health dataset
While the Reuser role is actually the same as the “Consumer” role I referenced earlier, this framing breaks down Publisher into a number of smaller roles which hopefully better highlight some of the interactions that we tend to overlook. I’ve also tried to tease out some of the responsibilities of tool and platform vendors that help support the ecosystem.
If current data publishers began to think of themselves as Stewards of data, then would this let us have a better discussion about ways to enable more Contributors?
Can we make a case that open data publishing should be as lightweight as possible, to simplify the Steward’s role, whilst enabling a marketplace of Intermediaries?
Can we better recognise the role of Registrars in creating the web of data?
Would separating out the needs of the Beneficiaries of data from those of Reusers help distinguish between technology user needs and broader needs around data literacy?
Let me know what you think. What else should go on this list?