Beyond publishers and consumers

In the open data community we tend to focus a lot on Publishers and Consumers.

Publishers have the data we want. We must lobby or convince them that publishing the data would be beneficial. And we need to educate them about licensing and how best to publish data. And we get frustrated when they don’t do those things.

Consumers are doing the work to extract value from data. Publishers want to encourage Consumers to do things with their data, but are often worried that Consumers aren’t doing the right things, or that they’re not able to track when that value is being created.

While these two roles clearly exist I’m increasingly of the opinion that the framing isn’t a helpful one. There are several reasons why I think that’s the case.

Firstly, as Jeni has already noted, it ties us in knots. By identifying ourselves with one or other role we create a divide. This inevitably leads us to focus on our own perspective and our own needs, and sets expectations of what others must do or should do before we can act. And yet we know that:

  • organisations are often publishers and consumers of their own data. Doing it better helps themselves and not just others
  • to solve big, complex challenges we need to collaborate. Collaboration doesn’t happen when a team is divided and we don’t have shared goals

Secondly, I worry that by framing discussions in terms of Publisher and Consumer we are overlooking opportunities for more collaborative activities. Focusing on Publisher and Consumer leads us to think in narrow perspectives of what an open data infrastructure might look like. I’ve previously highlighted some of the differences between the current state of open source and open data.

I’d suggest that people in the “open source community” think of themselves as contributors, not “publishers of open source software” and “consumers of open source software”. See also Open Street Map and other community-led projects. Members of the community may tend to fall into specific roles but there’s a shared goal.

Thirdly, it’s just not representative of the current open data market, let alone what it might look like in future. There are already a number of different types of actors in the landscape. We should make more of an effort to recognise those roles and map out the value they add to the network.

I think it could also help us clarify a number of conversations that relate to “ownership” of data, particularly personal data and how it’s collected and reused.

So, as ground work for an exercise I’d like to do in mapping out an open data value network, here’s a proposed re-framing that recognises a number of different roles.

In each case it might be an individual or an organisation that’s fulfilling the role. And, in any given interaction, it’s possible that the same person or organisation might be fulfilling multiple roles.

This is far from polished, but I thought I’d share an early draft. Add a comment to let me know what you think.

  • Steward – has responsibility for managing and ensuring access to a dataset. Covers at least the infrastructure supporting the ongoing collection and access to the data. The Steward role could also include responsibilities for managing contributions, e.g. to ensure data quality.
    • Examples: ORCID, Bath: Hacked, “Data Controller” role as defined by ICO
  • Registrar – a specific type of Steward who is responsible for assigning and managing key identifiers and reference data used in other datasets
    • Examples: CrossRef
  • Contributor – responsible for adding, updating or curating data in a dataset, using the tools and infrastructure provided by the Steward.
    • Examples: OSM editors, MusicBrainz contributors, Waze contributors, scholarly publishers adding to CrossRef
  • Reuser – makes use of one or more open datasets to create applications, analysis, etc.
    • Examples: data journalists, City Mapper, “Data Processor” role as defined by ICO
  • Intermediary – provides value added services that wrap, host or enrich a dataset. E.g. visualisation tools, APIs, etc.
    • Examples: Socrata, Data Press, Transport API
  • Aggregator – a specific form of Intermediary that packages together datasets from other sources
  • Beneficiary – benefits from the activity of reusers, e.g. by consuming packaged analyses or other applications
    • Examples: City Mapper users, UK citizen
  • Subject – a person or organisation who is the subject of a dataset or data item. E.g. the person contributing to a health dataset
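For anyone who wants to start mapping actors against these roles, here is a minimal sketch of the list as a data structure. It assumes nothing beyond what’s in the list above: the names `Role`, `SPECIALISES`, `expand`, `actors` and `actors_with` are made up for illustration, and “example council” is a hypothetical actor added to show one organisation fulfilling two roles at once.

```python
from enum import Enum


class Role(Enum):
    STEWARD = "Steward"
    REGISTRAR = "Registrar"
    CONTRIBUTOR = "Contributor"
    REUSER = "Reuser"
    INTERMEDIARY = "Intermediary"
    AGGREGATOR = "Aggregator"
    BENEFICIARY = "Beneficiary"
    SUBJECT = "Subject"


# Specialisations named in the list: a Registrar is a type of Steward,
# an Aggregator is a form of Intermediary.
SPECIALISES = {
    Role.REGISTRAR: Role.STEWARD,
    Role.AGGREGATOR: Role.INTERMEDIARY,
}


def expand(roles):
    """Return the given roles plus any broader role each one specialises."""
    expanded = set(roles)
    for role in roles:
        parent = SPECIALISES.get(role)
        if parent is not None:
            expanded.add(parent)
    return expanded


# Roles per actor, taken from the examples in the list.
# "example council" is hypothetical: a body that looks after a dataset
# and also reuses it.
actors = {
    "CrossRef": {Role.REGISTRAR},
    "scholarly publisher": {Role.CONTRIBUTOR},
    "City Mapper": {Role.REUSER},
    "City Mapper user": {Role.BENEFICIARY},
    "example council": {Role.STEWARD, Role.REUSER},
}


def actors_with(role):
    """List, in alphabetical order, the actors fulfilling a role."""
    return sorted(name for name, roles in actors.items() if role in expand(roles))
```

Calling `actors_with(Role.STEWARD)` picks up CrossRef as well as the hypothetical council, because a Registrar is treated as a kind of Steward. Using a set of roles per actor, rather than a single role, is the point: it keeps the “same organisation, multiple roles” case in view.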

While the Reuser role is actually the same as the “Consumer” role I referenced earlier, this framing breaks down Publisher into a number of smaller roles which hopefully better highlight some of the interactions that we tend to overlook. I’ve also tried to tease out some of the responsibilities of tool and platform vendors that help support the ecosystem.

If current data publishers began to think of themselves as Stewards of data, then would this let us have a better discussion about ways to enable more Contributors?

Can we make a case that open data publishing should be as lightweight as possible, to simplify the Steward’s role, whilst enabling a marketplace of Intermediaries?

Can we better recognise the role of Registrars in creating the web of data?

Would separating out the needs of the Beneficiaries of data from those of Reusers help distinguish between technology user needs and broader needs around data literacy?

Let me know what you think. What else should go on this list?



7 thoughts on “Beyond publishers and consumers”

  1. You suggest registrar as a kind of steward. Could the roles be separate? E.g. IANA manages ID assignments that are used in Internet protocols. (I know that’s not a data scenario, but I couldn’t think of another.)

  2. Nice list. Two other roles come to my mind:

    1. Regulator: responsible for ensuring that everyone in the community is sticking to regulation. e.g. the ICO play an important role in the UK’s data infrastructure. They set some of the rules of the road. There are obviously regulators other than those for data protection, for example an energy regulator might ask for open data on price plans.

    2. Policymaker: responsible for creating a set of principles/measures to generate desired outcomes. e.g. a local council official or a central government chief data officer. Tend to have access to funding and can help align incentives across multiple stakeholders but are not independent. They might recommend legislation and regulation with legislators making final decisions.

    You could even go as far as legislator: they have a different role and motives to the above two and set the general direction of travel/desired outcomes subject to… etc.

  3. Hi Leigh, at our ‘Data-driven decisions’ conference on Thursday there was some good discussion about this ‘data supply chain’ idea and the different roles required to get the most value out of data. Jamie Whyte and Steve Peters both presented some examples on selecting, organising and visualising open data to present it to policy makers or senior management who would use it to support their decision making.

    In both of those examples (and others like it) there was an important role combining elements of domain expert, analyst, visualisation creator and story teller – to find and select data relevant to the question at hand and present it in a way that was suitable for the end user. In Jamie’s case, his Trafford Innovation Lab team took this role. For the DCLG example of a housing data dashboard, it was a combination of the DCLG ‘Implementation Unit’ as domain experts, DCLG’s ‘Open Data Communities’ team in preparing and organising relevant input data, and Swirrl coding up a nice presentation of it.

    Having the data in a well-organised form, presented through a suitable software platform, preferably with APIs and various download options certainly helps reduce friction in this process, but it still needs someone to ‘tell the story’ with the data. That probably most closely fits the ‘Reuser’ role from your list above, but just to comment that this appears to be a very important role, and I think some of the obstacles around exploitation of open data are related to the lack of people in the right place with the right skills to do this job. The public sector could certainly do with training or recruiting more people like that.

    I think at Swirrl we would see ourselves fitting your ‘Intermediary’ role most closely – an important task for organisations like us is to ensure our software platforms make life easy for the story tellers/re-users, as described above.



    1. Thanks for the feedback. Absolutely agree about the importance of the Reuser role, especially to help contextualise data to make it useful for others. Also useful to see that you feel there’s a clear fit for what Swirrl does within the taxonomy. Suggests I’m on the right track 🙂

  4. I wonder if there is another role of “Rights Holder”?

    The Rights Holder is the person or organization that owns the legal rights to the data.

    We use the term “Rights Holder” to clarify terms such as “Data Author” and “Data Owner”, which are sometimes used to imply the person or organisation that holds the rights for the data. E.g.

    – Sometimes data authors may have given the copyright for their work to their employers automatically.

    – Sometimes the data owner may own something but may have exclusively licensed it to the rights holder.

    Perhaps you are implying that the Data Steward is the Rights Holder?

    1. There’s certainly a rights holder, but I’m not sure yet whether it’s a defined role here. If it is, then we should be able to define some tangible/intangible value that is being exchanged between the rights holder and some of the other roles.

      However, I tend to think that the rights will end up being with either the Steward or the Contributor, depending on the type of system/project/platform we’re looking at. In my view it’s the Steward who is empowered to make the decision to publish the data, which means they are likely to be the rights holder.
