Assessing data infrastructure: the Principles of Open Scholarly Infrastructure

How do we create well-designed, trustworthy, sustainable data infrastructure and institutions?

This is a question that I remain deeply interested in. Much of the freelance work I’ve been doing since leaving the ODI has been in that area. For example, I’m currently helping with a multi-year evaluation of a grant-funded data institution.

I’m particularly interested in frameworks and methodologies for assessing infrastructure and institutions, with a view towards helping them become more open, more trustworthy and more sustainable.

This is the first in a series of blog posts looking at some of this existing work.

What are the Principles of Open Scholarly Infrastructure?

The Principles of Open Scholarly Infrastructure (POSI) consist of 16 principles grouped into three themes: governance, sustainability and insurance.

The seven Governance principles touch on how the infrastructure will be governed and managed. These highlight the need for, e.g., stakeholder-led governance and transparency, and the need to plan across the entire lifecycle of the infrastructure, including its wind-down.

The five Sustainability principles highlight the need for revenue generation to align with the mission of the organisation, and emphasise generating revenue from services rather than data. They also highlight the need to generate a surplus and find long-term sources of revenue, rather than relying on grant funding.

The four Insurance principles centre on openness: open source and open data, as well as IP issues. In short, they ensure that if the infrastructure fails (or loses the trust of its community) its core assets can be reused.

How was it developed?

The principles were first presented in a 2015 blog post. They attempted to codify a set of rules and norms that were already informing the operations of CrossRef. This was prompted by a growing distrust of infrastructure services within the scholarly community.

Reliance on time-limited grant funding was affecting the sustainability and reliability of services, and there were growing concerns over commercial ownership of key services.

Since then the principles have been discussed and adopted by a number of others.

What type of infrastructure and institutions does it focus on?

POSI is intended to help guide the development and operations of infrastructure that supports any kind of scholarly activities (e.g. both research and teaching) across a range of domains (e.g. both the sciences and humanities).

This includes infrastructure services that support the management and publication of research data and metadata, scholarly research archives, identifier schemes, etc.

The FAQ highlights that the principles were also intended to help support procurement and comparison of different services.

Could the principles be adopted in other contexts?

Any set of principles will reflect the priorities of the community that produced them. Care should be taken before blindly applying principles from one context to another. Some issues that are less important might be foregrounded, while other important concerns might not be appropriately centred.

See my post on the FAIR data principles for more on that.

However, I think much of POSI is applicable to all types of open infrastructure. Good governance and sustainability are important in any context. Open data and open source also play a fundamental role.

However, there are some elements that might not apply in all contexts, or might be presented differently, and others that might be missing. For example:

  • The governance principle “coverage across the research enterprise” is clearly domain-specific, although there may be a broader formulation that focuses on governance that respects the entire ecosystem in which the infrastructure exists
  • The principles state that the infrastructure “cannot lobby”. I think in some contexts organisations operating infrastructure services might want to drive regulatory change, or already see themselves as involved in doing so, e.g. to help encourage strong data protection laws, or to create greater transparency within a sector
  • The foregrounding of limited use of time-limited funds clearly reflects the origins of the principles in a community that often relies on grant funding. This may be less of an issue in other contexts
  • As an insurance policy, open source and open data allow systems to be forked or cloned. But open source and open data might underpin other areas of how an infrastructure operates, e.g. security or collaboration. So they might fulfil a different role in other communities
  • There are no principles that directly foreground issues of privacy or ethics — infrastructure services in other sectors might reasonably want clearer statements on privacy, inclusion and responsible use of data

The POSI FAQ is also worth reading as it includes a number of clarifications about the scope and intent behind some of the principles.

In short, I think it would be useful to compare POSI with approaches originating in other sectors, in order to identify common themes. Within the research space, IOI is also exploring ways to assess and compare infrastructure.

How have organisations adopted the principles?

A range of organisations have adopted the principles, most recently Europe PMC.

No organisation meets all of the principles.

But the intention isn’t that an organisation should comply with all of them before doing an assessment. An assessment is intended to prompt reflection and development of a plan for improvement.

There is an assumption that organisations will regularly reassess themselves against the principles.

All of the existing assessments have taken the same broad approach, replicating that used by CrossRef:

  • a public statement that the organisation has decided to adopt the principles
  • a high-level self-assessment against the principles, using a Red/Amber/Green (RAG) rating for each item on the list
  • for each principle, a more detailed discussion of how the organisation has implemented that principle or has plans to do so
  • links to relevant evidence, e.g. public policies or governance documents that back up the assessment

I’ve produced a public spreadsheet listing the current RAG ratings for each organisation.

To help with future assessments, I think there’s scope to:

  • produce a common openly licensed template that can be used to publish the self-assessment ratings and capture relevant links
  • provide some additional guidance about how to assess and interpret each principle
  • suggest the types of evidence that could be referenced, or published, to help make the assessments verifiable
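
As a rough illustration of what a common template might capture, here is a minimal sketch in Python. It is not an official POSI format: the class names, field names, organisation, ratings and URL are all hypothetical, chosen only to mirror the elements of the existing assessments (a RAG rating per principle, a discussion, and links to evidence).

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Rating(Enum):
    """Red/Amber/Green rating, as used in the published self-assessments."""
    RED = "red"
    AMBER = "amber"
    GREEN = "green"


@dataclass
class PrincipleAssessment:
    """One row of a self-assessment (hypothetical structure)."""
    theme: str                      # "governance", "sustainability" or "insurance"
    principle: str                  # name of the principle being assessed
    rating: Rating
    discussion: str = ""            # how the principle is, or will be, implemented
    evidence_links: list[str] = field(default_factory=list)


@dataclass
class SelfAssessment:
    """A full self-assessment for one organisation (hypothetical structure)."""
    organisation: str
    assessed_on: str                # ISO 8601 date, to support regular reassessment
    items: list[PrincipleAssessment] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        """Count how many principles received each rating."""
        counts = Counter(item.rating.value for item in self.items)
        return {rating.value: counts.get(rating.value, 0) for rating in Rating}

    def missing_evidence(self) -> list[str]:
        """List principles with no supporting links, i.e. not yet verifiable."""
        return [item.principle for item in self.items if not item.evidence_links]


# Illustrative data only: the organisation, ratings and URL are made up.
example = SelfAssessment(
    organisation="Example Infrastructure Org",
    assessed_on="2021-01-01",
    items=[
        PrincipleAssessment(
            theme="governance",
            principle="Cannot lobby",
            rating=Rating.GREEN,
            discussion="Stated in our public governance policy.",
            evidence_links=["https://example.org/governance-policy"],
        ),
        PrincipleAssessment(
            theme="governance",
            principle="Coverage across the research enterprise",
            rating=Rating.AMBER,
            discussion="Plan in progress; no public document yet.",
        ),
    ],
)

print(example.summary())
print(example.missing_evidence())
```

Publishing assessments in a shared structure like this would make it easier to compare ratings across organisations, and to spot principles where an assessment does not yet point to any public evidence.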
