Impressions from pidapalooza 19

This week I was at the third pidapalooza conference in Dublin. It’s a conference that is dedicated to open identifiers: how to create them, steward them, drive adoption and promote their benefits.

Anyone who has spent any time reading this blog or following me on twitter will know that this is a topic close to my heart. Open identifiers are infrastructure.

I’ve separately written up the talk I gave on documenting identifiers to help drive adoption and spur the creation of additional services. I had lots of great nerdy discussions around URIs, identifier schemes, compact URIs, standards development and open data. But I wanted to briefly capture and share a few general impressions.

Firstly, while the conference topic is very much my thing, and the attendees were very much my people (including a number of ex-colleagues and collaborators), I was approaching the event from a very different perspective to the majority of other attendees.

Pidapalooza as a conference has been created by organisations from the scholarly publishing, research and archiving communities. Identifiers are a key part of how the integrity of the scholarly record is maintained over the long term. They’re essential to support archiving and access to a variety of research outputs, with data being a key growth area. Open access and open data were very much in evidence.

But I think I was one of only a few attendees (perhaps the only one?) from what I’ll call the “broader” open data community. That wasn’t a complete surprise, but I think the conference as a whole could benefit from a wider audience and set of participants.

If you’re working in and around open data, I’d encourage you to go to pidapalooza, submit some talk ideas and consider sponsoring. I think that would be beneficial for several reasons.

Firstly, in the pidapalooza community, the idea of data infrastructure is just a given. It was refreshing to be around a group of people who had moved past debating whether data is infrastructure and were instead focusing on how to build, govern and drive adoption of that infrastructure. There are a lot of lessons there that are more generally applicable.

For example, I went to a fascinating talk about how EIDR, an identifier for movie and television assets, had helped to drive digital transformation in that sector. Persistent identifiers are critical to digital supply chains (Netflix, streaming services, etc). There are lessons here for other sectors around the benefits of wider sharing of data.

I also attended a great talk by the Australian Research Data Commons that reviewed the ways in which they were engaging with their community to drive adoption and best practices for their data infrastructure. They have a programme of policy change, awareness raising, skills development, community building and culture change which could easily be replicated in other areas. It paralleled some of the activities that the Open Data Institute has carried out around its sector programmes like OpenActive.

The need for transparent governance and long-term sustainability were also frequent topics, as was the recognition that data infrastructure takes time to build. Technology is easy; it’s growing a community and building consensus around an approach that takes time.

(btw, I’d love to spend some time capturing some of the lessons learned by the research and publishing community, perhaps as a new entry to the series of data infrastructure papers that the ODI has previously written. If you’d like to collaborate with or sponsor the ODI to explore that, then drop me a line?)

Secondly, the pidapalooza community seems to have generally accepted (with a few exceptions) the importance of web identifiers and open licensing of reference data, but that practice is still not widely adopted in other domains. Few of the identifiers I encounter in open government data, for example, are well documented, openly licensed or supported by a range of APIs and services.

Finally, much of the focus of pidapalooza was on identifying research outputs and related objects: papers, conferences, organisations, datasets, researchers, etc. I didn’t see many discussions around the potential benefits and consequences of use of identifiers in research datasets. Again, this focus follows from the community around the conference.

But as the research, data science and machine-learning communities begin exploring new approaches to increase access to data, it will be increasingly important to explore the use of standard identifiers in that context. Identifiers have a clear role in helping to integrate data from different sources, but there are wider risks around data privacy, and ethical considerations around the identification of individuals, for example, that will need to be explored.

I think we should be building a wider community of practice around use of identifiers in different contexts, and I think pidapalooza could become a great venue to do that.