The FAIR Principles continue to gain traction across a range of sectors as a means to encourage the publication of good, machine-readable data.
I’ve written before about the limits of the FAIR principles and the importance of understanding the context for which they were designed. I’m not going to rehash that discussion, but I do want to highlight the problems with defining these types of data publishing recommendations as “principles”.
Principles only ever provide general best practices: for example, to use persistent identifiers, or to publish metadata to accompany a dataset. But they typically don’t help with the concrete decision making required to actually collect, structure or share some data. Which identifiers should I use for which types of resource? What types of metadata, in what formats?
I’ve been involved with several projects over the last few years which have tried to provide advice and support to projects and organisations in implementing the FAIR principles. In every case, there has been a need to provide additional guidance: what does it mean to implement FAIR in the context of a specific data ecosystem?
The goal of FAIR is not just to create good machine-readable data. It’s also to encourage data to be published in consistent ways, creating convergence across communities of data publishers and users.
Without external guidance, or broader thinking about the ecosystem in which the data will be used, reasonable people can make different decisions about how to apply the principles, leading to data that is still not easy to reuse.
What I think has been lacking to date is broader coordination around documenting what FAIR looks like for specific types of datasets, within specific data ecosystems. Recently though, I’ve noticed a number of efforts to start to document more specific frameworks.
To pick a few at random:
- A framework for FAIR robotic datasets
- Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats
- A proposed FAIR approach for disseminating geospatial information system maps
Mapping the generic FAIR principles to concrete decisions around which identifier schemes, data formats, and other standards to use has been referred to as a “FAIR Implementation Profile”. Basically, a profile is a set of recommendations for how to combine different standards to publish good data.
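To make the idea a little more concrete, here is a minimal sketch of a profile as a data structure: a list of decisions, each tying a FAIR sub-principle to the standards a community has agreed on. This is not the GO FAIR format or anyone’s official schema. The `Choice` class, the `EXAMPLE_PROFILE` entries and the `choices_for` helper are all hypothetical names, and the standards listed are only illustrative picks for an imaginary ecosystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    """One decision in a profile: a FAIR sub-principle and the standards chosen for it."""
    principle: str              # FAIR sub-principle label, e.g. "F1"
    question: str               # the concrete decision being made
    standards: tuple[str, ...]  # standards the community has agreed to use
    guidance: str = ""          # free-text advice, ideally with examples


# Hypothetical profile for an imaginary data ecosystem; every entry is illustrative.
EXAMPLE_PROFILE = (
    Choice("F1", "Which identifier scheme do datasets use?", ("DOI",),
           "Mint one DOI per published dataset version."),
    Choice("F2", "Which metadata schema describes a dataset?", ("DCAT",),
           "Describe every dataset in the shared catalogue."),
    Choice("I1", "How are metadata records represented?", ("JSON-LD",),
           "Publish records alongside the dataset landing page."),
    Choice("R1.1", "Which licences may be applied?", ("CC-BY-4.0",),
           "State the licence in the metadata, not only on a web page."),
)


def choices_for(profile, principle):
    """Return the choices a profile records for one FAIR sub-principle."""
    return [c for c in profile if c.principle == principle]
```

Looking up `choices_for(EXAMPLE_PROFILE, "F1")` returns the single identifier decision. The `guidance` field is there because a bare list of standard names rarely tells a publisher how to actually apply them.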
Clear guidance that helps to reinforce ecosystem norms around data access will avoid the need for each project or organisation to learn the general FAIR principles and then figure out how to apply them.
Implementation Profiles can speed adoption of common standards, make it easier to check whether datasets actually conform to those profiles, and help with the creation of additional tooling to support the management and use of data.
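As a rough illustration of why a written-down profile makes checking easier, the sketch below tests a dataset’s metadata record against a handful of rules. Everything here is an assumption made for the example: the `RULES` table, the field names, the allowed values, and the `check_conformance` function are hypothetical, and the DOI pattern is deliberately simplified rather than a complete validator.

```python
import re

# Hypothetical rules a profile might imply. Each metadata field maps either to
# a regular expression the value must match or to a set of allowed values.
RULES = {
    "identifier": re.compile(r"^10\.\d{4,9}/\S+$"),  # simplified DOI shape
    "license": {"CC-BY-4.0", "CC0-1.0"},             # SPDX-style licence ids
    "format": {"text/csv", "application/json"},      # media types
}


def check_conformance(metadata, rules=RULES):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field_name, rule in rules.items():
        value = metadata.get(field_name)
        if value is None:
            problems.append(f"missing field: {field_name}")
        elif isinstance(rule, re.Pattern):
            if not rule.match(value):
                problems.append(f"bad value for {field_name}: {value!r}")
        elif value not in rule:
            problems.append(f"bad value for {field_name}: {value!r}")
    return problems


# A record that follows the rules, and one that does not.
good = {"identifier": "10.1234/example.5678",
        "license": "CC-BY-4.0",
        "format": "text/csv"}
bad = {"identifier": "example-5678", "license": "CC-BY-4.0"}

print(check_conformance(good))  # []
print(check_conformance(bad))
```

The second call reports a badly formed identifier and a missing `format` field. A check like this is only possible once the community has agreed on the specific choices; the general principles alone give it nothing to test against.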
They are a means of increasing consistency in how data is accessed, used and shared across a wide range of use cases.
However, I think there’s still work to be done to better define what a useful implementation profile looks like.
The GO FAIR project provides a simple spreadsheet template which I don’t think is fit for purpose. Any real-world answers to the questions it asks will need some guidance, discussion and examples. It’s not just a matter of filling in a box in a spreadsheet.
If you’re running a project that needs to produce FAIR data as an output, providing guidance to organisations around FAIR adoption, or trying to improve how data is published within a specific data ecosystem, I’d encourage you to reflect on how drafting a profile would help achieve those goals.
Then publish the profile openly so it can be iterated on and improved by others.