Data published to the web should always be accompanied by machine-readable metadata describing all aspects of the dataset, including its content, origin, publication schedule and, importantly, its licensing. A clear statement of re-use rights ensures that consumers fully understand both how a dataset can be re-used and any obligations they may incur through that usage.
Through the Open Data Certificates, the ODI will be encouraging data publishers to publish machine-readable metadata. In fact, doing so is a key criterion for reaching each level of certification.
There are several existing efforts to help standardise dataset metadata, including the W3C Data Catalog Vocabulary (DCAT). Publishers should use these standards to help describe their data.
However, in a few cases there are gaps in existing standards that merit further work. This is particularly true of publishing machine-readable rights information.
At the ODI we are currently working on a new vocabulary to support the publication of “Open Data Rights Statements”. The vocabulary builds upon and extends the Dublin Core and Creative Commons vocabularies to support the description of Rights Statements that may include:
- A reference to a license for the dataset
- A reference to a content license that applies to copyrightable parts of a dataset, an important piece of metadata in jurisdictions that recognise database rights
- Copyright notices
- Attribution metadata to support re-users in acknowledging their sources
Alongside the vocabulary, we are preparing guides that cover:
- How to publish machine-readable rights statements as RDFa, Linked Data, or as additions to existing JSON or XML formats
- How to link to rights statements from both web pages and APIs
- Guidance for developers on how to apply this metadata to build attribution and citation links
Once complete, these guides will form part of a broader set of guidance from the ODI on how to publish standards-compliant dataset metadata.
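To make the idea more concrete, here is a minimal Python sketch of what a rights statement embedded in a JSON document might look like, and how a developer could turn it into an attribution link. The key names (`dataLicense`, `contentLicense`, `copyrightNotice`, `attributionText`, `attributionURL`), the `attribution_link` helper and the example.org URL are assumptions made for illustration. They are not the terms of the finished vocabulary.

```python
import html
import json

# Hypothetical rights statement. The key names are illustrative assumptions,
# not the published vocabulary.
rights_statement = {
    "dataLicense": "http://opendatacommons.org/licenses/by/1.0/",
    "contentLicense": "http://creativecommons.org/licenses/by/3.0/",
    "copyrightNotice": "Copyright Example Publisher",
    "attributionText": "Example Publisher",
    "attributionURL": "http://example.org/datasets/example",
}


def attribution_link(statement):
    """Build an HTML attribution link from a rights statement dict."""
    text = html.escape(statement["attributionText"])
    url = html.escape(statement["attributionURL"], quote=True)
    link = f'<a href="{url}">{text}</a>'

    # If a data license is given, append a rel="license" link to it.
    license_url = statement.get("dataLicense")
    if license_url:
        safe_license = html.escape(license_url, quote=True)
        link += f' (<a rel="license" href="{safe_license}">license</a>)'
    return link


if __name__ == "__main__":
    # The statement as it might appear inside an existing JSON format.
    print(json.dumps(rights_statement, indent=2))
    # The attribution markup a re-user could place on a web page.
    print(attribution_link(rights_statement))
```

Keeping the attribution text and URL as separate fields lets a re-user generate the acknowledgement automatically instead of parsing it out of a free-text copyright notice.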