Discussion document: archiving open data

This is a brief post to highlight a short discussion document that I recently published about archiving open data. The document is intended to help gather ideas, suggestions and best practices around archiving open data to the Internet Archive. The goal is to pull together useful guidance that can encourage the archiving and distribution of open data from existing portals, frameworks, etc.

This isn’t an attempt to build a new standard, just to encourage some convergence and activity. At present the guidance recommends building around the Data Package specification, as it is simple and provides a well-defined unit (a zip file) for archiving purposes.
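To make the idea of a "well-defined unit" concrete: a Data Package is a set of data files plus a datapackage.json descriptor that lists them as resources, and zipping the lot produces a single file that can be handed to an archive. The following is a minimal sketch using only the Python standard library, assuming CSV resources. The function name build_data_package_zip, its arguments and the example data are hypothetical, and uploading the result to the Internet Archive is not shown.

```python
import csv
import io
import json
import os
import tempfile
import zipfile


def build_data_package_zip(zip_path, package_name, resources):
    """Write a zipped Data Package (hypothetical helper, a sketch only).

    resources maps a resource name to (filename, header, rows), where
    filename is the path of the CSV file inside the zip.
    """
    # The descriptor lists every resource; "resources" is the required key.
    descriptor = {"name": package_name, "resources": []}
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for res_name, (filename, header, rows) in resources.items():
            # Serialise each table to CSV in memory, then add it to the zip.
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(header)
            writer.writerows(rows)
            zf.writestr(filename, buf.getvalue())
            descriptor["resources"].append({
                "name": res_name,
                "path": filename,
                "format": "csv",
                "mediatype": "text/csv",
            })
        # The descriptor sits at the root of the package.
        zf.writestr("datapackage.json", json.dumps(descriptor, indent=2))
    return descriptor


if __name__ == "__main__":
    # Placeholder data purely for illustration.
    out_dir = tempfile.mkdtemp()
    out_path = os.path.join(out_dir, "example-dataset.zip")
    build_data_package_zip(
        out_path,
        "example-dataset",
        {"readings": ("data/readings.csv", ["id", "value"], [[1, 10], [2, 20]])},
    )
    with zipfile.ZipFile(out_path) as zf:
        print(zf.namelist())
```

Keeping the descriptor inside the zip means the archived file is self-describing: anyone who retrieves it later can read datapackage.json to find out what each file contains, without needing the original portal.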

Archiving data can help build resilience in the open data commons by providing backups of important data resources. This will help deal with:

  • Unexpected system outages that could take down data portals
  • Decisions by publishers to remove data previously published under an open licence (an archive ensures an original copy remains)
  • Decisions by publishers to take down data
  • Services and portals permanently going offline

If you have thoughts or suggestions, feel free to add them to the document. It would particularly benefit from input from the archival community, and especially from those already familiar with working with the Internet Archive.

I hope to build a small reference implementation to illustrate the idea and help to archive the data from Bath: Hacked.