The Data Package Manager is an Open Knowledge Foundation project to create a tool that supports the discovery and distribution of datasets. The tool uses the concept of a “data package” to describe the basic metadata for a dataset plus the supporting files. Packages are indexed in a registry to make them searchable and to support distribution. The dpm tool works with the CKAN data portal software, using its API to search for and download data packages.
The dpm documentation includes guidance on how to install and use the software. Once the basic software is installed, run:
dpm setup config
This will create a default configuration file called .dpmrc in your home directory. This configuration works with The Data Hub, giving you access to its registry of over 5000 datasets. For example, there’s a basic RDF/XML version of the British National Bibliography. If we wanted to automatically download the files associated with that package, we could run the following command:
dpm download ckan://bluk-bnb-basic bnb-basic
The ckan://bluk-bnb-basic argument identifies the dataset; note that bluk-bnb-basic is the same as the id used in the dataset’s URL on the Data Hub. The final argument, bnb-basic, is the local directory that the files are downloaded into. This makes it easy to script downloads of a dataset, provided the publisher has gone to the trouble of associating the files with their CKAN package.
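As a rough illustration of that kind of scripting, here is a minimal Python sketch that builds and runs one dpm download command per package id. It assumes dpm is installed and on your PATH, and it only uses the command forms shown in this post (the optional --config argument mirrors the data.gov.uk examples). The function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_download_command(package_id: str, dest: str,
                           config: Optional[str] = None) -> List[str]:
    """Build a dpm download command line for one CKAN package id."""
    cmd = ["dpm"]
    if config is not None:
        # Point dpm at an alternate config file, e.g. one for another registry.
        cmd += ["--config", config]
    # dest is the local directory the package files are downloaded into.
    cmd += ["download", f"ckan://{package_id}", dest]
    return cmd


def download_all(package_ids: List[str], dest: str,
                 config: Optional[str] = None) -> None:
    """Run dpm once per package id; check=True raises on the first failure."""
    for package_id in package_ids:
        subprocess.run(build_download_command(package_id, dest, config),
                       check=True)
```

For example, download_all(["bluk-bnb-basic"], "bnb-basic") runs the same command as above. Passing the arguments as a list avoids shell quoting issues, and check=True stops the loop at the first package that dpm fails to download.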
The data.gov.uk website has been built using CKAN, and its API endpoint is http://data.gov.uk/api/. This means we can use dpm to interact with data.gov.uk too; all we need to do is tell dpm to use a different registry.
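A CKAN registry is ultimately a set of HTTP endpoints under that API base. The sketch below builds URLs for two calls in CKAN’s action API, package_search and package_show. It assumes the instance exposes version 3 of the action API at <base>/3/action/. dpm itself may talk to an older API version, so treat this as an illustration of what the endpoint looks like rather than a description of the requests dpm sends. The helper name is hypothetical.

```python
from urllib.parse import urlencode, urljoin


def ckan_action_url(api_base: str, action: str, **params: str) -> str:
    """Build a CKAN action API URL such as .../api/3/action/package_search?q=spending."""
    # Assumes api_base is the instance's /api/ root and that API v3 is available.
    base = api_base if api_base.endswith("/") else api_base + "/"
    url = urljoin(base, f"3/action/{action}")
    return f"{url}?{urlencode(params)}" if params else url
```

For example, ckan_action_url("http://data.gov.uk/api/", "package_search", q="spending") gives a keyword search URL, and action="package_show" with id=... fetches a single package. Either URL could then be fetched with urllib.request.urlopen and the JSON response parsed.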
To get dpm to use a different CKAN instance, we need to modify its config:
- Take a copy of ~/.dpmrc and put it somewhere handy, e.g.
- Edit the ckan.url entry and change it to http://data.gov.uk/api/
- When you run dpm, use the -c parameter to specify that it should use the alternate config
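The copy-and-edit steps can also be scripted. This Python sketch copies a config file and rewrites only its ckan.url line, leaving every other line (including the other paths) untouched. It assumes entries are written as single-line key = value pairs, which we have not verified against the file dpm generates. The function names are hypothetical.

```python
from pathlib import Path


def retarget_config(text: str, new_url: str) -> str:
    """Return config text with its ckan.url entry pointed at new_url.

    Assumes single-line "key = value" entries. Every other line, including
    comments and the other paths, is copied through unchanged.
    """
    out = []
    for line in text.splitlines(keepends=True):
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == "ckan.url":
            # Preserve the original line ending (if any).
            ending = line[len(line.rstrip("\r\n")):]
            out.append(f"ckan.url = {new_url}{ending}")
        else:
            out.append(line)
    return "".join(out)


def make_alternate_config(src: Path, dest: Path, new_url: str) -> None:
    """Write a copy of src to dest with a different CKAN registry URL."""
    dest.write_text(retarget_config(src.read_text(), new_url))
```

For example, make_alternate_config(Path.home() / ".dpmrc", Path("datagovuk.ini"), "http://data.gov.uk/api/") produces a file that can be passed to dpm with --config datagovuk.ini. A line-based rewrite is used instead of configparser because configparser drops comments when it writes a file back out.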
Here’s a gist that shows an example of the edited config. It’s best to modify a copy of the default version, as there are other paths in there that should remain unchanged.
Here are some examples of using dpm with data.gov.uk. Make sure the config parameter points to the location of your revised configuration file:
Search data.gov.uk for packages with the keyword “spending”:
dpm --config datagovuk.ini search ckan:// spending
Get a summary of a package:
dpm --config datagovuk.ini info ckan://warwickshire-spending-allocation
Download the files associated with a package to a local data directory. The tool will automatically create sub-directories for the package:
dpm --config datagovuk.ini download ckan://warwickshire-spending-allocation data
The last command would be much more useful if the data.gov.uk datasets consistently had their data files associated with them. Unfortunately, in many cases there is still just a reference to another website.
Hopefully this will improve over time. While it’s important for data to be properly documented and contextualised, easy re-use also requires that the retrieval and processing of that data be easy to automate. These are two separate but important use cases.
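One way to gauge whether a package has usable files attached, rather than just a link to another website, is to inspect its resource list. This sketch parses a CKAN package_show response body (assuming the action API’s {"success": ..., "result": {"resources": [...]}} shape) and splits resource URLs using a crude heuristic: a blank or HTML format is treated as a link to a web page. That heuristic is our own assumption, not something dpm or CKAN does, and the function name is hypothetical.

```python
import json
from typing import Dict, List


def split_resources(package_show_body: str) -> Dict[str, List[str]]:
    """Split a package's resource URLs into likely data files and likely web pages."""
    body = json.loads(package_show_body)
    if not body.get("success"):
        raise ValueError("CKAN request was not successful")
    groups: Dict[str, List[str]] = {"files": [], "pages": []}
    for res in body["result"].get("resources", []):
        url = res.get("url", "")
        if not url:
            continue  # nothing to download for this resource
        fmt = (res.get("format") or "").strip().lower()
        # Heuristic: blank or HTML formats usually mean a link to another site.
        key = "pages" if fmt in ("", "html", "htm") else "files"
        groups[key].append(url)
    return groups
```

A package whose "files" list comes back empty is one where dpm download is unlikely to fetch anything useful.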