Publishing SPARQL queries and documentation using github

Yesterday I released an early version of sparql-doc, a SPARQL documentation generator. I’ve got plans for improving the functionality of the tool, but I wanted to briefly document how to use github and sparql-doc to publish SPARQL queries and their documentation.

Create a github project for the queries

First, create a new github project to hold your queries. If you’re new to github then you can follow their guide to get a basic repository set up with a README file.

The simplest way to manage queries in github is to publish each query as a separate SPARQL query file (with the extension “.rq”). You should also add a note somewhere specifying the license associated with your work.

As an example, here is how I’ve published a collection of SPARQL queries for the British National Bibliography.

When you create a new query be sure to add it to your repository and regularly commit and push the changes:

git add my-new-query.rq
git commit my-new-query.rq -m "Added another example"
git push origin master
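
If you want to check that you haven't forgotten to add a query, git can list the files it isn't tracking yet. Here is a small sketch of that check. It runs in a throwaway repository, and the file names (tracked.rq, untracked.rq) are made up purely for illustration:

```shell
# Work in a temporary repository so this is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Two hypothetical query files; only one of them is added to git.
touch tracked.rq untracked.rq
git add tracked.rq

# List the .rq files that git does not know about yet.
untracked=$(git ls-files --others --exclude-standard -- '*.rq')
echo "$untracked"
```

In your own project you would only need the git ls-files line, run from the project directory.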

The benefit of using github is that users can report bugs or submit pull requests to contribute improvements, optimisations, or new queries.

Use sparql-doc to document your queries

When you’re writing your queries, follow the commenting conventions encouraged by sparql-doc. You should also install sparql-doc so you can use it from the command-line. (Note: the installation guide and notes still need some work!)
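
As a rough illustration, this is the kind of commented query file I have in mind. Treat the details as assumptions: I'm assuming that the leading # comments become the description and that @author and @tag are among the supported annotations, so check the sparql-doc README for the actual conventions. The file name and the query itself are just placeholders:

```shell
# Write a placeholder query file with a documentation comment at the top.
# The @author and @tag annotation names are assumptions, not confirmed here.
cat > my-new-query.rq <<'EOF'
# List ten resources together with their titles.
#
# @author Your Name
# @tag example
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?item ?title
WHERE {
  ?item dct:title ?title .
}
LIMIT 10
EOF
```

Since lines starting with # are ordinary SPARQL comments, the file remains a valid query whatever the documentation tool makes of them.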

As a test you can try generating the documentation from your queries as follows. Make a new local directory for the output, e.g. ~/projects/docs. Then run sparql-doc, giving it the directory containing your queries (here ~/projects/examples) followed by the output directory:

sparql-doc ~/projects/examples ~/projects/docs

You should then be able to open up the index.html document in the output directory, ~/projects/docs, to see how the documentation looks.
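
If you find yourself doing this a lot, the steps can be wrapped in a small shell function. This is only a sketch: build_docs is a name I've made up, and it assumes sparql-doc takes the query directory and then the output directory, and writes an index.html into the latter:

```shell
# build_docs QUERIES_DIR OUTPUT_DIR
# Hypothetical helper: runs sparql-doc and checks that an index page appeared.
build_docs() {
  local queries_dir=$1 output_dir=$2

  if [ ! -d "$queries_dir" ]; then
    echo "No such query directory: $queries_dir" >&2
    return 1
  fi
  if ! command -v sparql-doc >/dev/null 2>&1; then
    echo "sparql-doc does not appear to be installed" >&2
    return 1
  fi

  # Create the output directory if needed, then generate the docs.
  mkdir -p "$output_dir"
  sparql-doc "$queries_dir" "$output_dir" && [ -f "$output_dir/index.html" ]
}
```

You would call it as build_docs ~/projects/examples ~/projects/docs, and it returns a non-zero status if anything goes wrong.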

If you have an existing web server somewhere then you can just zip up those docs and put them somewhere public to share them with others. However, you can also publish them via Github pages. This means you don’t have to set up any web hosting at all.

Use github pages to publish your docs

Github Pages allows github users to host public, static websites directly from github projects. It can be used to publish blogs or other project documentation. But using github pages can seem a little odd if you’re not familiar with git.

Effectively what we’re going to do is create a separate collection of files — the documentation — that sits in parallel to the actual queries. In git terms this is done by creating a separate “orphan” branch. The documentation lives in the branch, which must be called gh-pages, while the queries will remain in the master branch.

Again, github have a guide for manually adding and pushing files for hosting as pages. The steps below follow the same process, but using sparql-doc to create the files.

Github recommend starting with a fresh separate checkout of your project. You then create a new branch in that checkout, remove the existing files and replace them with your documentation.

As a convention, I suggest that when you check out the project a second time you give the directory a separate name, e.g. by adding a “-pages” suffix.

So for my bnb-queries project, I will have two separate checkouts.

The original checkout, which I did when setting up the project for the BNB queries, was:

git clone git@github.com:ldodds/bnb-queries.git

This gave me a local ~/projects/bnb-queries directory containing the main project code. So to create the required github pages branch, I would do this in the ~/projects directory:

#specify different directory name on clone
git clone git@github.com:ldodds/bnb-queries.git bnb-queries-pages
#then follow the steps as per the github guide to create the pages branch
cd bnb-queries-pages
git checkout --orphan gh-pages
#then remove existing files
git rm -rf .

This gives me two parallel directories, one containing the master branch and the other the documentation branch. Make sure you really are working with a separate checkout before deleting and replacing all of the files!
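
If you're nervous about that delete step, you can ask git which branch you're on before removing anything. The sketch below shows the idea in a throwaway repository that stands in for the second checkout; the example.rq file and the commit identity are placeholders:

```shell
# Throwaway repository standing in for the "-pages" checkout.
checkout=$(mktemp -d)
cd "$checkout"
git init --quiet
echo "SELECT * WHERE { ?s ?p ?o } LIMIT 1" > example.rq
git add example.rq
git -c user.name="Example" -c user.email="example@example.org" \
  commit --quiet -m "Add example query"

# Create the orphan branch for the documentation.
git checkout --quiet --orphan gh-pages

# Only delete the files if HEAD really does point at gh-pages.
branch=$(git symbolic-ref --short HEAD)
if [ "$branch" = "gh-pages" ]; then
  git rm -rf --quiet .
else
  echo "Not on gh-pages (on $branch), leaving the files alone" >&2
fi
```

I've used git symbolic-ref here because the new orphan branch has no commits yet, and it still reports the branch name in that state.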

To generate the documentation I then run sparql-doc, telling it to read the queries from the project directory containing the master branch and to use the directory containing the gh-pages branch as the output, e.g.:

sparql-doc ~/projects/bnb-queries ~/projects/bnb-queries-pages

Once that is done, I then add, commit and push the documentation, as per the final step in the github guide:

cd ~/projects/bnb-queries-pages
git add *
git commit -am "Initial commit"
git push origin gh-pages

The first time you do this, it’ll take about 10 minutes for the pages to become active. They will then appear at the following URL:

http://USER.github.com/PROJECT

E.g. http://ldodds.github.com/sparql-doc

If you add new queries to the project, be sure to re-run sparql-doc and add/commit the updated files.
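
The regenerate-and-publish cycle could be scripted along the following lines. Again this is a sketch rather than anything sparql-doc provides: publish_docs is a hypothetical helper, the commit message is arbitrary, and it assumes the second checkout is on the gh-pages branch with a remote called origin:

```shell
# publish_docs QUERIES_DIR PAGES_DIR
# Hypothetical helper: regenerate the docs and push them if anything changed.
publish_docs() {
  local queries_dir=$1 pages_dir=$2 branch

  # Refuse to touch the output directory unless it is the gh-pages checkout.
  branch=$(git -C "$pages_dir" symbolic-ref --short HEAD 2>/dev/null) || branch=""
  if [ "$branch" != "gh-pages" ]; then
    echo "$pages_dir is not on the gh-pages branch" >&2
    return 1
  fi

  # Regenerate the documentation from the master checkout.
  sparql-doc "$queries_dir" "$pages_dir" || return 1

  # Stage everything, then commit and push only if something changed.
  git -C "$pages_dir" add --all
  if git -C "$pages_dir" diff --cached --quiet; then
    echo "Documentation unchanged, nothing to publish"
    return 0
  fi
  git -C "$pages_dir" commit -m "Regenerate documentation" &&
    git -C "$pages_dir" push origin gh-pages
}
```

For my project that would be publish_docs ~/projects/bnb-queries ~/projects/bnb-queries-pages. The queries themselves still need to be committed and pushed separately from the master checkout.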

Hopefully that’s relatively clear.

The main thing to understand is that locally you’ll have your github project checked out twice: once for the main line of code changes, and once for the output of sparql-doc.

Each checkout needs to be updated separately, with its own add, commit and push. In practice this is very straightforward and means that you can publicly share queries and their documentation without the need for web hosting.