Set Algebra For Updating a Triple Store

Let's assume we have a stored graph, G_store, and that we have been given another graph of incoming data, G_in, that contains some modifications to a specific sub-graph.
Let's also assume that we have a function view() that can extract the “equivalent” sub-graph (i.e. the equivalent view) of the original stored data.
In pseudocode, to apply these updates we do the following:
    G_view = view(G_store)
    G_delete = G_view - G_in
    G_insert = G_in - G_view
    G_store' = G_store.remove(G_delete).add(G_insert)
Job done. The Jena API provides methods for handling the basic operations; see, for example, the difference method. You can also wrap the modifications to G_store in a transaction.
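To make the set algebra concrete, here is a minimal sketch using plain Python sets rather than Jena. It assumes a graph is just a set of (subject, predicate, object) tuples; apply_update, alice_view and the example data are all made up for illustration.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# All names and data here are illustrative, not part of any real API.

def apply_update(g_store, g_in, view):
    """Return (new_store, g_delete, g_insert) for the incoming graph g_in."""
    g_view = view(g_store)      # the sub-graph the incoming data is allowed to replace
    g_delete = g_view - g_in    # stored triples that are absent from the incoming data
    g_insert = g_in - g_view    # incoming triples that are not yet stored
    new_store = (g_store - g_delete) | g_insert
    return new_store, g_delete, g_insert


def alice_view(graph):
    # Assumed view: every triple whose subject is ex:alice.
    return {t for t in graph if t[0] == "ex:alice"}


# Hypothetical stored data and incoming document.
g_store = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
}
g_in = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.com"),
}

new_store, g_delete, g_insert = apply_update(g_store, g_in, alice_view)
```

Only the changed mbox triple ends up in g_delete and g_insert; the unchanged name triple and everything about ex:bob are left alone.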
The nice thing is that this is agnostic to the actual data being updated: we don’t care which triples are being added or removed. This differentiates it from the SPARQL Update Language, specifically the MODIFY operation, which requires the patterns being inserted or deleted to be written into the query. Changesets are much the same.
In the above approach the detail of what is being changed (or is being allowed to change) is shifted out of the triple store update code and into the view() function. The extent of the graph returned by this function must match the extent of the graph being passed as input. So we’ve defined a specific “document type”. As it turns out this is quite reasonable, as you can generally match, e.g. a RESTful service call, to a view based on the identifier of the item to which the content is being posted, its media-type, other service parameters, etc.
In terms of implementing the view() function, it turns out you can go a long way with a SPARQL CONSTRUCT operation. DESCRIBE isn’t suitable as you don’t have control over how the sub-graph is built.
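As a rough illustration of what such a view might look like, here is a sketch in plain Python over a set of tuples. It stands in for a CONSTRUCT query rather than running one; make_view, the subject identifier and the predicate list are all hypothetical.

```python
def make_view(subject, predicates):
    """Build a view() function for one 'document type' (illustrative only)."""
    allowed = frozenset(predicates)

    def view(graph):
        # Roughly what a CONSTRUCT over <subject> ?p ?o would return,
        # with ?p restricted to the allowed predicates.
        return {(s, p, o) for (s, p, o) in graph if s == subject and p in allowed}

    return view


# Hypothetical view for a "profile" document posted to ex:alice.
profile_view = make_view("ex:alice", ["foaf:name", "foaf:mbox"])

store = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:passwordHash", "not-exposed"),
    ("ex:bob", "foaf:name", "Bob"),
}
extracted = profile_view(store)
```

Because ex:passwordHash is outside the view, it never appears in G_view and so can never end up in G_delete, whatever the incoming data contains.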
I think there are strengths and weaknesses to all of the different approaches to updating RDF stores, and I suspect that there isn’t going to be a one-size-fits-all approach. For example, SPARQL Update looks like a handy syntax to use when the modifications all follow predictable patterns, e.g. I’m doing parameterized updates to some stored data, much like parameterized updates in a SQL database. Changesets offer some extra functionality around store versioning which doesn’t drop out of the set logic approach (although it could be added).
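Oh, and the keen-eyed amongst you will notice that this approach does involve some “thrashing” of updates for bnodes: because a bnode in the incoming graph never compares as equal to one in the store, those triples get deleted and re-inserted every time. But what ya gonna do?! 🙂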
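The bnode thrashing is easy to reproduce in a toy model. This sketch assumes a stand-in BNode class whose equality is plain object identity, which mimics (but is not) how an RDF library treats blank nodes from two separately parsed graphs; the data is invented.

```python
class BNode:
    """Stand-in for a blank node: no __eq__ defined, so equality is object identity."""

    def __init__(self, label):
        self.label = label  # a label is local to one document and carries no identity

    def __repr__(self):
        return f"_:{self.label}"


# The address as it currently sits in the stored view.
stored_addr = BNode("b0")
g_view = {
    ("ex:alice", "ex:address", stored_addr),
    (stored_addr, "ex:city", "Bath"),
}

# The same address arriving again in the incoming data, unchanged.
incoming_addr = BNode("b0")  # same label, but a different node
g_in = {
    ("ex:alice", "ex:address", incoming_addr),
    (incoming_addr, "ex:city", "Bath"),
}

g_delete = g_view - g_in
g_insert = g_in - g_view
```

Nothing has really changed, yet both triples land in g_delete and their copies in g_insert, so the store does the work of removing and re-adding them.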

Published by Leigh Dodds