I realised recently that, while a lot of work has been done on creating and exploring interesting extensions to the SPARQL query language, there has yet to be a systematic survey of the range of different extensions that are currently implemented in various RDF triplestores. Or if there has been a survey, then I’ve clearly missed it.
In order to get a better idea of what kinds of extensions are available I’ve set myself the task of surveying those currently implemented. I intend to write-up and share the results of that work through this blog.
I think that pulling together a list of extensions is a useful activity which should:
- Help researchers and implementors to have a clearer view of existing work, thereby encouraging further experimentation
- Promote convergence on a core set of useful extensions that could be implemented across a number of triplestores.
- Help users to have a clearer understanding of what SPARQL extensions are currently supported in particular triplestores, letting them make informed decisions about which extensions to use when writing and sharing queries
It looks like the SPARQL Working Group may well be adding a standard library of extension functions into the next revision of the query language so the timing of this work should help contribute to that effort. However I’m looking beyond their immediate goals and hope to encourage the implementor community to explore models simple to the EXSLT effort which has been successful in creating a set of community-designed extensions for XSLT transformations. I see no reason why the same process can’t be applied to SPARQL extensions.
Clarity of which extensions are portable across triplestores is important to allow users to experiment with various triplestore implementations and services. If data is going to be truly portable, then this will be an important consideration.
With that in mind I’ve begun digging into the available documentation for a number of different triplestores. I’ve decided to organize my work by surveying each of the three different types of SPARQL extension.
Types of SPARQL Extension Function
Its possible to extend the SPARQL query language in any of the following three ways:
- Extension Functions
- Property Functions (aka “Magic Predicates”)
- Language Extensions
Lets look at each of these in turn.
Extension Functions are explicitly described by the current SPARQL specification under the banner of “extensible value testing“. The standard library of extensions that may be added to SPARQL 1.1 will fall into this category. Extension Functions are simple function calls that can be used within a FILTER in a SPARQL query to carry out some specific extra logic that cannot be handled by matching triple patterns. Examples of extension functions include substring testing, string concatenation, date tests, etc.
The specification indicates that these extension functions should have a unique URI, allowing them to be globally identified. Few engines are publishing useful information at these URIs, but this seems like it would be a useful thing to do. These URIs should be grounded in the web too.
Property Functions (aka “Magic Predicates”, or “Magic Properties”) are extensions to the triple matching process that is carried out when a SPARQL query is executed. This means that property functions don’t appear in a FILTER expression like an extension function. They instead appear within the graph pattern of the query. Unlike extension functions which have a syntax like a conventional functional call, property functions use turtle syntax and appear, to the untrained eye, as standard triple patterns.
For example, as property function that could split a resource URI into a namespace and a localname might look like this in a SPARQL query:
?uri a rdfs:Class.
?uri ex:splitURI (?namespace ?localname).
In that example the the property function
ex:splitURI has as its input each of the URIs that are bound to the
?uri variable, and as its output binds the namespace URI and localname of those URIs to two new variables.
There are other ways to structure the inputs and outputs of a property function, depending on its purpose, but the important things to recognise are that:
- the property function is written as a conventional triple pattern
- parameters can be passed from either the subject or object portions of the triple (or potentially both)
- similarly, output can be bound to variables that appear in either the subject or object portions of the triple
- one technique for passing multiple parameters or generating multiple output values is to allow specification of an RDF list in the object portion of the triple
Property functions are very powerful as they can allow arbitrary complex logic to be used to extend the triple matching process. One common use is to extend the matching process by calling out to specialised indices or logic, e.g. for full-text indexing or geospatial functions and reasoning.
It is worth noting that Property Functions are not explicitly licensed by the current SPARQL specification. The specification does not describe them at all: they are simply allowed by the fact that they conform to the overall SPARQL grammar.
Testing whether a query uses Property Functions would therefore require a validator (such as the one that Dan Brickley describes here) to either have explicit knowledge of the function, e.g. based on its URI, or for implementors to publish some useful information at those locations so that a validator might determine whether a specific predicate is actually a “real” predicate or an extension through dereferencing the URI. I’m not aware of any implementation that currently does this.
The final category of SPARQL extensions are extensions to the language itself. This type of extension involves amending the grammar of the language to include new operators, keywords, and types of expression. Examples of this type of extensions include sub-queries and aggregates (e.g. min and max). The forthcoming SPARQL 1.1 specification will standardise these and a few other language extensions that have been commonly implemented.
Arguably, if one changes the grammar of a language then you’re creating a new language: “SPARQL plus some extensions”. So some care needs to be taken with respect to this type of extension if one wants queries to be portable.
In my view while there is plenty of scope for the community to collaborate and converge on common extension of all of the types I’ve described here, the best place for language extensions to be formally ratified and agreed on is through the SPARQL Working Group. I personally don’t expect the Working Group to have to, or want to sign-off on every extension function or property function, but interoperability is ultimately best served by co-ordinating language extensions through the Working Group. Naturally this should happen after the implementor community have had a period of experimentation and research. This is obviously the process that has happened to date, and hopefully this will continue as the language continues to evolve. A bit of collective action ought to help ensure interoperability in other areas.
For my survey of SPARQL extensions I’ve decided to tackle things in the order in which I have presented them here: I will first look at Extension Functions, then Property Functions, and then Language Extensions. For the rationale and reasons I’ve already outlined, I think the community is best served by organizing itself around standardising two of those types of extensions. And Extension Functions seem like the lowest hanging fruit.
I’m intending to do the survey in as open a way as possible, and want to ensure that I include as many different implementations as possible. Having said that initially I’m going to impose some editorial control simply to ensure consistency and quality. Implementors feel free to drop me a line providing me with information on your extensions or preferably pointers to the relevant documentation. I’ll also stress that while this survey has obvious relevance for my day job, that this is a personal project so things will progress as quickly as I’m able to find some time to push things forward.
I’m going to send regular status updates to the public-sparql-dev mailing list as that is the correct place for further discussion. I’ll also summarize my findings in further blog posts here. I’ve already begun the process of cataloguing Extension Functions as you can see by my recent email to the mailing list. I still have to include some additional information helpfully provided by OpenLink and to also update the entries for Mulgara to list its support for some of the EXSLT functions.
One other task I have on my list is to help provide some guidance on how implementors should publish information about their SPARQL extensions. It would be useful to have some descriptive metadata for these available from the relevant URIs. I’m intending to spend some time at Vocamp DC pulling together a vocabulary for that purpose. Let me know if you’re attending and want to collaborate.