Writing an ARQ Extension Function

The core SPARQL specification provides some hooks for extension in the
form of Extensible Value Testing. This allows an application to provide custom functions for
testing variables in a SPARQL query where the built-in tests don’t cover a particular need.

The specification notes that SPARQL queries using extension functions are likely to have
limited interoperability, so you should use them with some care. The process
of implementing a custom function and registering it with your query engine will be specific to that engine’s API.

Caveats aside, I think it’s likely that once SPARQL sees further adoption the community will
identify a number of useful functions. Ensuring that such functions are ported across
different SPARQL implementations could be the job of a community initiative, just as the
EXSLT initiative has helped standardise XSLT extension functions, many of which were
incorporated into the more recent XSLT specifications.

So let’s take a quick look at how to implement a custom extension function using
ARQ.

Tip: we’re going to be working with some of the ARQ innards here, and it’ll
be useful to be able to reference the full Javadoc for the API. To generate this,
grab ARQ from CVS and execute: ant javadoc-all. You’ll need
Ant installed, obviously.

All functions in ARQ, whether they are built-in SPARQL functions, part of
the ARQ function library, or custom functions, implement the com.hp.hpl.jena.query.function.Function
interface. The package containing this interface also provides some convenience base classes
for functions that take a fixed number of parameters, from zero up to four: FunctionBase0
through to FunctionBase4. It also provides a simple registry for
dynamically adding functions at runtime.
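To make the shape of that design concrete, here is a deliberately simplified, stdlib-only analogue of the "fixed-arity base class plus registry" pattern. The names MiniFunctions, Function, FunctionBase1, Registry and UpperCase, and the example URI, are hypothetical stand-ins, with plain Strings in place of ARQ's NodeValue; this is a sketch of the idea, not ARQ's actual classes or signatures.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// A simplified, hypothetical analogue of ARQ's Function / FunctionBase1 / FunctionRegistry.
// Plain Strings stand in for ARQ's NodeValue; none of these names are ARQ's real API.
public final class MiniFunctions {

    private MiniFunctions() {}

    /** The single entry point every extension function implements. */
    public interface Function {
        String exec(List<String> args);
    }

    /** Fixed-arity helper: validates the argument count, then delegates to exec(String). */
    public abstract static class FunctionBase1 implements Function {
        @Override
        public final String exec(List<String> args) {
            if (args.size() != 1) {
                throw new IllegalArgumentException(
                        getClass().getSimpleName() + " expects 1 argument, got " + args.size());
            }
            return exec(args.get(0));
        }

        /** Subclasses only ever see the single, already-unpacked argument. */
        protected abstract String exec(String arg);
    }

    /** Maps a function URI to a factory, so new functions can be added at runtime. */
    public static final class Registry {
        private final Map<String, Supplier<? extends Function>> factories = new ConcurrentHashMap<>();

        public void put(String uri, Supplier<? extends Function> factory) {
            factories.put(uri, factory);
        }

        public Function create(String uri) {
            Supplier<? extends Function> factory = factories.get(uri);
            if (factory == null) {
                throw new IllegalArgumentException("No function registered for " + uri);
            }
            return factory.get();
        }
    }

    /** Hypothetical one-argument function: upper-cases its input. */
    public static final class UpperCase extends FunctionBase1 {
        @Override
        protected String exec(String arg) {
            return arg.toUpperCase(Locale.ROOT);
        }
    }
}
```

The benefit of the fixed-arity base class is that argument-count checking lives in one place, so a subclass such as UpperCase only implements the single-argument exec().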

For this example we’ll create a single-argument function. And, rather than use the
FunctionRegistry, we’ll use the dynamic function loading capability of the
engine; we can then just drop code into the CLASSPATH. There are some notes on this in
the ARQ documentation. Essentially the engine supports a pseudo URI scheme, java:,
which is used to reference classes in the CLASSPATH. If you’ve ever written
an extension function for SAXON you’ll have encountered a similar mechanism.

Our custom function will simply find the Last-Modified date of the URL it is given as a parameter.
We could use this to discover which URLs in a graph have recently changed, e.g. to
look for activity in a user’s FOAF description, RSS feeds, etc.

View the source to the function
to see the implementation details. The following details are worth noting:

Firstly, it’s the exec() method that does all the work. In this case it takes a single parameter, as we sub-classed FunctionBase1. The value will be that of the node currently being tested (i.e. FILTERed) by the
query engine. The NodeValue may be a literal, a blank node or a resource, depending on how the user has constructed their query, so you’ll want to check that the value is legal for your function.
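A minimal sketch of that kind of check follows, assuming the argument has already been reduced to a plain string (in ARQ it would arrive as a NodeValue). The ArgCheck class and its requireHttpUrl method are hypothetical; the sketch simply insists on an absolute http or https URL before any network call is attempted.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical argument check for a Last-Modified style function. In ARQ the argument is a
// NodeValue; this sketch assumes it has already been turned into a plain string.
public final class ArgCheck {

    private ArgCheck() {}

    /** Returns the value as a URI, or throws if it is not an absolute http(s) URL. */
    public static URI requireHttpUrl(String value) {
        final URI uri;
        try {
            uri = new URI(value);
        } catch (URISyntaxException e) {
            // Strings containing spaces or other illegal characters fail here.
            throw new IllegalArgumentException("Not a valid URI: " + value, e);
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            // Relative references and non-HTTP schemes such as mailto: are rejected here.
            throw new IllegalArgumentException("Expected an http(s) URL but got: " + value);
        }
        return uri;
    }
}
```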

Secondly, the exec() method must return a NodeValue. In this instance we’re going to
return a date using the NodeValue.makeDate() factory method. There are
additional static methods for returning other kinds of values, e.g. booleans and other XSD datatypes.
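The lexical forms involved are easy to get slightly wrong, so here is a small sketch that turns an epoch-millisecond timestamp (the form URLConnection.getLastModified() returns) into xsd:date and xsd:dateTime strings. It uses java.time; the original source may use a different formatter, and XsdDates and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Sketch: convert an epoch-millisecond timestamp into XSD lexical forms.
// The class and method names are hypothetical.
public final class XsdDates {

    private XsdDates() {}

    /** xsd:date lexical form of the UTC calendar day, e.g. "2005-11-06". */
    public static String toXsdDate(long epochMillis) {
        // LocalDate.toString() is ISO-8601 yyyy-MM-dd for years 0000-9999.
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC).toLocalDate().toString();
    }

    /** xsd:dateTime lexical form in UTC, truncated to whole seconds, e.g. "2005-11-06T00:00:00Z". */
    public static String toXsdDateTime(long epochMillis) {
        // Instant.toString() is ISO-8601 with a trailing Z; truncation drops any fractional seconds.
        return Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.SECONDS).toString();
    }
}
```

In ARQ the resulting value would then be wrapped in a NodeValue, for instance via the NodeValue.makeDate() factory mentioned above. Whichever form you produce should match the type you compare against in your FILTER; the example query later in this post compares against an xsd:dateTime.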

Thirdly, the function uses a cache to store the Last-Modified dates it has recently fetched. This is for efficiency: depending on how the SPARQL query is constructed, a given function may be called multiple times with the same node value. To
cut down on HTTP calls we store previously encountered results in a simple
Map. Actually retrieving the Last-Modified date is very straightforward,
as the URLConnection class has methods for this purpose. We use a date formatter to ensure the value is consistent with the lexical form of an XSD date.
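A minimal sketch of such a cache, assuming a ConcurrentHashMap keyed by the URL string. The Fetcher interface is a hypothetical seam added so the cache can be exercised without network access; the default URL_CONNECTION fetcher sends a HEAD request (an assumption, since only the headers are needed) and reads the header with URLConnection.getLastModified(), which returns 0 when no Last-Modified header is present.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a Last-Modified cache. Fetcher is a hypothetical hook so the cache can be tested
// offline; URL_CONNECTION is the real-network default.
public final class LastModifiedCache {

    /** Returns the Last-Modified time of a URL as epoch millis, or 0 if unknown. */
    public interface Fetcher {
        long lastModified(String url) throws IOException;
    }

    /** Default fetcher built on URLConnection.getLastModified(). */
    public static final Fetcher URL_CONNECTION = url -> {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(5_000);
        HttpURLConnection http = conn instanceof HttpURLConnection ? (HttpURLConnection) conn : null;
        try {
            if (http != null) {
                http.setRequestMethod("HEAD"); // only the headers are needed
            }
            return conn.getLastModified(); // connects if necessary; 0 when the header is missing
        } finally {
            if (http != null) {
                http.disconnect();
            }
        }
    };

    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final Fetcher fetcher;

    public LastModifiedCache(Fetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached value for url, calling the fetcher only on a cache miss. */
    public long get(String url) {
        Long cached = cache.get(url);
        if (cached != null) {
            return cached;
        }
        try {
            long value = fetcher.lastModified(url);
            // Two threads missing at the same time may both fetch; the later put simply wins.
            cache.put(url, value);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not fetch " + url, e);
        }
    }
}
```

Because entries are never evicted, this suits a single query run; a long-lived engine would want a bounded or time-limited cache.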

If you download the example file, compile it and put it in your CLASSPATH, you can
make use of it in queries as follows:


PREFIX myfn: <java:com.ldodds.sparql.>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?title ?r
WHERE {
?s rdfs:seeAlso ?r.
OPTIONAL {?r dc:title ?title.}
FILTER ( isURI(?r) &&
myfn:LastModified(?r) >
xsd:dateTime('2005-11-06T00:00:00Z') )
}

The first line, declaring the myfn PREFIX, is where we hook into the
dynamic function loading magic. The java: URI must match the package
name of your custom function and end with a dot. The function name is concatenated
with this string to form the fully qualified class name of your extension, which is then loaded dynamically.
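The string handling involved is simple enough to sketch. The following is an illustration, not ARQ's loader; JavaUriLoader and its methods are hypothetical. It shows why the trailing dot matters: without it, the package and class names run together and no class can be found.

```java
// Illustration (not ARQ's actual loader) of how a java: function IRI maps onto a class:
// the PREFIX IRI and the local name are concatenated, then the "java:" scheme is stripped.
public final class JavaUriLoader {

    private static final String SCHEME = "java:";

    private JavaUriLoader() {}

    /** "java:com.ldodds.sparql." + "LastModified" becomes "com.ldodds.sparql.LastModified". */
    public static String toClassName(String prefixIri, String localName) {
        String iri = prefixIri + localName;
        if (!iri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not a java: IRI: " + iri);
        }
        return iri.substring(SCHEME.length());
    }

    /** Loads the named class, reporting a missing class as an IllegalArgumentException. */
    public static Class<?> load(String prefixIri, String localName) {
        String className = toClassName(prefixIri, localName);
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No class " + className + " on the CLASSPATH", e);
        }
    }
}
```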

The query will extract the titles and URLs of any resources referenced via an
rdfs:seeAlso property. We use a FILTER to ensure that we only
match URIs (redundant here, but important to avoid errors), and also to limit the
results to resources that have a Last-Modified date after 6th November 2005.
You could make the query more useful by using the parameterised query technique I
described before.

I’ve just run this against my own FOAF description
and ended up with results like this:


------------------------------------------------------------------------------------------------
| title                                    | r                                                 |
================================================================================================
| "del.icio.us/ldodds"                     | <http://del.icio.us/rss/ldodds>                   |
|                                          | <http://www.ldodds.com/blog/blog-scutterplan.rdf> |
| "wordtin"                                | <http://www.ldodds.com/wordtin/data/wordtin.rss>  |
| "Lost Boy"                               | <http://www.ldodds.com/blog/index.rdf>            |
| "Audioscrobbler Musical Profile: ldodds" | <http://ws.audioscrobbler.com/rdf/history/ldodds> |
------------------------------------------------------------------------------------------------

This indicates that in the last 24 hours I’ve updated my blog, listened to some
music, and added some bookmarks to my del.icio.us account. A Scutter could use the
results of this query to build a list of URLs for refreshing a database. A more robust
extension would also check the ETag of the resources, performing a Conditional GET
to spot updates more accurately. Error handling is also obviously lacking, e.g. to deal with
missing or moved resources. Improving the function is left as an exercise for the reader.
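As a starting point for that exercise, here is a hedged sketch of a conditional GET using HttpURLConnection. It assumes the caller has kept the ETag and Last-Modified values from an earlier response for the same http(s) URL; ConditionalGet, prepare and hasChanged are hypothetical names. A server that supports these validators answers 304 Not Modified when the resource has not changed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch of a conditional GET with standard HTTP validators. Assumes an http(s) URL and that
// the caller stored the ETag / Last-Modified values from a previous response.
public final class ConditionalGet {

    private ConditionalGet() {}

    /** Builds (but does not send) a GET request carrying whichever validators are known. */
    public static HttpURLConnection prepare(String url, String etag, long lastModifiedMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        if (etag != null) {
            conn.setRequestProperty("If-None-Match", etag);
        }
        if (lastModifiedMillis > 0) {
            conn.setIfModifiedSince(lastModifiedMillis); // sent as an If-Modified-Since header
        }
        return conn;
    }

    /** Sends the request; 304 Not Modified means the resource is unchanged. */
    public static boolean hasChanged(HttpURLConnection conn) throws IOException {
        try {
            return conn.getResponseCode() != HttpURLConnection.HTTP_NOT_MODIFIED;
        } finally {
            conn.disconnect();
        }
    }
}
```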

Feel free to take the code from this tutorial and use it for your own applications.

You can easily go mad with these kinds of extensions, with the end result that most of the logic ends up in the extension rather than the query. But used with care, SPARQL extension functions can provide a handy way to graft additional capabilities onto the language.