Describing SPARQL Extension Functions

Leigh Dodds · Semantic Web · November 5, 2009 · May 9, 2013 · 3 Minutes

At the end of my recent post on Surveying and Classifying SPARQL Extensions I noted that I wanted to encourage implementors to publish useful documentation about their SPARQL extensions. If you’re interested in the current state of that survey, you can check out my spreadsheet listing known extension functions. There are more to add, but it’s a good summary of the current state of play.

At VoCamp DC last week I did some work on designing a small vocabulary for describing SPARQL extensions. The first draft of this is online here: SPARQL Extension Descriptions. There’s a little bit of background on the VoCamp wiki too, if you want to see my working :).

Here’s an example of the vocabulary in use, describing some extensions to the ARQ SPARQL Engine:


<http://jena.hpl.hp.com/ARQ/function> a sed:FunctionLibrary;
  dc:title "ARQ Function Library";
  dc:description "A collection of SPARQL extension functions implemented by the ARQ engine";
  foaf:homepage <http://jena.sourceforge.net/ARQ/library-function.html>;
  sed:includes <http://jena.hpl.hp.com/ARQ/function#sha1sum>.
  
<http://jena.hpl.hp.com/ARQ/function#sha1sum> 
  a ssd:ScalarFunction;
  rdfs:label "sha1sum";
  dc:description "Calculate the SHA1 checksum of a literal or URI.";
  sed:includedIn <http://jena.hpl.hp.com/ARQ/function>.

<http://jena.hpl.hp.com/ARQ#self> a sed:SparqlProcessor;
  foaf:homepage <http://jena.hpl.hp.com/ARQ>;
  rdfs:label "ARQ";
  sed:implementsLibrary <http://jena.hpl.hp.com/ARQ/function>.
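
Read as a graph, a description like this lets a client work out which functions a processor claims to support. Here is a minimal sketch of that lookup in Python. It assumes the triples have been copied by hand into tuples, with prefixed names kept as plain strings; the helper names are my own, and a real client would parse the RDF with a proper library.

```python
# Triples hand-copied from the Turtle example, with prefixed names kept as
# plain strings. This is an illustration, not a real RDF parser.
ARQ_LIBRARY = "http://jena.hpl.hp.com/ARQ/function"
ARQ_PROCESSOR = "http://jena.hpl.hp.com/ARQ#self"
SHA1SUM = "http://jena.hpl.hp.com/ARQ/function#sha1sum"

TRIPLES = [
    (ARQ_LIBRARY, "rdf:type", "sed:FunctionLibrary"),
    (ARQ_LIBRARY, "sed:includes", SHA1SUM),
    (SHA1SUM, "rdfs:label", "sha1sum"),
    (ARQ_PROCESSOR, "rdf:type", "sed:SparqlProcessor"),
    (ARQ_PROCESSOR, "sed:implementsLibrary", ARQ_LIBRARY),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def implemented_functions(triples, processor):
    """Follow sed:implementsLibrary, then sed:includes, starting from a processor."""
    functions = []
    for library in objects(triples, processor, "sed:implementsLibrary"):
        functions.extend(objects(triples, library, "sed:includes"))
    return functions


print(implemented_functions(TRIPLES, ARQ_PROCESSOR))
```

The two-step traversal mirrors the vocabulary: a processor points at libraries, and libraries point at the functions they include.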

Ideally what should happen is that every URI associated with a filter function or property function should be dereferenceable, and that terms from this vocabulary be used to describe those functions. There’s a lot more detail that could be included, but I suspect this is sufficient to cover the primary use cases, i.e. documentation and validation.
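
To make "dereferenceable" concrete, here is a rough sketch of the request a client might build to fetch a function's description. It assumes the server supports content negotiation and can return Turtle for the `text/turtle` media type; the helper name is hypothetical, and whether any given extension URI actually resolves is up to the implementor. The request is only constructed, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def description_request(function_uri, media_type="text/turtle"):
    """Build (but do not send) a GET request for a function's RDF description."""
    # The fragment (e.g. #sha1sum) is never sent over HTTP, so the client
    # fetches the enclosing document and looks the function up inside it.
    document_url = urldefrag(function_uri).url
    # Assumption: the server honours the Accept header via content negotiation.
    return Request(document_url, headers={"Accept": media_type})


req = description_request("http://jena.hpl.hp.com/ARQ/function#sha1sum")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch, after which the response body could be handed to an RDF parser.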

The draft SPARQL 1.1 Service Description specification does cover some of this ground, but falls short in a few places, and I think some of what I’ve described here could usefully be folded into that specification without greatly extending its scope. But that’s a matter for the Working Group to decide.

One specific issue is that the specification doesn’t currently recognise “functional predicates” (to use Lee Feigenbaum’s preferred term; others include “property functions” and “magic properties”) as a distinct class of extensions. They clearly exist, so I think we should have a means to describe them. In fact, they are arguably the most important class of SPARQL extensions that needs describing.

Filter functions are relatively well understood and can clearly be identified based on where they appear in a query. Language extensions will generate a parser error if an endpoint doesn’t support them, so they will easily be caught. Functional predicates, however, use the existing Turtle triple pattern syntax but typically trigger custom logic in the SPARQL processor, rather than actually appearing as triples within the dataset. Without the ability to dereference their URIs and identify them as functional predicates, a SPARQL engine that doesn’t support them will simply treat them as ordinary triple patterns and fail silently, rather than complaining that the extension is not supported.

The following example query illustrates this:


PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX func: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/vocab/>

SELECT ?doc ?contributor WHERE {
   ?doc dc:modified ?created.
   ?doc ex:authors ?authorList.
   ?authorList list:member ?author.
   LET ( ?contributor := ?author )
   FILTER ( ?created < func:now() )
}

The above query contains three extensions: a language extension (LET); a filter function (func:now()); and a functional predicate (list:member). Without prior knowledge of that predicate, or the ability to dereference its URI, there’s no way to tell whether list:member is an extension or simply a triple that the query author is attempting to match against.
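
Here is a minimal sketch of the validation check that dereferenceable descriptions would make possible. The lookup table stands in for the results of dereferencing each URI used in the query, and the kind labels borrow the ScalarFunction and FunctionalPredicate names from my draft vocabulary. The function name and the "supported" set are assumptions made purely for illustration.

```python
LIST_MEMBER = "http://jena.hpl.hp.com/ARQ/list#member"
FUNC_NOW = "http://jena.hpl.hp.com/ARQ/function#now"

# Hypothetical results of dereferencing each URI used in the query.
# None means no extension description was found, so the term is treated
# as ordinary vocabulary.
DESCRIBED_AS = {
    "http://purl.org/dc/terms/modified": None,
    "http://example.org/vocab/authors": None,
    LIST_MEMBER: "FunctionalPredicate",
    FUNC_NOW: "ScalarFunction",
}


def unsupported_extensions(described_as, supported):
    """Return {uri: kind} for described extensions the endpoint does not support."""
    return {
        uri: kind
        for uri, kind in described_as.items()
        if kind is not None and uri not in supported
    }


# Assume an endpoint that implements func:now() but not list:member.
print(unsupported_extensions(DESCRIBED_AS, supported={FUNC_NOW}))
```

With a check like this, a client or endpoint could reject the query up front instead of quietly matching list:member against the data and returning nothing.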

I’d like to urge all implementors to consider making their extension URIs dereferenceable. The schema I’ve drafted is very lightweight, so it shouldn’t be difficult to support. I’m also very happy to take comments on its design. I intend it as a starting point for others to build upon.


6 thoughts on “Describing SPARQL Extension Functions”

Holger Knublauch says:

November 5, 2009 at 11:29 pm

Leigh, the SPIN vocabulary has been designed for exactly this purpose: to define new SPARQL functions and magic properties so that they can be resolved by their URI. Look for example at

http://www.spinrdf.org/spin.html#spin-functions

and

http://composing-the-semantic-web.blogspot.com/2009/01/understanding-spin-functions.html

This vocabulary not only covers things like rdfs:comment about the function, but also details about the function’s arguments. Furthermore, SPIN function declarations can include a body declaration that points to executable code, including another SPARQL query and JavaScript code.

I am not quite sure why you do not even reference this related work, even though I have pointed you at this in my previous comments on the blog? What is wrong with the spinrdf vocabulary for your use case?
admin says:

November 6, 2009 at 10:50 am

Hi Holger,

You’re correct I should have looked more closely at SPIN after you had pointed it out to me, as it does have vocabulary that addresses similar use cases.

Looking at the documentation I can see that it defines classes for both extension functions (ScalarFunctions in the SPARQL 1.1 parlance) and magic properties (what I’ve called FunctionalPredicates).

And your examples do encourage people to attach labels and comments to function descriptions, which is one of the key things I want to achieve.

It wasn’t totally obvious to me that SPIN could be used in this way, based on my earlier and less in-depth looks at the specifications.

So yes, SPIN could very much be used to achieve what I’m suggesting here, so apologies for lack of reference.

Having said that, there are a couple of areas where my draft vocabulary includes some extra constructs.

Firstly, I’ve defined the notion of a FunctionLibrary. Is this something that SPIN supports?

Secondly, I’ve added some terms to describe a SPARQL processor, and relate it to the functions it implements. As far as I can see SPIN doesn’t do that, as a SPIN function definition is basically a description of how an engine could/should implement a function, not whether it supports it. I think there’s a subtle difference there? Would be interested in your thoughts.

Finally, I’ve also tried to define my terms and extra properties so that they align with the SPARQL 1.1 terminology. Clearly you couldn’t have done this as SPIN was developed first! One reason for doing that is to ensure things relate sensibly, and I’m sure SPIN could be tweaked to do the same.

But, to be honest, I’d be more than happy if the SPARQL Working Group decided, or could be convinced, to slightly extend the scope of the Service Description specification to include a little more information. Illustrating the small amount of extra vocabulary required is one useful way to do that I think. And this is another reason for my attempting this.

I’d be more than happy for another, more standard or well-supported vocabulary to get widely used. The key thing is to get people documenting their extensions!

Thanks again for your comments.
Yves says:

November 6, 2009 at 11:08 am

Hello!

Nice post! Just one small terminology comment: I wouldn’t qualify list:member as a functional predicate (which implies that for one subject, there is just one object).

I would qualify list:member as a non-deterministic (one subject, N possible objects) builtin (the engine knows how to derive truth values for it) predicate.

Cheers!
y
1. admin says:
  
  November 6, 2009 at 11:28 am
  
  Hi Yves,
  
  Thanks for the clarification! There seems to be no phrase to describe this feature that everyone is happy with. So far we have:
  
  * Magic Predicates
  * Magic Properties
  * Property Functions
  * Functional Predicates
  
  I don’t really like the “Magic” ones, and have tended to prefer property functions. I’ve been using “Functional Predicates” lately as I picked up the term from Lee Feigenbaum, but clearly it’s got some additional connotations that are confusing. Perhaps “property functions” is a less confusing term.
Holger Knublauch says:

November 6, 2009 at 5:51 pm

Thanks, Leigh. Yes I can see that your vocabulary covers aspects that are not in the SPIN vocabulary, and these aspects perfectly make sense. All I wanted to clarify is that for the overlapping bits (i.e. function interface declarations), we (as a community) should attempt to reuse existing and established vocabularies. SPIN has been “out there” for almost a year now, and I know of many groups that use it. Furthermore it is supported by a free editing tool (TopBraid Composer) and an open source API. I hope it will be in the spirit of the Semantic Web to link to the same URIs for properties and classes that really mean the same. The more we do this, the more likely it will be to get widely deployed. And yes, for the parts not in SPIN it is perfectly fine to use another namespace. I really hope that either the W3C will one day add such a generic SPARQL extension point declaration mechanism, or at least some de-facto standard emerges.

BTW, I am also not thrilled about the term MagicProperty, and I had thought long about how to name them in SPIN. I went for spin:MagicProperty because they are really declared as rdf:Properties as well. The term “property function” is IMHO confusing because it overlaps with the term “function”, which means something similar but for FILTER and project statements. It is not a good idea to have two identifiers for distinct things where one identifier is a substring of the other…