Feeding Google Co-Op with SPARQL

Last night I took my first look at Google Co-op, in particular the “Subscribed Links” feature which allows users to add services to Google search results.
The developer documentation outlines the process for creating a service, which boils down to authoring a fairly simple XML configuration file.
The configuration breaks down into two key chunks: search descriptions (“ResultSpecs”) and data (“DataObjects”).
DataObjects have an identifier and a type, which is just a literal String. Identifiers must be unique across a type. Types can be “City” or “Radio Station” or something else meaningful to your service. The objects are basically hashes of name/value pairs. This allows you to expose arbitrary data to the Google Co-op servers in a reasonably structured format.
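As a rough sketch of the shape of that data (the element and attribute names here follow the pattern described above, but check the developer documentation for the exact schema — the city and its values are purely illustrative):

```xml
<!-- Illustrative sketch of a DataObject; consult the Google Co-op
     developer documentation for the exact element names. -->
<DataObject id="london" type="City">
  <Attribute name="name">London</Attribute>
  <Attribute name="elevation">35 m</Attribute>
  <Attribute name="country">United Kingdom</Attribute>
</DataObject>
```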
ResultSpecs are a description of the kinds of searches that your service can respond to. Your service can then annotate the Google results with a little green section labelled “Subscribed Links” that looks very similar to the “Sponsored Links” we’ve been seeing for some time now.
While you can use regular expressions to match queries that users type into a Google search, you can also define simple structured queries such as: “What’s the elevation of [City]”. This is very reminiscent of the kind of conversational interfaces people have been exploring for chat/IRC bots for some time.
If you were to type in the above example, Google Co-op would then examine your DataObjects of type “City” and see if the text that the user enters matches any of the specified names. You can register multiple variations of a query to try and catch common usage. Similarly DataObjects can have multiple names to deal with abbreviations, alternative spellings, etc.
Once you’ve authored your file, you can request that Google Co-op index it to make the service available to users. There’s a directory for finding services to which you may want to subscribe. You can also look at someone’s profile to discover what services they’ve made available and see how they look.
You don’t have to supply your service in one file, though. It’s possible to feed it the ResultSpecs and DataObjects as separate documents.
This got me wondering whether I could feed Google Co-op DataObjects created by a SPARQL query, suitably transformed into the right format. It turns out you can.
Here’s a query over the Periodic Table in OWL that extracts some basic metadata about each element:

PREFIX t: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?weight ?number ?id ?color
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE {
  ?element t:name ?name ;
           t:symbol ?symbol ;
           t:atomicWeight ?weight ;
           t:atomicNumber ?number .
  OPTIONAL { ?element t:casRegistryID ?id . }
  OPTIONAL { ?element t:color ?color . }
}

And here is a stylesheet that will turn the results of that query into the Google Co-op DataObject format. You can see the complete data for yourself.
The stylesheet is parameterised so you can feed in the author and a description of the file. You can also specify the “type” of the DataObject. This means that the same stylesheet can be used to generate alternative DataObjects; it’s not just for producing a description of chemical elements. You’ll need to ensure that there’s a “name” variable in the results, as this is used to create the id and name of the DataObject. All other result bindings are simply turned into Attributes of the object.
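The core of such a transformation can be sketched as follows. This is my illustration of the approach rather than the actual stylesheet, and the DataObject/Attribute element names are assumptions; only the SPARQL results namespace is the standard one:

```xml
<!-- Sketch: each result row becomes a DataObject, with the "name"
     binding used as the id and every other binding turned into an
     Attribute. Output element names are illustrative. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sp="http://www.w3.org/2005/sparql-results#">
  <xsl:param name="type" select="'Chemical Element'"/>
  <xsl:template match="sp:result">
    <DataObject id="{sp:binding[@name='name']/sp:literal}"
                type="{$type}">
      <xsl:for-each select="sp:binding[@name != 'name']">
        <Attribute name="{@name}" value="{sp:literal}"/>
      </xsl:for-each>
    </DataObject>
  </xsl:template>
</xsl:stylesheet>
```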
So far so good. I was able to register that complete URL with Google Co-op and it came along and retrieved all the dynamically generated data. I’m not clear yet how often they’ll update it, or whether this is a manual process. I’m watching my logs to see if the Googlebot visits regularly.
To complete the hack I wrote a simple ResultSpec which looks for searches like “chemical nitrogen” and displays the name, atomic weight and colour of the element. Pretty trivial, but at this stage I just wanted to prove that the concept would work.
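A ResultSpec for that search might be sketched roughly like this; the element names and response template syntax here are illustrative stand-ins, so see the Subscribed Links developer documentation for the real schema:

```xml
<!-- Illustrative sketch only; not the exact Subscribed Links schema. -->
<ResultSpec id="element-lookup">
  <Query>chemical [Chemical Element]</Query>
  <Response>
    <Output name="title">About {name}</Output>
    <Output name="text">Atomic weight: {weight}. Colour: {color}.</Output>
  </Response>
</ResultSpec>
```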
Again I just registered this separate URL with Google Co-op and it was absorbed into the Google borg within a few minutes.
If you visit my public profile you can see some sample results, as I’ve configured a demonstration search for “silver”. You should also be able to subscribe to the link from there.
So the end result is that I’ve used SPARQL and XSLT to produce data suitable for feeding Google Co-op and to create services from it.
Is this useful? Maybe, maybe not.
In this instance, with a single data source, SPARQL is overkill. But as with my earlier hack with Google Earth overlays, all one need do is tweak the SPARQL query to create other kinds of data objects, possibly based on merging data from multiple sources: for example, metadata about who discovered each element.
I also like the fact that I can focus on the data collection and querying and not the actual conversion/integration.
It’d be interesting to put together a more useful service by merging useful RDF data such as information on cities, locations, reviews, the CIA World Factbook, etc.

XTech 2006: SPARQLing Services

I’m currently at XTech in Amsterdam and earlier this morning I gave my talk, “SPARQLing Services”, an overview of SPARQL, with an emphasis on the SPARQL protocol and how SPARQL can benefit Web 2.0/AJAX applications.
The full paper is online at the conference website.
I’ve also uploaded the slides to my website which you can view online. They’re available as PDF and an OpenDocument presentation.
The talk seemed to go well, and there were a few questions afterwards. I chickened out of giving a live demo, as I’d had a few instances of Google Earth crashing my laptop. I’ve already blogged the basics of that hack.
I did another demo of a little AJAX hack that used Elias and Lee’s SPARQL-JSON library, which was easy to work with. I’ll get that demo published in a few days. I realised along the way that using the JSON output of a SPARQL query, it’s quite simple to create “data aware” AJAX widgets.
I think it’ll be quite trivial to create a set of standard components that take as input a SPARQL query that returns data in a specific format (i.e. specific columns in the result set). Once I’ve got my horrible hack online, I’m going to try making one or two of the Yahoo! Javascript Widgets capable of accepting SPARQL input.

Generating Google Earth Overlays with SPARQL and XSLT

Here’s some notes on some hacking I’ve been doing over the last few days.
Firstly, I’ve implemented some extensions for manipulating geo data in SPARQL queries. There’s a write-up on the XMLArmyKnife blog.
That allows you to ask questions such as “find points 10 kms from this location” and “test whether a point is within a bounding box”.
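A radius query using these extensions might look roughly like the following. To be clear, the xak: namespace and the distance function name and signature are hypothetical stand-ins of my own; the actual extension functions are documented on the XMLArmyKnife blog:

```sparql
# Hypothetical sketch: the xak: prefix and xak:distance function are
# illustrative stand-ins for the real XMLArmyKnife geo extensions.
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xak: <http://example.org/xak-geo#>
SELECT ?place ?lat ?long
WHERE {
  ?place geo:lat ?lat ;
         geo:long ?long .
  # keep only points within 10 km of a fixed location
  FILTER ( xak:distance(?lat, ?long, 51.5074, -0.1278) < 10 )
}
```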
Secondly, I’ve also been exploring options for manipulating the results of SPARQL SELECT queries using XSLT. I’ve updated mortenf’s SPARQL to RSS stylesheet, so you can now generate RSS 1.0 from appropriately “shaped” SELECT queries.
I’ve also implemented a stylesheet to convert SPARQL results into a Google Earth KML file. To use it you’ll need to ensure that your SELECT query returns four variables: title, description, lat, long. These form the title, description and co-ords of a series of Placemarks in the generated KML file. Actually only the latitude and longitude are required, but the view will be pretty uninteresting without some titles.
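Each row of the SELECT results becomes a Placemark along these lines (the museum and coordinate values are illustrative; note that KML expects coordinates in longitude,latitude order):

```xml
<Placemark>
  <name>Science Museum</name>
  <description>Taken from the description variable</description>
  <Point>
    <!-- KML coordinates are longitude,latitude -->
    <coordinates>-0.1743,51.4978</coordinates>
  </Point>
</Placemark>
```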
The Placemarks are gathered into a folder. To configure a name and description for the folder you can pass in two stylesheet parameters, folder-name and folder-desc. As noted you can just add these to the query string of an XML Army Knife SPARQL Service query.
If you do use the XAK query, then the Content-Type of the response is automatically configured based on the media type set in the stylesheet, so you should find Google Earth launches automatically if it’s configured to do so in your browser.
Pulling all that together, here’s a SPARQL query that lists Museums in London. Or at least, museums known to The Open Guide to London which happens to have an RDF view of its data. And here’s the results in the SPARQL Results XML format.
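The shape of the query is something like this; the variable names match what the KML stylesheet expects. I’m sketching from the W3C geo and Dublin Core vocabularies here, so treat the exact predicates (and any category filtering) as assumptions about the Open Guide’s actual RDF:

```sparql
# Sketch only: variable names (title, description, lat, long) are the
# ones the SPARQL-to-KML stylesheet requires; predicates are assumed.
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?title ?description ?lat ?long
WHERE {
  ?museum dc:title ?title ;
          dc:description ?description ;
          geo:lat ?lat ;
          geo:long ?long .
}
```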
Combining that with the SPARQL to KML stylesheet, here’s a dynamically generated KML layer that shows Museums in London.
If you use the “Add Network Link” option in Google Earth you can manually add the URL and configure the refresh parameters, e.g. once every 48 hours. That way your Google Earth installation will periodically pick up any new data added to the London OpenGuide.
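Equivalently, you can save a small KML file that sets up the same network link. The href below is a placeholder for the real dynamically generated URL; in the KML dialect Google Earth used at the time the link element was spelled Url (later versions use Link), and refreshInterval is in seconds, so 48 hours is 172800:

```xml
<kml xmlns="http://earth.google.com/kml/2.0">
  <NetworkLink>
    <name>London Museums</name>
    <Url>
      <!-- placeholder href: substitute the real SPARQL-to-KML URL -->
      <href>http://example.org/sparql-to-kml</href>
      <refreshMode>onInterval</refreshMode>
      <!-- 48 hours, in seconds -->
      <refreshInterval>172800</refreshInterval>
    </Url>
  </NetworkLink>
</kml>
```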
It’s very easy to come up with alternative queries that generate other results, especially if you use the geo extensions, which would allow you to ask for museums (or other places) within a few kilometres of where you’re staying or visiting.
Have fun!

Jena User Conference Write-Up

I was at the Jena User Conference this week up in HP Labs, for a very enjoyable series of Semantic Web and Jena-related talks.
There was an excellent set of presentations; see the schedule and #swig notes from Wednesday and Thursday for an overview of the sessions and lots of handy pointers. The proceedings and presentations should be online shortly.
I’ve come away with a long list of projects to take a closer look at. RDFReactor, KnoBot, EyeBall, and D2R Server are high on my list.
Of course the best aspect of any conference is the chance to get together face-to-face and discuss problems, cool ideas, and show off neat hacks. Had lots of fun discussions with the usual suspects, like Dan, Damian and Libby (the Bristol Semweb Mafia :), Tom Heath, Dave Beckett, etc. And I finally got the chance to meet the rest of the Jena team, Richard Cyganiak (who gave me one cool idea for a means of transparently integrating Slug into Jena apps), Reto, and that bloke called Danny.
There was plenty of scope for break-out sessions, which I put to good use, attending and giving some demos, and getting into some impromptu hacking. The latter gently encouraged by Dan Brickley: “Have you implemented that BoundingBox query yet? Have you done it yet? So…is it done yet?” 🙂
The conference was very well organized. A lot of effort was expended to make sure everything was running correctly. Thanks to the Jena team and Jan Ward for putting it all together. A very enjoyable two days.
Roll on XTech!

Slug Presentation

I’ve uploaded the slides and paper I presented at the Jena User Conference this week.

The talk seemed to go well; I had lots of questions both after the talk and in the break-out sessions. It turned out a few people have been playing with Slug already, and some useful extensions were suggested as well. I billed Slug as “nagware”: nag me to get features added 🙂
I think the full set of papers will be available from the conference website shortly.