Feeding Google Co-Op with SPARQL

Last night I took my first look at Google Co-op, in particular the “Subscribed Links” feature which allows users to add services to Google search results.
The developer documentation outlines the process for creating a service, which amounts to writing a fairly simple XML configuration file.
The configuration breaks down into two key chunks: search descriptions (“ResultSpecs”) and data (“DataObjects”).
DataObjects have an identifier and a type, which is just a literal string. Identifiers must be unique within a type. Types can be “City” or “Radio Station” or something else meaningful to your service. The objects are basically hashes of name/value pairs. This allows you to expose arbitrary data to the Google Co-op servers in a reasonably structured format.
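As a rough sketch, a DataObject looks something like the following. The element names (DataObject, Attribute) match the terminology used here, but the exact wrapper element and attribute names are from memory of the format, so treat the details as illustrative:

```xml
<DataObjects>
  <!-- id must be unique within the type; attribute names are up to you -->
  <DataObject id="denver" type="City">
    <Attribute name="name">Denver</Attribute>
    <!-- multiple name attributes handle abbreviations and alternative spellings -->
    <Attribute name="name">Denver, CO</Attribute>
    <Attribute name="elevation">5,280 feet</Attribute>
  </DataObject>
</DataObjects>
```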
ResultSpecs are a description of the kinds of searches that your service can respond to. Your service can then annotate the Google results with a little green section labelled “Subscribed Links” that looks very similar to the “Sponsored Links” we’ve been seeing for some time now.
While you can use regular expressions to match queries that users type into a Google search, you can also define simple structured queries such as: “What’s the elevation of [City]”. This is very reminiscent of the kind of conversational interfaces people have been exploring for chat/IRC bots for some time.
If you were to type in the above example, Google Co-op would then examine your DataObjects of type “City” and see if the text that the user enters matches any of the specified names. You can register multiple variations of a query to try and catch common usage. Similarly DataObjects can have multiple names to deal with abbreviations, alternative spellings, etc.
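A ResultSpec for the elevation example might be sketched as follows. This is a hypothetical illustration of the structure rather than a verbatim sample — consult the developer documentation for the exact schema:

```xml
<!-- hypothetical sketch: element names approximate the documented format -->
<ResultSpec id="CityElevation">
  <!-- [City] is matched against the names of DataObjects of type "City";
       register multiple variations to catch common phrasings -->
  <Query>What's the elevation of [City]</Query>
  <Query>elevation of [City]</Query>
  <Response>
    <Output name="title">Elevation of [City.name]</Output>
    <Output name="text">[City.name] sits at [City.elevation].</Output>
  </Response>
</ResultSpec>
```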
Once you’ve authored your file, you can request that Google Co-op index it to make the service available to users. There’s a directory for finding services to which you may want to subscribe. You can also look at someone’s profile to discover what services they’ve made available and see how they look.
You don’t have to supply your service in one file, though. It’s possible to feed it the ResultSpecs and DataObjects as separate documents.
This got me wondering whether I could feed Google Co-op DataObjects created by a SPARQL query, suitably transformed into the right format. It turns out you can.
Here’s a query over the Periodic Table in OWL that extracts some basic metadata about each element:

PREFIX t: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?weight ?number ?id ?color
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE {
  ?element t:name ?name ;
    t:symbol ?symbol ;
    t:atomicWeight ?weight ;
    t:atomicNumber ?number .
  OPTIONAL { ?element t:casRegistryID ?id . }
  OPTIONAL { ?element t:color ?color . }
}

And here is a stylesheet that will turn the results of that query into the Google Co-op DataObject format. You can see the complete data for yourself.
The stylesheet is parameterised so you can feed in the author and a description of the file. You can also specify the “type” of the DataObject, which means the same stylesheet can be used to generate alternative DataObjects; it’s not just for producing a description of chemical elements. You’ll need to ensure that there’s a “name” variable in the results, as this is used to create the id and name of the DataObject. All other result bindings are simply turned into Attributes of the object.
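The core of such a stylesheet can be sketched as below. It reads the standard SPARQL Query Results XML Format and emits one DataObject per result row. The root output element and feed wrapper are assumptions; the binding-handling logic follows the description above:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sr="http://www.w3.org/2005/sparql-results#">

  <!-- parameters: author and description of the feed, plus the DataObject type -->
  <xsl:param name="author"/>
  <xsl:param name="description"/>
  <xsl:param name="type">Element</xsl:param>

  <!-- wrapper element name is an assumption; adjust to the Co-op schema -->
  <xsl:template match="/sr:sparql">
    <Results author="{$author}" description="{$description}">
      <xsl:apply-templates select="sr:results/sr:result"/>
    </Results>
  </xsl:template>

  <!-- one DataObject per result; the "name" binding supplies the id and name -->
  <xsl:template match="sr:result">
    <DataObject type="{$type}"
                id="{sr:binding[@name='name']/sr:literal}">
      <Attribute name="name">
        <xsl:value-of select="sr:binding[@name='name']/sr:literal"/>
      </Attribute>
      <!-- every other binding becomes an Attribute of the object -->
      <xsl:for-each select="sr:binding[@name != 'name']">
        <Attribute name="{@name}">
          <xsl:value-of select="sr:literal"/>
        </Attribute>
      </xsl:for-each>
    </DataObject>
  </xsl:template>
</xsl:stylesheet>
```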
So far so good. I was able to register that complete URL with Google Co-op and it came along and retrieved all the dynamically generated data. I’m not clear yet how often they’ll update it, or whether this is a manual process. I’m watching my logs to see if the Googlebot visits regularly.
To complete the hack I wrote a simple ResultSpec which looks for searches like “chemical nitrogen” and displays the name, atomic weight and colour of the element. Pretty trivial, but at this stage I just wanted to prove the concept.
Again I just registered this separate URL with Google Co-op and it was absorbed into the Google borg within a few minutes.
If you visit my public profile you can see some sample results, as I’ve configured a demonstration search for “silver”. You should also be able to subscribe to the link from there.
So the end result is that I’ve used SPARQL and XSLT to produce data suitable for feeding Google Co-op and creating services from it.
Is this useful? Maybe, maybe not.
In this instance, using a single data source, SPARQL is overkill. But as with my earlier hack with Google Earth overlays, all one need do is tweak the SPARQL query to create other kinds of data objects, possibly based on merging data from multiple sources: metadata about who discovered each element, for example.
I also like the fact that I can focus on the data collection and querying and not the actual conversion/integration.
It’d be interesting to put together a more useful service by merging useful RDF data such as information on cities, locations, reviews, the CIA World Factbook, etc.