Twinkle: A Sparql Query Tool

In the spirit of release early and often, I’ve just uploaded the first snapshot of Twinkle, a query tool for Sparql.
Twinkle is a simple Java interface that wraps the ARQ query library that Andy Seaborne is currently building as an add-on to Jena. ARQ is still under development and is not yet a supported Jena module, but I wanted to start playing with Sparql in something other than a command-line environment. Twinkle 0.1 is built on ARQ 0.9.2, which you can grab from CVS; alternatively, just try the web demo.
Twinkle was inspired by Elliotte Harold’s XQuisitor, which provides a simple GUI for playing with XQuery. XQuisitor wraps Saxon 7, which has XQuery support. I’ve shamelessly copied Harold’s interface as well as cribbing from the code; my Swing is a bit rusty.
Download Twinkle 0.1
In this first release the tool is just functional. It incorporates loading and saving of queries and results and basic text editing, alongside the basic ARQ/Jena functionality: parsing of Turtle, N3, N-Triples, and RDF/XML, and generation of query results as plain text and in the Sparql Variable Binding Results XML Format. ARQ also supports other query languages, but I’ve not exposed these through the GUI.
Later releases will add better error handling (it’s pretty much non-existent at the moment), a separate thread for running queries so they can be cancelled, and various other UI improvements, e.g. showing results in a JTable.
Twinkle is published under the Creative Commons Attribution-ShareAlike Licence.
To run the tool, simply download it and run java -jar twinkle.jar from the directory you’ve unpacked the distribution into.
Interested to hear feedback, especially bug reports. Hope others find it useful as they’re playing with Sparql.

Tailored Feeds

Tim Bray posted some notes on Private Syndication, referring to this ZDNet piece by David Berlind.
I’m inclined to agree that this kind of syndication is as yet a largely untapped application area and that it’s one with a great many possibilities. I’d love to have a feed of my bank balance, credit card statements, etc. Might help me curb my spending 🙂
The kind of private syndication Bray and Berlind are talking about is the opposite end of the spectrum from the public feeds that most of us are consuming. It’s important not to ignore the space in-between though: between per-user and mass-audience feeds there are a lot of other possibilities, e.g. feeds tailored to a particular community or market. There’s an uneasy relationship with advertising/marketing here.
For example, a publisher may want to have one feed for content subscribers, Amazon may want separate feeds for regular purchasers, etc. The content of these feeds needn’t be entirely marketing oriented though; there’s scope for “premium” content feeds, e.g. pushing out entire articles or other relevant updates. In my own application area, it would be useful for publishers to be able to produce RSS feeds tailored to subscribers/non-subscribers. A subscriber feed may have the entire content, or direct links to it. A non-subscriber feed may have limited content and links to purchasing options instead.
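As a rough sketch of the subscriber/non-subscriber split described above, the following Java builds two renderings of the same RSS 2.0 item. The Article record, its field names, the example.org URL, and the "?action=purchase" link suffix are all assumptions made up for illustration; a real publisher's content model and purchase links would look different.

```java
/** Hypothetical sketch: one article, two renderings of its RSS item. */
final class TailoredFeed {

    /** Illustrative article record; the field names are assumptions. */
    record Article(String title, String url, String summary, String fullText) {}

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds an RSS 2.0 item element. Subscribers get the full text and a
     * direct link; everyone else gets the summary and a purchase link.
     */
    static String item(Article a, boolean subscriber) {
        // The purchase query string is invented for this example.
        String link = subscriber ? a.url() : a.url() + "?action=purchase";
        String body = subscriber ? a.fullText() : a.summary();
        return "<item><title>" + escape(a.title()) + "</title>"
             + "<link>" + escape(link) + "</link>"
             + "<description>" + escape(body) + "</description></item>";
    }

    public static void main(String[] args) {
        Article a = new Article("Tag Spam", "http://example.org/articles/42",
                "A short teaser.", "The complete article text.");
        System.out.println(item(a, true));   // subscriber rendering
        System.out.println(item(a, false));  // non-subscriber rendering
    }
}
```

Deciding which rendering a given reader gets is the hard part, and that is where authentication comes in.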
Whatever these “tailored” feeds contain, and whether they’re tailored for a restricted audience or individual user, the key to their success is going to be authentication support in aggregators. SSL, HTTP Auth, etc. are all pre-requisites. And not only that: web-based aggregators such as Bloglines (my own favourite) will have to ensure that these feeds are not shared with the rest of the user base. I’ve heard several stories of private RSS feeds being accidentally shared with the entire Bloglines community; as I understand it, they automatically add any feed to their global directory.
In fact a move to tailored feeds may take away some of the supposed value of RSS aggregators such as Bloglines. There won’t be much they can share between users. There’s been a lot written about the network overheads of RSS, and this can only get worse with more tailored feeds.
Even without tailored feeds, support for authentication and non-shareable feeds would be a useful feature. At the moment I publish several private feeds internally to our company which are rarely used, as many of my colleagues are using Bloglines, or they are mobile and their desktop aggregator doesn’t support HTTP authentication.
Bray closes his posting by stating his belief that Atom is best suited to producing “content-critical, all-business” feeds. It’s a bold statement and I’d like to hear more about this: what exactly makes Atom better suited to carrying personalised/tailored content than any of the other RSS flavours? Kellan Elliott-McCrea has raised one issue already.
In his article, Berlind suggests that a delivery company might produce an RSS feed for every package they ship. I wonder whether, instead of requiring each company to produce fine-grained feeds for all of their actions, it might be easier for credit card companies to act as the point of co-ordination. Actions relating to a purchase made on a card, e.g. dispatched, delivered, warranty expired, could be sent as a notification to the card company, who could then produce a secure tailored feed which aggregates all the relevant activities.
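To make the aggregator side of this concrete, here is a minimal sketch of how a client could attach HTTP Basic credentials to a feed request over SSL, using only the standard Java library. The class name, the feed URL and the credentials are placeholders, and this is just one of the possible schemes (Digest authentication would need more work); it is not a description of how any particular aggregator does it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch: building an authenticated request for a private feed. */
final class PrivateFeedRequest {

    /** Encodes "user:password" as an HTTP Basic Authorization header value. */
    static String basicAuth(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds (but does not send) a GET request carrying the credentials. */
    static HttpRequest feedRequest(String feedUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(feedUrl))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL and credentials; https matters because Basic
        // credentials are only encoded, not encrypted.
        HttpRequest request = feedRequest(
                "https://example.org/private/feed.xml", "Aladdin", "open sesame");
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

A web-based aggregator would additionally have to store these credentials per user and keep the fetched feed out of anything shared.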
There’s already some degree of communication between the companies (the actual transaction) so really this would only require a standard interface to exchange data suitable for packaging into a feed. I can definitely see a role for the Atom API there, but I’m not clear on the unique benefits of the Atom format.

Folksonomies and Libraries

Seems like the library community is getting interested in folksonomies and how they can be used to supplement data coming from OPACs and other structured metadata.
From The Shifted Librarian: I think controlled vocabularies and folksonomies can co-exist peacefully and even complement each other. And as librarians, let’s start making use of them to complement what we’re already doing.
Sounds like it’ll be an interesting meeting. Take the red pill.

Tag Spam

So any signs that “tag spam” has started yet?
“Tag Spam” will be (or is) the practice of associating unrelated content (pr0n links, adverts, viagra pill adverts, press releases) with well-known tags with the purpose of encouraging click-throughs to said content.
Seems like a natural extension of referrer, comment, and trackback spam, and I’m curious to know whether it’s happening yet.
Actually doing it seems very easy given the open APIs of sites like And given Technorati’s support of the rel="tag" attribute, I can create tag spam by just setting up a blog, posting a bunch of random entries and then pinging their site. It therefore seems just as hard to manage as the other types of attack.
Time to start engineering in trust metrics I think.

Idea for Personal Timeline Viewer

One of my New Year’s resolutions was to not sit on ideas until I get time to implement them (I never do), so I’ve created an “Ideas” category that I can use to write these down. Here’s my first one:
I’d like an application that would build me a personal timeline of my web activity.
Actually, I’m also interested in viewing other people’s activities, but let’s get on with describing what I mean.


Semantic Web != Text Analysis; Semantic Web != Controlled Vocabularies

Stefano says The future of the semantic web is LSI. While I agree that LSI is definitely a cool technology, and an interesting alternative to Bayesian techniques, I don’t agree that it’s the future of the semantic web. The semantic web isn’t about text analysis. It’s not about data mining a document corpus. But those are possible applications of the technology.
There are vast amounts of high quality data being produced with semantic web technologies all the time. No need to apply LSI there.
Danny has already deconstructed this piece so I won’t comment a great deal further beyond saying that, despite the title, the article has nothing to say about what works or doesn’t work about ontologies; it merely describes some of the issues a search engine vendor faces when indexing text.
The article also glosses over a lot of interesting activity elsewhere. For example Norvig says that:
Essentially what we’re doing here is using the power of masses of untrained people who you aren’t paying to do all your work for you, as opposed to trying to get trained people to use a well-defined formalism and write text in that formalism and let’s just use the stuff that’s already out there. I’m all for this idea of harvesting this “unskilled labor” and trying to put it to use using statistical techniques over masses of large data and filtering through that yourself, rather than trying to closely define it on your own.
What about all the buzz about folksonomies? What’s that if it’s not harnessing “unskilled labor” to generate structured metadata? Just because it’s a few simple tags doesn’t mean it’s not metadata, and just because it’s categorization without using a formal ontology doesn’t mean it’s not generating useful machine-readable metadata. And should we ignore it because it’s free-form and user-generated? Technorati apparently don’t think so.
Over that metadata I can start making additional statements, drawing together related tags, drilling down to extract rich metadata about the photos, etc. This is what the semantic web is about: building a machine-readable infrastructure over and above what we already have.
But while the new Technorati service is undoubtedly cool, before one gets too excited over it, read through this paper on folksonomies which compares tagging practices in flickr and The different media and communities of the two sites lead to some quite different results. The Technorati service can provide some good illustrations of that and other issues raised in the paper.
We don’t have to throw everything out and start again. We don’t have to restrict ourselves to mining what’s out there, and we don’t have to wait for “ontology astronauts” to deliver us a set of fixed ontologies before we can begin doing useful work. While I agree with Shirky that comparisons of controlled vocabularies and folksonomies should account for economic factors, I disagree with the implication that it’s an either/or choice.
What about a folksonomy created by a designated community of experts, e.g. researchers in the field of interest? It’s not a controlled vocabulary in the usual sense, has the economic benefits of a folksonomy, but needn’t suffer from some of its problems. What about applying an editorial layer on top of a folksonomy to draw together related tags, etc? It’d address some of the issues outlined in the folksonomy review paper I referenced above. There’s plenty of room in between the two extremes of controlled vocabularies and user-generated tagging.
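One way to picture the “editorial layer” idea is as a small mapping that folds variant tags into a preferred form before anything is counted. The Java below is only a sketch under that assumption: the TagEditor class, its method names and the sample tags are invented for illustration, and a real system would presumably record these relationships as data that others could reuse rather than hard-coding them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: an editorial mapping layered over free-form tags. */
final class TagEditor {

    // Maps a normalised variant tag to the editor's preferred tag.
    private final Map<String, String> preferredByVariant = new HashMap<>();

    /** Normalises a raw tag: trims whitespace and lower-cases it. */
    private static String normalise(String rawTag) {
        return rawTag.trim().toLowerCase(Locale.ROOT);
    }

    /** Records an editorial decision that the variants mean the preferred tag. */
    TagEditor merge(String preferred, String... variants) {
        for (String variant : variants) {
            preferredByVariant.put(normalise(variant), normalise(preferred));
        }
        return this;
    }

    /** Returns the preferred tag, or the normalised tag if no decision exists. */
    String resolve(String rawTag) {
        String key = normalise(rawTag);
        return preferredByVariant.getOrDefault(key, key);
    }

    /** Counts tag usage after the editorial mapping has been applied. */
    Map<String, Integer> count(List<String> rawTags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : rawTags) {
            counts.merge(resolve(tag), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TagEditor editor = new TagEditor().merge("semanticweb", "semweb", "semantic-web");
        System.out.println(editor.count(
                List.of("SemWeb", "semantic-web", "rdf", "semanticweb")));
        // prints: {rdf=1, semanticweb=3}
    }
}
```

The user-generated tags are left untouched; the editorial decisions sit alongside them and can be revised or ignored.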
It’s a requirement of the key technologies underlying the semantic web that one can create vocabularies in just such a distributed fashion, and relate them together when we need to.
Boot-strapping machine readable data from existing sources. Late binding of data to application schemas.
That’s the future and benefit of the semantic web IMO.

XForms and FOAF and Cross-Player Woes

Mark Birbeck posted to rdfweb-dev on Friday to announce an XForms-based FOAF creator.
To use the tool you’ll need to install the latest version (and patches) of Forms Player.
As Birbeck notes in the announcement, the demo does nicely showcase some of the XForms features, notably extracting information for the form labels from the demo, being able to fetch remote XML files, etc. In other words XForms is XML aware, so creating and manipulating XML is a doddle. Overall it’s a very nice demo.
This isn’t the only FOAF-a-Matic style XForms implementation. Charles McCathieNevile gave a talk at XML Europe 2004 in which he described his own uses of XForms to generate RDF. The forms are available online. I played with an XForms version of the FOAF-a-Matic last year also, using it to demo the technology for a talk. There’s also a simple FOAF editor in the Mozquito DENG examples.
