ISWC 2005 Submission

As I’ve alluded to in the past, we’ve been exploring moving our content repository over to an RDF triple store.
It’s turning out to be pretty massive. We’ve learnt a few things along the way, and no doubt have much more to learn as we continue with the project, so it seemed worthwhile submitting a conference paper to share the experience. Here’s what Priya, Katie, and I have just submitted to ISWC2005:
The IngentaConnect website contains metadata from 17 million articles sourced from 20 thousand publications. The aim of the Metastore project is to build a flexible and scalable repository for the storage of this bibliographic metadata. The repository will replace several existing data stores and will act as a focal point for integration of a number of existing applications and future projects. Scalability, replication and robustness were important considerations in the repository design.
After introducing the benefits of using RDF as the data model for this repository, the paper will introduce the practical challenges involved in creating and managing a very large triple store. The repository contains over 200 million triples from a range of vocabularies including Dublin Core and PRISM. To our knowledge, this is the largest triple store of its type in existence.
The challenges faced range from schema design and initial data loading to query performance and integration of the repository into existing applications.
The paper will introduce the solutions developed to meet these challenges with the goal of helping others looking to deploy a triple store within a commercial environment. The paper will also suggest some avenues for further research and development.
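
For anyone who hasn't met the data model, here is a minimal sketch of how one article's metadata might look as triples. This is not the Metastore schema: the article URI, the values, and the `describe` helper are invented for illustration, and the PRISM namespace URI is assumed to be the 1.2 basic namespace.

```python
# Dublin Core element set namespace; the PRISM URI is an assumption (1.2 basic).
DC = "http://purl.org/dc/elements/1.1/"
PRISM = "http://prismstandard.org/namespaces/1.2/basic/"

# Hypothetical article identifier and values, invented for this example.
ARTICLE = "http://example.org/articles/1"

# Each triple is (subject, predicate, object).
triples = [
    (ARTICLE, DC + "title", "An Example Article"),
    (ARTICLE, DC + "creator", "A. N. Author"),
    (ARTICLE, PRISM + "publicationName", "Journal of Examples"),
    (ARTICLE, PRISM + "volume", "12"),
]


def describe(subject, store):
    """Collect every predicate/object pair recorded for one subject."""
    return {p: o for s, p, o in store if s == subject}


record = describe(ARTICLE, triples)
print(record[DC + "title"])
```

Mixing vocabularies is just a matter of using predicates from more than one namespace against the same subject.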

Won’t hear whether the submission will be accepted until July, but expect to read more here over the coming months.

New SPARQL Specification

I’m very pleased to see the publication of the latest SPARQL specification.
It looked for a while as if FROM was going to be removed from the language, i.e. specifying the data set would be a protocol issue and not something supported in the query language. This seemed like a bad idea to me, not because of any deep technical issues, but more because I see a lot of benefit in having self-contained queries. Something that I can pass on to an expert for optimisation, or cut-and-paste into code or query tool, publish for others to use, etc.
Secondly I was originally concerned that there would be no ORDER BY; sorting seemed pretty essential to me. Apparently I wasn’t the only one, and the DAWG have accepted sorting as a design objective. The new Building a Table of Contents use case also gave me cause to grin this morning.
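
A rough sketch of what I mean by a self-contained query, written as a small Python helper that builds the query text. The graph URI and the function name are made up, and the syntax is SPARQL as I understand it, so details may differ between drafts of the specification.

```python
def table_of_contents_query(graph_uri):
    """Return a SPARQL query that names its own data set and sorts its results."""
    # FROM keeps the data set inside the query; ORDER BY does the sorting.
    return f"""PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?article ?title
FROM <{graph_uri}>
WHERE {{ ?article dc:title ?title }}
ORDER BY ?title"""


# Hypothetical graph URI, used only for illustration.
query = table_of_contents_query("http://example.org/issue.rdf")
print(query)
```

Because the data set travels with the query, the whole string can be pasted into a query tool or handed to someone else without any extra protocol configuration.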
I’m less enamoured of the new Turtle style syntax, much preferring the original bracketed triples which are easy to read IMO. I find the additional magic punctuation to be a real turn off. I’ve not had time to dig through the public archives to find out what motivated the change. No doubt I’ll get used to it though.
But to end on a positive note, it’s really great to see ORDER BY in the specification and progress continuing. Have printed out the spec and Dave’s SPARQL reference for a little light reading so I can catch up with more of the detailed changes.

Who Should You Vote For?

I just filled this out as an idle exercise to see what the results would be: Who Should You Vote For? I don’t feel that any party represents my views especially well, but my expected outcome was Liberal Democrat. You can see the results for yourself:

Labour 8
Conservative -21
Liberal Democrat 34
UK Independence Party 25
Green 26

You should vote: Liberal Democrat

The LibDems take a strong stand against tax cuts and a strong one in favour of public services: they would make long-term residential care for the elderly free across the UK, and scrap university tuition fees. They are in favour of a ban on smoking in public places, but would relax laws on cannabis. They propose to change vehicle taxation to be based on usage rather than ownership.

Take the test at Who Should You Vote For

So there you go!

Bath Local Heroes

So I read about the Tentacle Man, the Red Man, the Grey Lady and the Purple Man, and tried to find a picture of the much missed Bath Blue Man, but couldn’t find one. Anyone care to share?
His gimmick was to paint himself blue and stand immobile in an alcove next to the Roman Baths waiting for tourists to fling coins in his general direction. I actually preferred to see him idling in between bouts of standing still, when he’d be enjoying a crafty roll-up. Many pretenders have tried to take his place (and his space come to think of it), but without success.
I did find this Local Heroes in Bath list though, which made me chuckle, especially entries like “Rolf Harris visits a bit”. I’m slightly dubious about the assertion that Michael Jackson lives in Bath though. I may have to trot into town at the weekend to see if I can see Cheggers on the High Street, Lionel Blair in Boots, or maybe Seal in Sainsburys.
I’m doing slightly better than the author who wrote:
“Lived there for 21 years. I’ve never clapped eyes on a celeb. If you’re a tourist p’raps you’re more likely to look”
…as I did once see Anthony Stewart Head on the train. He had a brace of teenage girls travelling with him who I could only assume were “Slayers in Training”.
Come to think of it, if you’re a Bath resident you’re perhaps more likely to spend your time tripping over tourists and forging a path through a sea of backpacks than looking for celebrities.
1000 points to the first person to spot Rolf Harris on a bus tour, 500 points for any other celebrity, excluding Peter Gabriel who apparently “has been known to drive through town in his sports car” instead.

More Intermediary Patterns

Prompted by some contributions from Mark Pilgrim, I’ve just made another quick update to the intermediary patterns.
This expands the Data patterns to include Page Context Reader and External Context Reader, and the Action patterns to include Restyler, Blocker, and Rearranger.
Ryan Shaw is also looking at documenting his Greasemonkey script amazon2melvyl using the patterns, which is excellent. To help with this I’ve clarified the Screen Scraper and Semantic Markup Reader patterns and also added the Link Scanner pattern. Searching the current page for links seems a fairly common activity for scripts.
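
To make the Link Scanner idea concrete, here is a minimal sketch in Python using only the standard library. A real Greasemonkey script would be JavaScript working on the live page; the class and function names here are invented for illustration and are not part of the pattern catalogue.

```python
from html.parser import HTMLParser


class LinkScanner(HTMLParser):
    """Collect the href of every <a> element found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; anchors without href are skipped.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def scan_links(html):
    """Return the link targets in document order."""
    scanner = LinkScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links


page = '<p><a href="http://example.org/a">A</a> <a name="x">no link</a> <a href="/b">B</a></p>'
print(scan_links(page))
```

Whatever the script does next (restyling, blocking, adding links) starts from a list like this one.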
In Jon Udell’s latest screencast, Content, services, and the yin-yang of intermediation, you can see both the URL Parser and Link Scanner patterns in action. He demonstrates his Library Lookup Bookmarklet, which is a Button Press(*) URL Parser that generates a form. Udell also demonstrates the Library Lookup service as a Greasemonkey script; this is an example of an Event Driven Link Scanner which drops links into the page.
(*) Not happy with that name, can anyone suggest a better one?
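
The URL Parser pattern can be sketched in the same spirit. This is not Udell's bookmarklet: the bookshop URL shape, the catalogue address, and both function names are assumptions made up for the example, and it builds a lookup link rather than a form.

```python
import re
from urllib.parse import urlencode, urlparse

# An ISBN-10 shaped path segment: nine digits followed by a digit or X.
ISBN_PATTERN = re.compile(r"/(\d{9}[\dXx])(?:/|$)")


def isbn_from_url(url):
    """Pull an ISBN-10 shaped segment out of the URL path, if there is one."""
    match = ISBN_PATTERN.search(urlparse(url).path)
    return match.group(1).upper() if match else None


def library_lookup_url(page_url, catalogue="http://library.example.org/search"):
    """Build a catalogue search link from the page URL (hypothetical catalogue)."""
    isbn = isbn_from_url(page_url)
    if isbn is None:
        return None
    return catalogue + "?" + urlencode({"isbn": isbn})


print(library_lookup_url("http://books.example.com/exec/obidos/ASIN/0596002920/"))
```

The only input is the address of the current page, which is what separates this pattern from the ones that read the page content.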