Triple Store Test Suites

I was very pleased to see this post pop up on PlanetRDF: Stress test your triple store. Ten million triples from the Swoogle cache, ready for download.
As it happens, I’m trying to get sign-off at the moment to release part of our data set for research purposes. I’m not confident of how far I’ll get, as there are a number of different parties who would have to agree, but I have my fingers crossed.
Katie and Priya have been doing some sterling work; a 200M-triple data set ain’t that easy to work with. So far we’ve found Jena on Postgres to be the most stable. We’ve had problems with both Kowari and Sesame, some of which we’ve been able to resolve. Query times on a data set of that size are (not surprisingly) really slow, but accessing resources directly (i.e. by URI) is just fine. We’ll produce a more structured report as soon as we can.
It strikes me that the search/text-retrieval community benefited greatly from having large test collections, and the RDF community needs something similar. It’s not hard to generate synthetic triples, but for comparison purposes they’re no substitute for real data sets. Seeing the Swoogle data released is great news.
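To show why synthetic data is the easy part, here is a minimal sketch of a random triple generator (the URIs and parameters are made up): it produces structurally valid triples, but none of the vocabulary reuse or skewed distributions that make real data sets interesting benchmarks.

```python
import random

def synthetic_triples(n, n_subjects=1000, n_predicates=20, seed=0):
    """Yield n random (subject, predicate, object) URI triples.

    Structurally valid RDF, but uniformly random -- real data has
    heavily skewed predicate use and shared vocabularies.
    """
    rng = random.Random(seed)
    for _ in range(n):
        s = f"http://example.org/s{rng.randrange(n_subjects)}"
        p = f"http://example.org/p{rng.randrange(n_predicates)}"
        o = f"http://example.org/o{rng.randrange(n_subjects)}"
        yield (s, p, o)

triples = list(synthetic_triples(10))
print(len(triples))  # -> 10
```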

One thought on “Triple Store Test Suites”

  1. Hitting reload is the framework’s job

    Adam Bosworth once said that 50% of a programmer’s time is spent doing plumbing. 50% sounds good about now.
