Interesting Papers from CIDR 2009

CIDR 2009 looks like it was an interesting conference: there were a lot of strong papers covering a whole range of data management and retrieval issues. The full list of papers can be browsed online, or downloaded as a zip file. There’s plenty of good stuff in there, ranging from the energy costs of data management, through query analysis and computation on “big data”, to managing inconsistency in distributed systems.
Below I’ve pulled out a few of the papers that particularly caught my eye. You can find some other picks and summaries on the Data Beta blog: part 1 and part 2.
Requirements for Science Databases and SciDB, by Michael Stonebraker et al., presents the results of a requirements analysis covering the data management needs of scientific researchers in a number of different fields. Interestingly, it seems that for none of the fields covered, which include astronomy, oceanography, biology, genomics and chemistry, is a relational structure a good fit for the underlying data models used in data capture or analysis. In most cases an array-based system is most suitable, while for biology, chemistry and genomics in particular a graph database would be best; semantic web folk take note. The paper goes on to discuss the design of SciDB, which will be an open source array-based database suitable for use in a range of disciplines.
The Case for RodentStore, an Adaptive, Declarative Storage System, by Cudre-Mauroux et al., introduces RodentStore, an adaptive storage system that can be used at the heart of a number of different data management solutions. The system provides a declarative storage algebra that allows a logical schema to be mapped to a specific physical disk layout. This is interesting as it allows greater experimentation within the storage engine, e.g. exploring how different layouts can optimise performance for specific applications and datasets. The system supports a range of different structures, including multi-dimensional data, and the authors note that it can be used to manage RDF data.
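The flavour of the idea is easy to sketch: the same logical tuples can be laid out on disk in different ways under a declarative switch. A toy Python illustration of my own (not RodentStore’s actual algebra):

```python
# Toy flavour of declarative physical layout (not RodentStore's algebra):
# the same logical tuples are materialised row-major or column-major
# depending on a declarative choice made outside the data itself.
def layout(rows, schema, mode="row"):
    if mode == "row":
        # row-major: each tuple stored contiguously, good for point lookups
        return [tuple(r[c] for c in schema) for r in rows]
    if mode == "column":
        # column-major: each attribute stored contiguously, good for scans
        return {c: [r[c] for r in rows] for c in schema}
    raise ValueError("unknown layout mode: %s" % mode)

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(layout(rows, ["id", "name"], mode="row"))     # [(1, 'a'), (2, 'b')]
print(layout(rows, ["id", "name"], mode="column"))  # {'id': [1, 2], ...}
```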
Principles for Inconsistency proposes some approaches for cleanly managing inconsistency in distributed applications, providing some useful additional context and implementation experience for those wrapping their heads around the notion of eventual consistency. I’m not sure that I’d follow all of these principles, mainly due to the implementation and/or storage overheads, but there’s a lot of good common sense here.
Harnessing the Deep Web: Present and Future, by Madhavan et al., describes some recent work at Google exploring how to begin surfacing “Deep Web” information and data into search indexes. The authors define the Deep Web as pages that are currently hidden behind search forms and not accessible to crawlers through other means. The work essentially involves discovering web forms, analysing existing pages from the same site to find candidate values for the fields in those forms, then automatically submitting the forms and indexing the results. The authors describe how this approach can be used to help answer factual queries, and it is already in production at Google; this probably explains the factual answers that are appearing on search results pages. The approach is clearly in line with Google’s mission to do as much as possible with statistical analysis of document corpora; there’s very little synergy with other efforts going on elsewhere, e.g. linked data. There is some discussion of how understanding the semantics of forms, in particular the valid range of values for a field (e.g. a zip code) and co-dependencies between fields, could improve the results, but the authors also note that they’ve achieved a high level of accuracy with automated approaches to identifying common fields such as zip codes. A proposed further avenue for research is exploring whether the contents of an underlying relational database can be reconstituted through automated form submission and scraping of structured data from the resulting pages. Personally I think there are easier ways to achieve greater data publishing on the web! The authors reference some work on a search engine specifically for data surfaced in this way, called Web Tables, which I’ve not looked at yet.
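The core surfacing loop is simple enough to sketch. A hypothetical Python illustration (the form URL, field names and candidate values are all stand-ins, and the real system’s form analysis is far more sophisticated):

```python
# Toy sketch of Deep Web "surfacing": fill a search form with candidate
# values mined from the site, fetch the results, and hand the pages to
# an indexer. All names and URLs here are hypothetical.
import itertools
import requests

def surface(form_url, fields, candidates, indexer, limit=100):
    """Submit a GET form once per combination of candidate values.

    fields     -- names of the form's input fields
    candidates -- dict mapping each field name to a list of values
                  mined from other pages on the same site
    indexer    -- callable receiving (url, html) for each result page
    """
    combos = itertools.product(*(candidates[f] for f in fields))
    for values in itertools.islice(combos, limit):
        params = dict(zip(fields, values))
        resp = requests.get(form_url, params=params, timeout=10)
        if resp.ok and len(resp.text) > 500:  # crude "page has results" test
            # the surfaced results page now has a plain, crawlable URL
            indexer(resp.url, resp.text)

# surface("http://example.org/search", ["make", "zip"],
#         {"make": ["ford", "honda"], "zip": ["94103", "10001"]},
#         indexer=lambda url, html: print("indexed", url))
```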
DBMSs Should Talk Back Too, by Yannis Ioannidis and Alkis Simitsis, describes some work exploring how database query results, and the queries themselves, can be turned into human-readable text (i.e. the reverse of a typical natural-language query system). The authors argue that this provides a good foundation for building more accessible data access mechanisms, as well as allowing easier summarisation of what a query is going to do, in order to validate it against the user’s expectations. The conversion of queries to text was less interesting to me than the exploration of how to walk a logical data model to generate text. I’ve very briefly explored summarising data in FOAF files, in order to generate an audible report using a text-to-speech engine, so it was interesting to see that the authors were using a graph-based representation of the data model to drive their engine. Class and relation labelling, with textual templates, is a key part of the system, and it seems much of this would work well against RDF datasets.
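The template idea is easy to picture: attach a short textual phrase to each relation, walk the graph, and fill the templates in. A toy sketch along those lines, with an invented graph and templates (not the authors’ engine):

```python
# Toy data-to-text sketch: walk a small labelled graph and render each
# edge through a per-relation template. Graph, labels and templates are
# all made up for illustration; FOAF-ish names chosen deliberately.
TEMPLATES = {
    "worksFor":  "{s} works for {o}",
    "knows":     "{s} knows {o}",
    "basedNear": "{s} is based near {o}",
}

GRAPH = [
    ("Alice", "worksFor", "Example Corp"),
    ("Alice", "knows", "Bob"),
    ("Bob", "basedNear", "Bath"),
]

def describe(graph, templates):
    """Yield one sentence per edge that has a matching template."""
    for s, p, o in graph:
        if p in templates:
            yield templates[p].format(s=s, o=o) + "."

print(" ".join(describe(GRAPH, TEMPLATES)))
# -> Alice works for Example Corp. Alice knows Bob. Bob is based near Bath.
```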
SocialScope: Enabling Information Discovery on Social Content Sites, by Amer-Yahia et al., is a broad paper that introduces SocialScope, a logical architecture for managing, analysing and presenting information derived from social content graphs. The paper introduces a logical algebra for describing operations on the social graph, e.g. producing recommendations based on analysis of a user’s social network; introduces a categorisation of the types of content present in the social graph and means for managing it; and also discusses some ways to present the results of searches against the content graph (e.g. travel recommendations) using different facets and explanations of how recommendations are derived.
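As a flavour of the kind of operation such an algebra might capture, here’s a toy recommendation function: suggest items liked by a user’s friends, carrying a count as a simple explanation. Purely illustrative, not SocialScope’s actual API:

```python
# Toy social-graph recommendation: items liked by a user's friends,
# weighted by how many friends liked each one. The count doubles as a
# minimal "explanation" for the recommendation. Data is made up.
from collections import Counter

FRIENDS = {"alice": ["bob", "carol"]}
LIKES = {"bob": ["lisbon", "rome"], "carol": ["rome", "oslo"]}

def recommend(user, friends, likes):
    seen = set(likes.get(user, []))  # don't recommend what they already like
    counts = Counter(item
                     for friend in friends.get(user, [])
                     for item in likes.get(friend, [])
                     if item not in seen)
    return counts.most_common()

print(recommend("alice", FRIENDS, LIKES))  # [('rome', 2), ('lisbon', 1), ...]
```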

Quakr

Quakr is a project to build a 3-dimensional world from user-contributed photos, a.k.a. some friends having fun with geek hacking. I see they submitted an abstract to XTech too. The blog links to some interesting experiments mashing up Google Maps with a Flash and VRML viewer.
The Quakr 7D Tiltometer is worth viewing too if only for its sheer Blue Peter stylee “build this at home” excellence.

My First Computer

Sinclair ZX Spectrum
A scan of the promotional flier for the Sinclair ZX Spectrum that I carried round for months prior to my parents buying me a 48K Spectrum for Christmas.
Click through to the larger image to read the marketing text. Here are some extracts:
“Professional power — personal computer price!”
“Your ZX Spectrum comes with a mains adaptor and all the necessary leads to connect to most cassette recorders and TVs (colour or black and white)”
“…later this year there will be Microdrives for massive amounts of extra on-line storage, plus an RS232/network interface board”
“Sound — BEEP command with variable pitch and duration”
“High speed LOAD & SAVE — 16K in 100 seconds via cassette, with VERIFY and MERGE for programs and separate data files.”
I learnt to program from those handy Spectrum BASIC manuals mentioned in the advert, supplemented with weekly doses of Input Magazine; I never did get the hang of assembly or machine code though. Not beyond a few peeks and pokes lifted from the ever-trusty Crash magazine, covers of which (along with C&VG) still adorn some of my old school books lurking in the attic.

Yep That’s Me

A view of my del.icio.us bookmarks:
extisp.icio.us – ldodds
Pretty accurate with respect to my interests these days. The Java/Speech tag is overblown though, just because I’ve not bookmarked many other Java-related pages.
It’s just a damn shame I can’t make it to FOAFCamp or the FOAF Workshop. Family holidays and work deadlines have crowded out my schedule.
Link courtesy of Many-to-Many.

Comments Disabled

Comments are now disabled on this blog (by the brute force method of moving the CGI script) until I decide on a better way to handle comment spam. It’s getting to be a real pain in the arse.

Lazy Photo Annotation

I was taken to task by my mother over Xmas. She’d been browsing my website during her lunch hour and had found no new photos, and precious few of her latest grandchild.
After setting aside thoughts that I’d slipped into an issue of The Onion I realised she was right, and that those dozens and dozens of images I’ve taken with my spangly new digital camera really ought to be published somewhere.
But I don’t want to do it half-heartedly; I want to publish as much metadata as possible along with the images themselves. There’s lots of fun to be had with co-depiction and RDF annotation.
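For example, here’s roughly what a minimal co-depiction annotation might look like as a Python sketch using rdflib (the people and URLs are made up):

```python
# Minimal co-depiction annotation with rdflib; all URLs are invented.
from rdflib import Graph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.bind("foaf", FOAF)

photo = URIRef("http://example.org/photos/2003/xmas-01.jpg")
me = URIRef("http://example.org/#me")
baby = URIRef("http://example.org/#baby")

# foaf:depicts links an image to each person shown in it; two depicts
# statements on the same image are what make it a co-depiction.
g.add((photo, FOAF.depicts, me))
g.add((photo, FOAF.depicts, baby))

print(g.serialize(format="xml"))
```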
But I’m essentially a lazy person, so I want a really, really simple way to publish and annotate the photos. So far I’ve been able to think of two, each with its own merits.

Continue reading

Pining For University

Every year come October I get this overwhelming urge to go back into education again. I loved being at university, both as an undergraduate studying Biology and as a postgraduate studying Computing, and freely admit to pining for the mental stimulation that full-time education brings. And yes, the lazy mornings and impromptu mid-week drinking sessions, but that’s another story…
So, come October I always wonder what I might have done if I’d continued on with my education. I’m still adamant that one day I’ll take a crack at a Ph.D; perhaps when the kids are older and I can get them cleaning chimneys to bring in a few more shiny pennies.

Continue reading

Unit Testing PL/SQL

For my sins I’ve been writing a bit of PL/SQL recently. It’s been nearly 4 years since I had to do that in anger, and predictably I’ve forgotten way more than I remember. Back then I was responsible for redesigning the database for a Laboratory Information Management System used by researchers at Pfizer looking for new drugs. After redesigning the data model I had to write code to port data from the old schema to the new. That was a lot of code, and it required a lot of testing. Fun project though, and an interesting application.
Of course now I know all about test driven development, and the first thing that occurred to me was: “how do I test this stuff?”.
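One option is a dedicated framework like utPLSQL; a lazier route is to drive the procedures from outside the database. A minimal sketch using Python’s unittest with the cx_Oracle driver, where the connection string and the ADD_SAMPLE function are both hypothetical:

```python
# Sketch of testing a PL/SQL function from outside via cx_Oracle.
# The connection string and the ADD_SAMPLE function are hypothetical.
import unittest
import cx_Oracle

class AddSampleTest(unittest.TestCase):
    def setUp(self):
        self.conn = cx_Oracle.connect("lims/secret@localhost/XE")
        self.cursor = self.conn.cursor()

    def tearDown(self):
        self.conn.rollback()  # leave the schema as we found it
        self.conn.close()

    def test_add_sample_returns_new_id(self):
        # callfunc invokes the stored function and maps its return value
        new_id = self.cursor.callfunc("ADD_SAMPLE", cx_Oracle.NUMBER,
                                      ["compound-123"])
        self.assertIsNotNone(new_id)

if __name__ == "__main__":
    unittest.main()
```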

Continue reading

Hypertext ’03 Papers

Just noticed that the papers from the Hypertext ’03 conference are online. Some interesting stuff to dig into there.
Hypertext’03 Conference: Complete List of Papers
You can also download PDFs of the poster presentations and demos.

…And…Relax…

Been rather busy over the past few weeks one way or another. I took some holiday after the Big Chill (which was excellent, btw — The Cinematic Orchestra were my musical highlight of the weekend; the other was bumping into a friend I’ve not seen for about 9 years) but came back to all sorts of deadlines. Mostly day job stuff, but I’ve just finished another developerWorks tutorial, this time on XML Catalogs. Expect a few ruminations from me on that score as there are a number of items that didn’t make it into the tutorial that I want to write up. And here seems as good a place as any.
Got some time now to sit back, relax and catch up on my reading. Practical RDF dropped through my door this week so I’ll be taking a look at that over the weekend. I’ve also got some time to play with my new camera which’ll be fun.
