Champernowne’s Constant

Whilst reading von Baeyer’s ‘Information’ recently, I came across the following fun mathematical tidbit, which I thought was worth sharing, mainly because I couldn’t find many references to it elsewhere on the ‘net.
In the chapter on “Randomness”, von Baeyer introduces several definitions of the term “random”, iteratively showing how each is slightly flawed. Considering a binary sequence of digits, the first definition describes a random number as one in which there is no pattern to the series of 1s and 0s. However, a sequence such as 000110000100 is not random, as it has an unequal proportion of the binary digits. A slightly improved definition states that the numbers of each digit are approximately equal. But not only that: the combinations of the two digits (00, 01, 10, 11) must also occur in roughly equal proportions. And so on for combinations of three, four, five digits. Sequences that meet this restriction are apparently known as “normal numbers”.
The first explicit (rather than theoretical) example of a normal number is Champernowne’s Constant, which was produced (discovered?) in 1933. David Champernowne pointed out that if one starts with zero, then one, then strings together all four possible pairings, then all eight triples, and so on, one ends up with a number which must, by construction, contain all possible patterns, and is therefore “normal”.
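To make the construction concrete, here is a minimal Python sketch of it as described above: every binary string of length one, then two, then three, and so on, joined end to end. The function names (`champernowne_bits`, `block_frequencies`) are my own, and the sketch follows the description in this post rather than any canonical definition of the constant.

```python
from collections import Counter
from itertools import product


def champernowne_bits(max_len):
    """Join every binary string of length 1, 2, ..., max_len, in order."""
    return "".join(
        "".join(digits)
        for length in range(1, max_len + 1)
        for digits in product("01", repeat=length)
    )


def block_frequencies(bits, k):
    """Count each overlapping run of k digits in the sequence."""
    return Counter(bits[i:i + k] for i in range(len(bits) - k + 1))


print(champernowne_bits(2))  # 0100011011
print(block_frequencies(champernowne_bits(10), 2))
```

Running `block_frequencies` over longer prefixes is a quick way to see how evenly the pairs and triples are spread as the sequence grows.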
Von Baeyer then points out that this number in its binary form is “a fabulous object. Using Morse code, or some other translation of zeroes and ones into typographical symbols, it can be transformed into a string of letters, spaces and punctuation marks. Since every conceivable finite sequence of words is buried somewhere in the string’s tedious gobbledygook, every poem, every traffic ticket, every love letter and every novel ever written, or ever to be composed in the future is there in that string…You may have to travel out along the string for billions of light years before you find them, but they are all in there somewhere….” (pp101-102).
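As a rough illustration of the “it’s all in there somewhere” claim, the sketch below turns a short piece of text into bits and looks for it in the string. It assumes 8-bit ASCII rather than the Morse code von Baeyer mentions, and the helper names (`text_to_bits`, `first_position`) are hypothetical.

```python
from itertools import product


def champernowne_bits(max_len):
    """Join every binary string of length 1, 2, ..., max_len, in order."""
    return "".join(
        "".join(digits)
        for length in range(1, max_len + 1)
        for digits in product("01", repeat=length)
    )


def text_to_bits(text):
    """Encode text as 8-bit ASCII (an assumption, not von Baeyer's Morse code)."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def first_position(text):
    """Return the index at which the encoded text first appears in the string."""
    pattern = text_to_bits(text)
    # A pattern of n digits must appear by the time all n-digit strings
    # have been written out, so building that far is always enough.
    return champernowne_bits(len(pattern)).find(pattern)


print(text_to_bits("A"))  # 01000001
print(first_position("A"))
```

Built this way, the string needed to guarantee an n-digit message is (n-1) x 2^(n+1) + 2 digits long: 3,586 digits for one character, about two million for two. Which is why the love letters are billions of light years away.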
So who needs a million chimpanzees with typewriters? Distributed computing project anyone?

We’re All Harmonies

Attempting to bring my teetering “to read” pile of books under control, I’ve banned myself from Waterstones and the local library so I don’t get tempted to pick up any others. So far it’s been mostly successful: I’ve strayed in, but not actually bought anything.
In typically anal-retentive fashion I’ve adopted a strategy for working through the book list. I could have used date of purchase or alphabetical order to organize my reading list, but instead I have sorted the books by category. So at the moment I’m working through all the physics books whilst absorbing the occasional novel as light relief.
I’ve already read Flatland, Flatterland, Information, and am currently working through Hyperspace, with the Elegant Universe next and last on the list.
So I now understand more about multiple dimensions, string theory and quantum physics than I ever did before. Which means that I’m still hopelessly lost, especially when the current interpretation of quantum physics may be crumbling. Still, fun and intriguing stuff.
The purpose of this post, though, was to note what it is that I find so fascinating about string theory: if all of space-time can be reduced to the vibrations of strings in 10 dimensions (or maybe 26, depending on which theory you subscribe to) then that means we’re all just harmonics. A melody within the greater score that is the world around us.
And what really gives the romantic in me a nice fuzzy feeling is the thought that my children are new harmonics, new melodies that have unfolded from the overture of their parents.
So the saying is true, we really can make beautiful music together.

Sounds of Sunlight

I went to the Festival of Gardens at Westonbirt Arboretum last weekend. Definitely worth a visit if you’re in the area.
Being a complete idiot and having forgotten my camera, I don’t have any photos, so you’ll have to look at the designs on the website for some impression of what’s there. Essentially it’s an annual showcase for garden designers. We went last year too, and it’s an interesting mix of the weird and wonderful.
The highlight for me this year was Andrew Stonyer’s Sounds of Sunlight. This was a set of three sculptures suspended amongst the trees in the arboretum, each consisting of a photovoltaic cell attached to an electric motor turning a small aluminium disc; attached to the discs were little loops of wire. Suspended above the wires were guitar strings. As the sun shone down through the trees, the discs turned and the wire loops “strummed” the guitar strings. The sculptures were attached to amplifiers so the effects could be heard for some distance through the trees; it was fun to watch people come looking for the source of the intermittent sounds.
Much cooler than wind chimes.

How Big Is Your Store?

I’ve just kicked off a project to explore changing our main content repository to an RDF triple store. The main issue at the top of my list is scalability.
The repository will end up holding metadata about more than 16 million articles (plus their associated authors, affiliations, publications, etc.) and, as you’d imagine, that’s going to end up exploding into a large number of triples.
Reading through the Semantic Web Storage and Retrieval report I came across the following, which concerns me a bit:

Stores that can handle 10-20M triples are readily available and the current state of the art is around 40M; the development community is considering the next 10x increase in storage requirements, and their affect on indexing, which has tended to be O(n) for triples. Novel dedicated storage approaches such as in RDFStore were shown to avoid this. The dedicated non-relational stores can outperform the relational ones in such scaling, although the relational databases continue to perform well.

Basically I’m wondering whether I’m being a bit ambitious?
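A quick back-of-envelope sketch makes the worry concrete. The 16 million articles and the 40M “state of the art” figure come from the text above; the per-article property counts are purely hypothetical placeholders, not measurements from our data.

```python
ARTICLES = 16_000_000          # from the post
STATE_OF_THE_ART = 40_000_000  # triples, from the quoted report


def triples_budget_per_article(articles, capacity):
    """Average triples per article that a store of this capacity allows."""
    return capacity / articles


def estimate_triples(articles, article_props, related_per_article, related_props):
    """Total triples if each article has its own properties plus related resources."""
    return articles * (article_props + related_per_article * related_props)


print(triples_budget_per_article(ARTICLES, STATE_OF_THE_ART))  # 2.5

# Hypothetical model: 10 properties per article, plus 3 related resources
# (authors, affiliations, ...) with 4 properties each.
total = estimate_triples(ARTICLES, 10, 3, 4)
print(f"{total:,} triples")  # 352,000,000 triples
```

In other words, anything richer than about two and a half triples per article already exceeds the quoted state of the art, whatever the real per-article figure turns out to be.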
I’d be interested to hear about how big a store people have worked with, including which APIs, etc. they’ve been using.
To give a bit more context, as we’re mainly a Java shop I’ve begun by considering any store that can be plugged into Jena. So the Jena persistence model support would be my baseline, with Kowari being another candidate. I see that RDFStore may be adding Jena support so we may explore that too.
I have no strong preferences for a relational versus non-relational store, except that relational databases bring other benefits (backups, distribution, etc.) that I doubt are “baked in” to the custom stores. What I’m hoping to avoid (but haven’t ruled out) is creating a custom database for our content; I’d like to dodge some of that pain by using off-the-shelf toolkits.
Commenting is switched off (spam problems), so if you’re interested in sharing experiences please send me a mail. I’ll write up notes on any responses I get with the aim of sharing them with the rest of the community. I’ll do the same with our prototyping experiences.

Lessons Learned By An XML Hacker

I’m going to be speaking at XMLOpen in September, and I notice that my (late) abstract has now made it into the conference programme.
The talk is titled “Understanding RDF: Lessons Learned by an XML Hacker” and will be an attempt to synthesize the things I’ve learnt about RDF, the semantic web, etc. over the last few years whilst contributing to the FOAF project.
For the seasoned RDF developer the talk is unlikely to offer many fresh insights, but if you’re still new to the technology, or are interested in wading in, then I hope there will be something there for you. I certainly don’t class myself as an expert; this is just a chance to share some of the lessons I’ve learnt.

XML Hacks

I see by the fact that my complimentary copy arrived today that XML Hacks has hit the stores. This makes me incredibly pleased, as my two contributed hacks mean that this is the most I’ve ever had in print, and that’s like, proper writing, not this new-fangled web malarkey.
My two hacks are #64 (“Identify Yourself with FOAF”) and #93 (“Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It For Data”). Both are RDF-flavoured. The first is basically an edited version of my article, “Introduction to FOAF”, while the second is an original piece that provides a lightning introduction to Cocoon, then shows how to create a simple web service that will scrape RDF metadata from a web page using a combination of HTML Tidy, XSLT and some rummaging around in the head element.
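For a flavour of the “rummaging around in the head element” part, here is a small Python sketch. To be clear, this is not the Cocoon, HTML Tidy and XSLT pipeline from the hack; it swaps in Python’s standard `html.parser` and assumes the page advertises its RDF via `link` elements with a type of `application/rdf+xml`. The class name, function name and sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class HeadLinkScraper(HTMLParser):
    """Collect hrefs of RDF links found inside the head element."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            # Assumption: RDF metadata is flagged by its MIME type.
            if attributes.get("type") == "application/rdf+xml" and attributes.get("href"):
                self.rdf_links.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_rdf_links(html):
    """Return the RDF link targets advertised in a page's head."""
    scraper = HeadLinkScraper()
    scraper.feed(html)
    return scraper.rdf_links


SAMPLE = """<html><head><title>Example</title>
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://example.org/foaf.rdf">
</head><body><p>Hello</p></body></html>"""

print(find_rdf_links(SAMPLE))  # ['http://example.org/foaf.rdf']
```

The tolerant parser stands in for the HTML Tidy step: it copes with markup that isn’t well-formed, which is the reason the hack cleans the page up before handing it to XSLT.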
Kudos to Michael Fitzgerald for pulling together a book that contains such a wide range of useful hacks, and having the patience to do it whilst working with a number of very, very busy people!