Alf Eaton posts today to point to the new WebCite service. This is going to be very useful. Don’t think so? Well, there’s plenty of research to show that link atrophy is a big problem in the scientific literature:
Persistence of Web References in Scientific Research
See also: A study of missing Web-cites in scholarly articles: towards an evaluation framework which reports that “[a]fter evaluating 2162 bibliographic references it was found that 48.1% (1041) of all citations used in the papers referred to a Web-located resource. A significant number of references to URLs were found to be missing (45.8%)…”

iSpecies and taxonomy (no, not that kind)

For the last few years I’ve been lurking on a mailing list run by the Taxonomic Databases Working Group. It’s a low-volume list used by scientists interested in capturing and marking up taxonomies. That’s taxonomy in the Linnaean sense, not the semantic web sense. I’ve been lurking there since I wrote this paper a while back proposing an XML format to replace a text-based format that had been popular.
Yesterday on the list this interesting little mash-up was announced: ispecies.org. It works by searching NCBI, Yahoo images and Google Scholar to attempt to find relevant information on biological species. Lions for example.
I found it interesting mainly because it was one of the first mashups I’ve seen that isn’t a combination of the same old APIs (maps, music, bookmarks), but also because it’s clearly focused on a particular scientific community.
The author, Rod Page (apparently a big RDF fan) built this as an off-shoot of a wider project that’s storing phylogenetic data as RDF. His site also has a Taxonomic Search Engine which federates a number of taxonomic name databases. Perform a search and it links you to metadata about the organism. There’s a paper on the application on BioMedCentral.
Given an LSID (Life Sciences Identifier) it turns out you can get RDF metadata about the organism. Lions for example.
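LSIDs themselves are just structured URNs of the form urn:lsid:authority:namespace:object, with an optional trailing revision, so pulling one apart before handing it to a resolver is trivial. A minimal sketch (the identifier below is a made-up placeholder, not a real record):

```python
def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not a valid LSID: " + lsid)
    return {
        "authority": parts[2],  # domain of the issuing database
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) > 5 else None,
    }

# Placeholder identifier -- not a real record.
print(parse_lsid("urn:lsid:example.org:taxa:12345"))
```

The authority component tells a client where to look for a resolution service, which is what makes the “give me RDF for this identifier” trick possible.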
There’s a lot of interesting mash-up potential in this data, as well as that available from a few other projects in this area.
I’ve been keeping half an eye on this space recently, after reading this paper on how bioinformatic researchers are bumping into limits of XML and looking at RDF instead: “…the syntactic and document-centric XML cannot achieve the level of interoperability required by the highly dynamic and integrated bioinformatics applications”.
These guys have a lot of data that needs integrating and merging. Modern classification is about much more than the old Linnaean system. It has to be able to merge together data sources ranging from molecular biology through to field observations, and depending on what sources you draw on, and from what level, the tree of life can be drawn quite differently.
The early web was pioneered in part by the needs of scientists exchanging research papers. It strikes me that “eScience” and bioinformatics may very well become the driving forces behind a more semantic web.

Working In A Small World

Stumbled over these musings on how small world theory applies to company organization. They’ve been languishing in my personal wiki for many months, thought I might as well post them as is.
Whilst reading the first few chapters of “Small World” by Mark Buchanan, I was fascinated by the work of Granovetter (see “The Strength of Weak Ties”). This highlights the fact that it is the weak ties between individuals that are the important ones in a social network, not the strong ties as one would expect. People with strong ties in common often have strong ties between them also, hence these links are less important than weak ties (acquaintances), as their removal has little effect on the structure of the graph (as measured by the number of degrees between points). Previous descriptions I’ve read of small world phenomena have focussed on hubs/authorities, which is a much less human-centric metaphor; quite rightly perhaps, as “small worldism” isn’t tied to any particular type of graph, but it’s not very evocative.
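A tiny sketch makes the point concrete (the graph and labels here are invented for illustration): in a network of two tightly-knit clusters bridged by one acquaintance link, deleting a strong within-cluster tie barely changes distances, while deleting the weak bridge disconnects the graph entirely.

```python
from collections import deque

def shortest_path_len(graph, src, dst):
    """Breadth-first search; returns hop count, or None if disconnected."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def without_edge(graph, u, v):
    """Copy of the graph with one (undirected) edge removed."""
    g = {k: set(vs) for k, vs in graph.items()}
    g[u].discard(v)
    g[v].discard(u)
    return g

# Two tightly-knit triangles joined by a single weak tie C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(shortest_path_len(graph, "A", "F"))                          # 3
print(shortest_path_len(without_edge(graph, "A", "B"), "A", "F"))  # still 3
print(shortest_path_len(without_edge(graph, "C", "D"), "A", "F"))  # None
```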
This led me to thinking about relationships within companies. Exploiting social networks to find work, etc. seems well explored; indeed, it’s behind the current drive for many of the social networking sites and applications that are springing up at the moment. Work relationships seem like a different framework within which to explore the small world phenomenon. Or at least it’s the one that occurred to me whilst washing up after dinner.


Champernowne’s Constant

Whilst reading von Baeyer’s ‘Information’ recently, I came across the following fun mathematical tidbit which I thought was worth sharing. Mainly because I couldn’t find many references to it elsewhere on the ‘net.
In the chapter on “Randomness”, von Baeyer introduces several definitions of the term “random”, iteratively showing how each is slightly flawed. Considering a binary sequence of digits, the first definition describes a random number as one in which there is no pattern to the series of 1s and 0s. However a sequence such as 000110000100 is not random, as it has an unequal proportion of the two binary digits. A slightly improved definition states that the numbers of each digit should be approximately equal. But not only that: the four combinations of two digits (00, 01, 10, 11) must also occur in roughly equal proportions. And so on for combinations of three, four, five digits. Sequences that meet this restriction are apparently known as “normal numbers”.
The first explicit (rather than theoretical) example of a normal number is Champernowne’s Constant, which was produced (discovered?) in 1933. David Champernowne pointed out that if one starts with zero, then one, then strings together all four possible pairs, then all eight triples, and so on, you end up with a number which must, by construction, contain all possible patterns, and is therefore “normal”.
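That construction is easy to reproduce: concatenate every binary string of length 1, then every string of length 2, and so on. A quick Python sketch (a finite prefix only, of course, not the infinite constant) makes the “contains every pattern by construction” claim concrete:

```python
from itertools import product

def champernowne_binary(max_len):
    """Concatenate every binary string of length 1, 2, ..., max_len in order."""
    return "".join(
        "".join(bits)
        for n in range(1, max_len + 1)
        for bits in product("01", repeat=n)
    )

s = champernowne_binary(3)
# 2 strings of length 1, 4 of length 2, 8 of length 3 -> 2 + 8 + 24 = 34 digits,
# and every 3-digit pattern appears somewhere, by construction.
print(len(s), all("".join(p) in s for p in product("01", repeat=3)))
```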
Von Baeyer then points out that this number in its binary form is “a fabulous object. Using Morse code, or some other translation of zeroes and ones into typographical symbols, it can be transformed into a string of letters, spaces and punctuation marks. Since every conceivable finite sequence of words is buried somewhere in the string’s tedious gobbledygook, every poem, every traffic ticket, every love letter and every novel ever written, or ever to be composed in the future is there in that string…You may have to travel out along the string for billions of light years before you find them, but they are all in there somewhere….” (pp101-102).
So who needs a million chimpanzees with typewriters? Distributed computing project anyone?

Searching Small Worlds

Interesting “small world” article in New Scientist this week (“Know Thy Neighbour”, January 17 2004, Mark Buchanan), this time discussing how people and information can be located within a small world network.
The essay discusses Milgram’s famous experiment, in which he asked people to attempt to route a letter, via their contacts, to a given person. Most of the letters got there within a small number of hops, and apparently the strategy that most people, quite naturally, adopted was along the lines of “Mr X (the end-point) works in the financial sector; who else do I know that works in that sector…”. In essence people were comparing their contacts with what they knew about the end point, categorising them into groups.
Groups are therefore an important feature of small world networks that are “searchable”. Classifying nodes in this way allows your local knowledge of the network (your contacts) to help you manipulate it. In the case of the Milgram experiment, that manipulation was using people to route letters; however, the New Scientist article suggests that similar techniques could be used to benefit internet search engines.
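As a toy sketch of that letter-routing strategy (every name, sector and contact list below is invented for illustration), each person greedily hands the letter to a contact who works in the target’s sector, falling back to anyone they haven’t tried yet:

```python
# Toy contact network: all names, sectors and links are invented.
people = {
    "alice": {"sector": "media",   "contacts": ["bob", "dave"]},
    "bob":   {"sector": "finance", "contacts": ["alice", "carol"]},
    "carol": {"sector": "finance", "contacts": ["bob"]},
    "dave":  {"sector": "tech",    "contacts": ["alice"]},
}

def route_letter(start, target, people, max_hops=10):
    """Milgram-style greedy routing: prefer a contact in the target's
    sector, otherwise pass to anyone not already on the path."""
    path, target_sector = [start], people[target]["sector"]
    while path[-1] != target and len(path) <= max_hops:
        contacts = people[path[-1]]["contacts"]
        if target in contacts:
            nxt = target
        else:
            in_sector = [c for c in contacts
                         if people[c]["sector"] == target_sector and c not in path]
            unvisited = [c for c in contacts if c not in path]
            nxt = (in_sector or unvisited)[0]  # a dead end would raise IndexError
        path.append(nxt)
    return path

print(route_letter("alice", "carol", people))  # alice -> bob -> carol
```

Alice has never met Carol, but knowing that Carol works in finance is enough local knowledge to pick Bob as the next hop, which is exactly the group-based shortcut the article describes.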
