Searching Small Worlds

Interesting “small world” article in New Scientist this week (“Know Thy Neighbour”, January 17 2004, Mark Buchanan), this time discussing how people and information can be located within a small world network.
The essay discusses Milgram’s famous experiment, in which he asked people to attempt to route a letter, via their contacts, to a given person. Most of the letters got there within a small number of hops, and apparently the strategy that most people quite naturally adopted was along the lines of “Mr X (the end-point) works in the financial sector, so who else do I know that works in that sector…?”. In essence people were comparing their contacts with what they knew about the end point, categorising them into groups.
Groups are therefore an important feature of small world networks that are “searchable”. Classifying nodes in this way allows your local knowledge of the network (your contacts) to be used to navigate it. In the case of the Milgram experiment, that meant using people to route letters; the New Scientist article suggests that similar techniques could be used to benefit internet search engines.
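Out of curiosity, here’s a minimal Python sketch of that greedy, group-based routing strategy. Everything in it (the names, the group labels, the toy network) is made up for illustration; the only point is the “which of my contacts looks most like the target?” step.

```python
# A toy version of the Milgram-style greedy routing strategy: each person
# forwards the letter to whichever of their contacts shares the most group
# attributes (sector, city, ...) with the target. Purely illustrative data.

def route_letter(network, start, target, max_hops=20):
    """network maps person -> (set of group attributes, list of contacts)."""
    path, current = [start], start
    target_attrs = network[target][0]
    for _ in range(max_hops):
        if current == target:
            return path
        _, contacts = network[current]
        # Forward to the contact whose attributes overlap most with the target's.
        best = max(contacts, key=lambda c: len(network[c][0] & target_attrs))
        path.append(best)
        current = best
    return path  # gave up after max_hops

network = {
    "alice": ({"finance", "london"},  ["bob", "carol"]),
    "bob":   ({"finance", "boston"},  ["alice", "dave"]),
    "carol": ({"academia", "london"}, ["alice", "dave"]),
    "dave":  ({"finance", "boston"},  ["bob", "carol"]),
}

print(route_letter(network, "alice", "dave"))  # ['alice', 'bob', 'dave']
```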


Filippo Menczer at Indiana University is carrying out research in this area. His list of papers is online, and a quick surf through them makes interesting reading.
For example, in Topical Web Crawlers: Evaluating Adaptive Algorithms (PDF), Menczer et al. describe “topical crawlers” (emphasis mine):

Topical crawlers (also known as focused crawlers) respond to the particular information needs expressed by topical queries or interest profiles. These could be the needs of an individual user (query time or online crawlers) or those of a community with shared interests (topical or vertical search engines and portals). Topical crawlers support decentralizing the crawling process, which is a more scalable approach… An additional benefit is that such crawlers can be driven by a rich context (topics, queries, user profiles) within which to interpret pages and select the links to be visited.

In other words, the crawler can get away with indexing fewer pages, as it’s guided to the most relevant material by other cues. The paper Search Engine-Crawler Symbiosis: Adapting to Community Interests describes how a community search engine can improve web crawler performance, and vice versa, by learning the community’s interests.
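To make the crawler idea concrete, here’s a rough best-first crawl loop in Python. It keeps a priority queue of unvisited URLs, scored by a crude keyword overlap between the linking page and a topic; the TOPIC keywords and scoring function are just stand-ins for the classifiers and richer context the papers describe.

```python
import heapq
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser

TOPIC = {"semantic", "web", "rdf", "foaf"}   # stand-in topic keywords

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def score(text):
    """Crude relevance measure: how many topic keywords appear in the text."""
    return len(set(re.findall(r"[a-z]+", text.lower())) & TOPIC)

def crawl(seed, max_pages=20):
    frontier = [(-1, seed)]            # (negated score, url) so the best pops first
    seen, visited = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue
        visited.append(url)
        page_score = score(html)
        extractor = LinkExtractor()
        extractor.feed(html)
        # Links found on a relevant page inherit that page's score.
        for href in extractor.links:
            full = urllib.parse.urljoin(url, href)
            if full.startswith("http") and full not in seen:
                heapq.heappush(frontier, (-page_score, full))
    return visited
```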
It seems to me that FOAF could play a role here: rather than relying solely on machine learning to discover information about a document and a community’s interests, one could explicitly gather that data from the aggregated FOAF descriptions of that community.
E.g. using a FOAF description one can determine not only the interests of a person linking to a given document, but also the interests of that document’s author, assuming there are appropriate links from the HTML to the FOAF description (cf. autodiscovery, and I Made This). There’s even a grouping mechanism that can further help search agents to adapt their paths, using the same technique as Milgram’s test subjects. Again the algorithm seems natural: if you want to learn more about a particular topic or event, you’d start by looking at the websites, documents and blogs of people you know are interested in that area.
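As a sketch of how that might work, the snippet below uses rdflib to pull a person’s declared interests and contacts out of a FOAF file. The URL is purely illustrative, and how the results would feed into a crawler’s frontier is left open; it’s only meant to show that the data is there to be gathered explicitly rather than learnt.

```python
from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

def foaf_profile(foaf_url):
    """Return the interests and contact homepages declared in a FOAF document."""
    g = Graph()
    g.parse(foaf_url)   # rdflib guesses the format; pass format="xml" explicitly if needed
    interests, contacts = set(), set()
    for person in g.subjects(predicate=FOAF.name):
        for interest in g.objects(person, FOAF.interest):
            interests.add(str(interest))        # foaf:interest points at a topic page
        for friend in g.objects(person, FOAF.knows):
            for homepage in g.objects(friend, FOAF.homepage):
                contacts.add(str(homepage))     # places a crawler might visit next
    return interests, contacts

# Illustrative only; there is no such file:
# interests, contacts = foaf_profile("http://example.org/people/jane/foaf.rdf")
```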
Cool stuff. Just wish I had the maths to understand it all properly!