24 November 2010

A nice bit of mail

I was pleased to find in my mail yesterday something of which I had no previous notice.  Apparently, the Bulletin of ASIST, volume 36, no. 3, which celebrated the SIG on Information Behaviour Research, won the 2010 Publication of the Year Award. And, because I contributed a short paper to that issue, I get a Certificate of Recognition!  Now I have to decide if I can find a space on my study wall :-)

21 November 2010

Celebrating Archie

This week, the New Scientist magazine celebrated the 20th anniversary of the first attempt to produce a search engine for the Internet.  This was Archie, developed by Alan Emtage and students at McGill University.  The time was pre-Web and the level of disorganization on Internet sites was even greater than that of Web sites today, so it didn't work very well :-)

The feature in the New Scientist (only available if you subscribe - or buy the paper copy) is not a very coherent study - basically it's a set of "windows", each by a different person, plugging a particular point of view - not the best technology journalism you've ever read.  It's suggested that all kinds of development are in train, and no doubt they are, but will any of these new Google killers ever amount to anything?

It's instructive to follow the time chart on page 49, which shows the emergence of Cuil and WikiSearch... and their decease a couple of years later (or in the case of WikiSearch, even earlier).  The fact is that, to get anywhere near challenging Google (assuming you do so before Google buys you up) the technology investment required is enormous.  This is not a low-cost-of-entry business and you are going to have to produce something pretty miraculous to get venture funding that would enable you to grow - fast!

There's the usual nod in the direction of yet another promise of semantic search, depending, as ever, on the willingness of Web authors to tag their work with meaningful tags instead of stuff like <h2></h2> - and how soon do you think that will happen? Just about as soon as authors and organizations begin to put up validated XHTML.

Never mind, there's one fun graphic on page 46, with the note: "If this spread [i.e., two times A4] was everything on the Internet, the yellow box [2cm x 1cm] would be the surface Web visible to search engines and the red [3mm x 1.5mm] would be the Websites indexed by search engines so far."  So, a little way to go, eh?