Information Research - ideas and debate: search engines

24 August 2009

Browser failure

The last post got me thinking about how much real progress in search there has been since the first search engine appeared on the Web. Clearly, enormous progress has been made in some areas, with Google leading the way in delivering search outputs that respond to the entered search terms. But no one seems to have developed anything that will answer questions.

For example, I tried the following question in Google, Yahoo, Bing, Chrome, Wolfram Alpha, SenseBot, Hakia, Powerset, Deepdyve and Ask.com:

"In what sense is a programming language a language?"

I imagine that this is a subject that has been debated now and again, and it is a reasonable question to ask. However, neither the standard search engines (Google, Yahoo, etc.) nor the so-called 'semantic search engines' (Hakia, SenseBot, etc.) came up with any relevant results on their first output page - and if the discussion isn't out there, the question would be: why not?

I see that Powerset has been bought by Microsoft, in the hope of improving Bing, presumably, but it looks as though search still has a long way to go before it is anything more than a method for matching input terms against document terms.
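To make "matching input terms against document terms" concrete, here is a toy sketch in Python. It is an illustration only, not how any of the engines named above actually works: the function names `terms` and `rank_by_term_overlap` and the sample documents are invented for the example. It simply counts how many distinct query words appear in each document and orders the documents by that count.

```python
import re


def terms(text):
    """Return the set of lower-cased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_by_term_overlap(query, documents):
    """Order documents by how many distinct query terms they contain.

    Documents sharing no term with the query are dropped. Nothing here
    looks at what the query means, only at which words it contains.
    """
    query_terms = terms(query)
    scored = [(len(query_terms & terms(doc)), doc) for doc in documents]
    # sorted() is stable, so documents with equal counts keep their input order
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for count, doc in ranked if count > 0]


if __name__ == "__main__":
    # Invented sample documents, for illustration only
    docs = [
        "Learn a programming language in a weekend",
        "Is code a language in the linguistic sense?",
        "Kettle reviews",
    ]
    question = "In what sense is a programming language a language?"
    for doc in rank_by_term_overlap(question, docs):
        print(doc)
```

The ordering depends entirely on shared words, so a page that debates the question in different vocabulary would never surface, which is the limitation complained of here.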

It might be thought that the search topic was in some sense "unfair", but surely it is exactly questions of this kind that present the real challenge for information retrieval research? The straightforward problem of matching terms has probably been cracked and certainly most users of search engines appear to be pretty well satisfied with what they get. However, because the output from search engines is so good today, it raises expectations of how good it can be, and those expectations are probably going to be dashed.

14 January 2009

Green Google?

The carbon footprint of Google has been a topic of debate recently. I first saw an item on the BBC site, which suggested that a Google search generates as much CO2 (7g) as boiling a kettle. This was disputed in the official Google blog, which claimed a figure of 0.2g per search - a pretty big difference. The dispute spread through the blogosphere pretty rapidly and The Guardian newspaper stepped in to clear it all up - more or less - and then to enlarge on it further. It turns out that the physicist quoted never said what is claimed to have been said. Ah! The wonders of modern communications - you'd still get "three and fourpence for the dance".

20 December 2008

New from Information Age

The industry journal Information Age has some interesting items in the current issue - all of which is on the Web.

First, 'cloud computing' is on the agenda, offering potential for cost cutting in hard times. However, it's likely to be small- to medium-sized companies that adopt it first, since:

Certainly, for organisations that have spent thousands of man years and millions of dollars building their own bomb-proof infrastructure to support complex, and often highly regulated systems, there is simply too much at stake for them to abandon this investment in favour of a set of shared resources that are out of their control.

Next there's an article on 'enterprise search', with a couple of interesting case studies that suggest getting the equivalent of a simple Google search box is simply not on the cards.

Finally, something on the 'semantic Web' - that never-ending fascination with the idea that, somehow, we're going to get some meaning recorded - but don't hold your breath.

31 July 2008

Google competitor - really?

A new search engine called "Cuil" - the names get weirder and more unpronounceable! - is getting some publicity at the moment. It's been on the BBC Technology site and now Jack Schofield of the Guardian has an article on the subject. The article has generated quite a lot of discussion, which is worth having a look at. I can't say that I'm impressed: I've just tried searching for "information research" and it comes up with 'No results were found for: "information research"'. Very strange! When I remove the inverted commas, I get results, but some of them are odd and I wonder if Information Research is actually scanned by the service. This feeling is increased when I search for titles of papers in the journal and find nothing - so perhaps the vaunted "biggest search engine" isn't really doing a very good job? There are also weird things going on: for example, a photograph is attached to the entry for this Weblog, but it isn't a picture of me! In fact, the same pictures are placed on the page in other places in relation to completely different topics - I've no clue as to what is going on here, but it doesn't fill me with either confidence or enthusiasm. I'll stick to Google.

01 February 2008

Search behaviour

There's an interesting article on the Boxes and Arrows site: Search Behavior Patterns, by John Ferrara. The article drew my attention to a new(ish) search engine called "Easy Search Live", which offers a 'live view' feature. Click on "live view" beside any item and a window opens to show you the site. Quite a neat feature for finding useful sites within the search output, rather than clicking and opening a new tab or window for the site.

19 January 2008

"Advanced search"

Stephen Turbek has an interesting post in his Weblog on the 'advanced search' function found with many search engines.

Advanced search is the ugly child of interface design - always included, but never loved. Websites have come to depend on their search engines as the volume of content has increased. Yet advanced search functionality has not significantly developed in years. Poor matches and overwhelming search results remain a problem for users. Perhaps the standard search pattern deserves a new look. A progressive disclosure approach can enable users to use precision advanced search techniques to refine their searches and pinpoint the desired results.

Designers of library catalogues please note!

30 December 2007

A metasearch engine

Thanks to Research Buzz for drawing my attention to Zuula - a new-ish metasearch engine. I don't use these things much myself, but Zuula might change my mind, so I've added it to my Firefox search engines.

It's interesting to see what comes up from the different engines when searching for information research: for Web searches Zuula uses Google, Yahoo, MSN, Gigablast, Exalead, Alexa, Accoona and Mojeek (yes - I'd never heard of some of those, either!) and I score these on the scale: 3 - the journal link appears as the top item; 2 - in the top five; 1 - on the first page; 0 - not on the first page. On this basis, the scores are:

3: Google, Yahoo, MSN, Gigablast
1: Exalead
0: Alexa, Accoona, Mojeek.
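For anyone wanting to repeat the exercise, the scoring scale can be written down as a small Python function. This is only a sketch: the name `score_rank` is invented, and the assumption that a first page holds ten results is mine rather than anything Zuula states.

```python
from typing import Optional


def score_rank(rank: Optional[int], page_size: int = 10) -> int:
    """Score where the journal link appears in one engine's results.

    rank is the 1-based position of the link, or None if it was not found.
    page_size is the number of results on the first page (assumed to be 10).
    """
    if rank is None or rank > page_size:
        return 0  # not on the first page
    if rank == 1:
        return 3  # the top item
    if rank <= 5:
        return 2  # in the top five
    return 1  # elsewhere on the first page


if __name__ == "__main__":
    # Hypothetical positions, for illustration only
    for position in (1, 4, 8, None):
        print(position, score_rank(position))
```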

Using the blog search proved interesting: Zuula uses Google, Technorati, IceRocket, Blogpulse, Sphere, Bloglines, Blogdigger, BlogDimension and Topix. In terms of finding items relating to either the journal or the Weblog (in either its old or its new manifestation) on the top page of results, I had expected Technorati to win, but no! Using a simple count of the number of references to the journal or my Weblog, we find:

9: Topix
4: Google
3: BlogDimension
2: Blogdigger
1: Bloglines
0: Technorati, IceRocket, Blogpulse, Sphere

I hadn't heard of Topix, but it is obviously worth a look.

16 December 2007

Another search engine

Aficionados of search engines might be interested in Carrot. This uses multiple search engines and then clusters the results by Topics, Sources and Sites. This is a demo site, but it seems to have possibilities.