Information Research - ideas and debate: December 2007

30 December 2007

A new Pew Internet and American Life report

Just published: Information searches that solve problems. How people use the internet, libraries, and government agencies when they need help.

There's a great deal of interesting reading in this report by Leigh Estabrook (Prof. Emerita, Univ. of Illinois), Evans Witt and Lee Rainie. One table records the "Sources for Help in Dealing with a Specific Problem" from which we find that the Internet has the highest proportion of users:

Use the internet - 58%
Ask professional advisors, such as doctors, lawyers or financial experts - 53%
Ask friends and family members - 45%
Use newspapers, magazines and books - 36%
Contact a government office or agency - 34%
Use television and radio - 16%
Go to a public library - 13%
Use another source not mentioned already - 11%

Particularly interesting is that while only 42% did NOT use the Internet for information on specific problems, 87% did NOT use the public library. So, while public libraries may still serve important functions in their communities, it seems that the answering of specific problems never became so firmly established as a library function that it could persist in the age of the Internet. It may have something to do with the fact that on the Internet one can find not only information but also advice from trusted sources (e.g., on health problems), while public librarians have always steered away from offering advice or, in most cases, from serving as a venue for advisory services offered by other agencies.

Whatever the ultimate outcome in the future of the public library, it seems that the answering of specific problems is unlikely to be part of that future.

A metasearch engine

Thanks to Research Buzz for drawing my attention to Zuula - a new-ish metasearch engine. I don't use these things much myself, but Zuula might change my mind, so I've added it to my Firefox search engines.

It's interesting to see what comes up from the different engines when searching for information research: for Web searches Zuula uses Google, Yahoo, MSN, Gigablast, Exalead, Alexa, Accoona and Mojeek (yes - I'd never heard of some of those, either!) and I score these on the scale: 3 - the journal link appears as the top item; 2 - ...in the top five; 1 - ...on the first page; 0 - ...not on the first page. On this basis, the scores are:

3: Google, Yahoo, MSN, Gigablast
1: Exalead
0: Alexa, Accoona, Mojeek.
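
The scoring scale described above can be written down as a small function. This is only a sketch: the function name, the assumed first-page length of ten results and the example positions are all hypothetical, and are not taken from Zuula or from the actual search results.

```python
def score_position(position, page_size=10):
    """Score a result position on the 3/2/1/0 scale described in the post.

    position: 1-based rank of the journal link in an engine's results,
              or None if the link was not found at all.
    page_size: assumed number of results on the first page (an assumption).
    """
    if position is None or position > page_size:
        return 0  # not on the first page
    if position == 1:
        return 3  # the journal link is the top item
    if position <= 5:
        return 2  # in the top five
    return 1      # elsewhere on the first page


# Hypothetical positions, chosen only to exercise each branch of the scale.
example_positions = {"engine_a": 1, "engine_b": 4, "engine_c": 9, "engine_d": None}
scores = {name: score_position(pos) for name, pos in example_positions.items()}
print(scores)
```

A coarse scale like this deliberately ignores small differences in position, which is probably all that a quick, one-query comparison can support.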

Using the blog search proved interesting: Zuula uses Google, Technorati, IceRocket, Blogpulse, Sphere, Bloglines, Blogdigger, BlogDimension and Topix. In terms of finding items relating to either the journal or the Weblog (in either its old or its new manifestation) on the top page of results, I had expected Technorati to win, but no! Using a simple count of the number of references to the journal or my Weblog, we find:

9: Topix
4: Google
3: BlogDimension
2: Blogdigger
1: Bloglines
0: Technorati, IceRocket, Blogpulse, Sphere

I hadn't heard of Topix, but it is obviously worth a look.

29 December 2007

On the longevity of papers in OA e-journals

As the year end is approaching, I thought I would take a look at my Google Analytics reports, to see what is going on. At least one thing seems worth reporting, in that the most hit journal paper on the InformationR.net site (which includes many other things than the journal!) was published in Vol. 4 No. 3, February 1999. This was Joyce Kirk's paper on information management and it has racked up 4,177 page views in the past year. Looking further, I found that nineteen papers from volumes 3 and 4 appeared in the top 100 papers (measured by page views). Currently, Joyce's paper has over 48,000 'hits' and, according to Google Scholar, 21 cites. The other early papers in the top 20 were:

Ranked 5: Student attitudes towards electronic information resources, by Kathryn Ray & Joan Day (Vol 4 No. 2 paper 54 )

Ranked 6: Ethnomethodology and the Study of Online Communities: Exploring the Cyber Streets, by Steven R. Thomsen, Joseph D. Straubhaar, and Drew M. Bolyard (Vol 4 No. 1 paper 50)

Ranked 17: "In the catalogue ye go for men": evaluation criteria for information retrieval systems, by Julian Warner, (Vol. 4 No. 4 paper 62)

Ranked 20: MISCO: a conceptual model for MIS implementation in SMEs, by R.Bali, G.Cockerham (Vol. 4 No. 4 paper 61)

In carrying out this exercise, I discovered that not all of the papers in the journal have Google Analytics code in them, so I'll have to remedy that!

28 December 2007

Portrait view

I treated myself to a new screen for Christmas, a 19" Viewsonic VP930b, which has the big advantage of having a 'pivot' mode, allowing me to use it in either landscape or portrait view. Most of the time, I use it in portrait view, since most of the time I'm composing text and being able to see an entire A4 page on the screen at 100% magnification is really worth having. For many Web pages, too, the portrait view is superior, since one can scan more of the page at a single glimpse. Some pages (although I suspect that they may be a minority these days) assume a landscape view, and there are those sites that have an opening page, and sometimes more, in the form of a small, landscape oriented box (usually with incredibly small type size), for which the landscape orientation is normal. Film viewing will also require the landscape view, but I suspect that there will be more and more demand for pivot screens that can be used in either mode.

Problems of privacy and Google Reader?

Like millions more, I guess, I use Google Reader for my RSS feeds and I've always found it perfectly satisfactory. However, hackles are being raised by Google's decision to allow the links you have shared with friends or family to be available to anyone. Now, this doesn't bother me, because I rarely share links (in fact, I think I've only done it once), but others might well be put off.

It doesn't stop there: if you are a GTalk user for chat, then the links will also be shared with anyone with whom you have chatted - again, I don't use GTalk, so it doesn't affect me.

Read Jack Schofield's column for the low-down.

25 December 2007

Merry Christmas

I really should have said this before the previous message, but...

Never mind - better late than never: a very Merry Christmas and Happy New Year to one and all - especially if you are a regular reader of Information Research!

...and don't get me started on the collapse of tradition that results in people saying 'Happy Christmas' :-)

Farewell Browster, hello Cool Iris

I had been using the Browster add-in for Firefox for some time, but was experiencing a problem, so went looking for an update. That's when I discovered that it had died. However, something better turned up, Cool Iris - another link previewer, which works in much the same way as Browster, but which I find to be more user-friendly. With Browster, the preview pane often popped up when you didn't want it to do so, and that rarely happens with Cool Iris. So, if you are looking for something of the kind, Cool Iris will do the job for you.

23 December 2007

The "SCImago Influence Measure"

I mentioned the new SCImago journal ranking site a little while back and thought I would explore it a little further. In doing so, I find that the "Cites per doc" measure, which is given for one, two, three and four year periods might be called the 'SCImago Influence Measure' or 'SIM', since it is more or less equivalent to the Web of Knowledge Impact Factor. I prefer 'influence' to 'impact', since the latter is rather macho and percussive, while the former is much more subtle and, I think, more appropriate, since what we are talking about is the influence that a journal has within its field.

The four-year SIM is particularly interesting, I think, since it allows for a much longer period of time within which the documents have a possibility of being cited. Using the SCImago database to download the data also gives the opportunity for producing some interesting comparisons. The graph below shows the four-year SIM for a long-established journal, the Journal of Documentation, compared with three, now established, open access journals - Information Research, the Journal of Digital Information and the Journal of Electronic Publishing. It is striking that on this measure all three OA journals are now approaching the same level of 'influence' as the older journal. JEP has had some problems in maintaining publication, hence the dip in 2006, but with its future now established (I believe), I imagine that the growth in its influence will resume.

21 December 2007

Open access and Esposito - again

Joseph Esposito, whose article in The Scientist raised some OA hackles last month, is at it again - and has been roundly answered by Alma Swan. Read them both for a comparison of ill-thought-out comment vs. sound rebuttal.

17 December 2007

Free Rice!

Let me recommend the online vocabulary improvement game, Free Rice. Not just a game, but a way of donating rice to the hungry of the Third World. For every word you get right, 20 grains of rice are donated. I got up to 2,400 grains before I packed in, so you can tell that it is rather addictive. There are some difficult words in there, but, very often you can select the right meaning because all of the other meanings appear to be wrong! What the 'vocabulary level' might be is not explained, but I guess it has something to do with the number of words you get right.
Currently donations stand at over nine thousand million (or nine billion in American) grains of rice. That sounds a lot, but it would be nice to see it translated into kilos.

16 December 2007

Get a life at Google

At the Official Google Blog, Aseem Sood, Product Manager, Google Toolbar Team, writes:

I've started to notice something peculiar about the Toolbar team, and that's this: we literally can't seem to stop carrying the Toolbar around with us. When we moved to a new space in our Mountain View campus, we brought along a hallway-sized printout of it. For Halloween, eighteen of us dressed up as the different parts of the Toolbar itself.

Oh, how sad! :-( No pumpkin lantern, no trick or treat, just dressing up as Toolbar elements! I think this is the saddest thing I've read this week.

The state of public libraries

Reading an old issue of The Guardian Review I came across a piece by Alasdair Gray on the writing of his novel Lanark (started 1953, published 1981 - you can't say he rushed it!) - fortunately still available on the Website. In the piece he remarks:

The notion of Lanark and Thaw's stories being parts of the same book came from The English Epic and its Background by EMW Tillyard, published in 1954, discovered in Denniston public library. It astonishes me to think there was a time when the non-fiction shelves of libraries in working-class Glasgow districts had recently published books of advanced criticism!

Ah, yes - I remember those days. Sadly, the British public library has been in decline since Margaret Thatcher's romantic involvement with the market (continued by T. Blair and G. Brown) and the decline of any feeling in government for responsibility for the 'public sphere'. Once upon a time librarians from the Nordic countries used to visit Britain to see examples of the best in public library systems and services - all the traffic would have to be in the other direction today.

Another search engine

Aficionados of search engines might be interested in Carrot. This uses multiple search engines and then clusters the results by Topics, Sources and Sites. This is a demo site, but it seems to have possibilities.

15 December 2007

Another journal ranking measure

Biomedical Digital Libraries has a paper by William Barendse on "The strike rate index: a new index for journal quality based on journal size and the h-index of citations". The strike rate index (SRI) is 10 log h / log N, where h is the h-index and N is the number of citeable papers published in the period covered.
The author argues:

The strike rate index appears to identify journals that are superior in their field and to allow different fields to be compared without recourse to additional data. A good way to select journals is to rank them within a narrow field on impact factor, then ask how difficult is it to get published in that journal, how respected is the editor and their staff, who else publishes in that journal, and how long does it take to get published. All of that is valid, but once the impact factor is reified into a universal measure of journal ranking, those other aspects are apt to be forgotten. When organizations or governments set universal thresholds based on the impact factor, it can be hard for individual scientists to argue against them. The strike rate index helps to address the gap in knowledge of the meta-data associated with the publishing of science, by looking at the long term record of a journal in publishing highly cited material relative to the number of articles published.
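
As a rough illustration of the formula quoted above, the following sketch computes the SRI for two made-up journals. The function name and the h and N values are hypothetical and do not come from Barendse's paper or from any real journal. Because the index is a ratio of two logarithms, the choice of logarithm base does not affect the result.

```python
import math


def strike_rate_index(h, n):
    """Return 10 * log(h) / log(N), the SRI formula as quoted in the post.

    h: the journal's h-index (at least 1).
    n: the number of citeable papers in the period (at least 2, so that
       log(n) is positive and the division is defined).
    """
    if h < 1 or n < 2:
        raise ValueError("need h >= 1 and n >= 2")
    return 10 * math.log(h) / math.log(n)


# Hypothetical journals: a small one and a large one.
small_journal = strike_rate_index(h=12, n=300)
large_journal = strike_rate_index(h=27, n=3000)
print(f"small: {small_journal:.2f}, large: {large_journal:.2f}")
```

With these invented figures the smaller journal scores slightly higher despite its lower h-index, which shows how dividing by log N offsets sheer journal size.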

We now have at least four different ranking measures: the Impact Factor, probably the oldest and best known, and often used by journal ranking sites as at least part of the ranking formula; the h-index, which produces oddities when applied to journals because of the age factor; the SCImago Journal Rank, which appears to produce a ranking very close to that produced by the Impact Factor (and which is a little problematical, since it produces ties); and now the Strike Rate Index, which, presumably because it uses the h-index with its age bias, produces a different ranking from both the Impact Factor and the SCImago Journal Rank. For example, in the list I posted the other day Library Quarterly ranks 10th with the IF, =6th with SJR and 2nd with the SRI. Perhaps even more surprising is that JASIST, which ranks 2nd with both the IF and the SJR, ranks 10th with the SRI.

Take your pick - the assessment of 'quality' is always going to be problematical, and one criterion (possibly the best?) - the acceptance/rejection rate of a journal - rarely gets released by publishers :-)

11 December 2007

SCImago Journal and Country Rank

News of a new journal ranking site from the SCImago research group at the University of Granada. Described as follows:

The SCImago Journal & Country Rank is a portal that includes the journals and country scientific indicators developed from the information contained in the Scopus® database (Elsevier B.V.). These indicators could be used to assess and analyze scientific domains.

This platform takes its name from the SCImago Journal Rank (SJR) indicator, developed by SCImago from the widely known algorithm Google PageRank™. This indicator shows the visibility of the journals contained in the Scopus® database from 1996.

A natural question for me, then, is: How does Information Research show up in this new ranking? So, I took the journals that are similar to Information Research, in that they are not 'niche' journals, but publish widely across information science, information management, librarianship, etc., from ISI's Journal Citation Reports and then gathered the data from SCImago. To reduce the effort of creating a table (not as easy in Blogger as it is in Free-Conversant) I have taken the top 10 journals from the list:
Journal               h-index   SJR     cites/doc   JIF
Info & Mgt            29        0.069   3.65        2.119
Journal of ASIST      27        0.068   2.48        1.555
Info Pro & Mgt        27        0.058   2.11        1.546
J of Doc              23        0.058   1.61        1.439
Info Research         12        0.053   1.77        0.870
Lib & Info Sci Res    14        0.053   1.26        1.059
Int J Info Mgt        18        0.051   1.55        0.754
Lib Qly               14        0.051   1.23        0.528
J Info Sci            17        0.051   1.01        0.852
Lib Trends            14        0.050   0.85        0.545

The use of the h-index is well known in the bibliometrics fraternity and is normally used to measure the productivity and impact of an individual scholar. One of its problems, particularly significant in ranking journals, is that the longer the period in which the scholar (journal) has been active, the more likely it is that the scholar (journal) will receive a high h-index, so its usefulness here may be limited. However, it is interesting to see that Information Research has an h-index of 12, while older journals have lower measures.

The SJR measure is explained as,

...an indicator that expresses the number of connections that a journal receives through the citation of its documents divided between the total of documents published in the year selected by the publication, weighted according to the amount of incoming and outgoing connections of the sources.

The 'cites/doc' measure is based on the number of citations received in the previous four years and the total number of documents published in 2006.

JIF is the ISI Journal Impact Factor.
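
To see how much the position of one journal shifts from measure to measure, the figures in the table above can be ranked column by column. The sketch below is illustrative only: the data are copied from the table, but the function name and the tie-handling rule (a journal's rank is one more than the number of journals with a strictly higher value) are my own assumptions and not necessarily how SCImago or ISI resolve ties.

```python
# Figures copied from the table above: (h-index, SJR, cites/doc, JIF).
JOURNALS = {
    "Info & Mgt": (29, 0.069, 3.65, 2.119),
    "Journal of ASIST": (27, 0.068, 2.48, 1.555),
    "Info Pro & Mgt": (27, 0.058, 2.11, 1.546),
    "J of Doc": (23, 0.058, 1.61, 1.439),
    "Info Research": (12, 0.053, 1.77, 0.870),
    "Lib & Info Sci Res": (14, 0.053, 1.26, 1.059),
    "Int J Info Mgt": (18, 0.051, 1.55, 0.754),
    "Lib Qly": (14, 0.051, 1.23, 0.528),
    "J Info Sci": (17, 0.051, 1.01, 0.852),
    "Lib Trends": (14, 0.050, 0.85, 0.545),
}
MEASURES = ("h-index", "SJR", "cites/doc", "JIF")


def rank_by(measure_index, journal):
    """Rank = 1 + number of journals with a strictly higher value (assumed rule)."""
    value = JOURNALS[journal][measure_index]
    return 1 + sum(1 for row in JOURNALS.values() if row[measure_index] > value)


ranks = {name: rank_by(i, "Info Research") for i, name in enumerate(MEASURES)}
print(ranks)  # {'h-index': 10, 'SJR': 5, 'cites/doc': 4, 'JIF': 6}
```

Within this list of ten, the same journal sits anywhere from fourth to tenth depending on the measure chosen.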

10 December 2007

DOAJ the biggest 'big deal'?

Heather Morrison suggests that the Directory of Open Access Journals now offers the biggest 'big deal' with, right now, 2996 journals listed.

But is it so? Many of the journals in DOAJ do not fit the model of the scholarly, peer-reviewed journal: for example, in the Library and Information Science area there are journals that are simply the bulletins of professional associations and it is difficult to discover whether or not the contributions are peer-reviewed.

Also, nothing stays still. I checked the eighty journals in the Library and Information Science area and found that thirteen had published nothing in 2007. Of these, two appeared to be completely dead (although one retained the archive of papers) and four had published nothing since 2005.

However, even if this pattern was repeated in other fields (and I suspect that this field might be more prone than others to the optimistic publishing of new journals) and, say, 15% of the journals were inactive this would still leave the DOAJ ahead of the field in the total number of journals 'packaged'. If 'quality' (however we measure it) is taken into account then perhaps another 5% would be suspect, but this would still leave DOAJ with more than 2,300 journals, compared with Science Direct's 2000.
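
The arithmetic behind that estimate can be checked in a few lines. The figures are those quoted in the post; the variable names are mine, and since the post does not say whether the 'another 5%' is taken from the full list or from what remains after removing the inactive titles, the sketch works through both readings as an assumption.

```python
# Figures quoted in the post above.
doaj_listed = 2996       # journals listed in DOAJ
lis_checked = 80         # Library and Information Science titles checked
lis_silent_2007 = 13     # of those, titles that published nothing in 2007

# Share of the checked titles that were inactive in 2007.
observed_inactive = lis_silent_2007 / lis_checked

# The post's working assumptions: 15% inactive, a further 5% suspect.
inactive_share = 0.15
suspect_share = 0.05

# Reading 1 (assumed): both shares are taken from the full list.
remaining_additive = doaj_listed * (1 - inactive_share - suspect_share)

# Reading 2 (assumed): the 5% is taken from what is left after the 15%.
remaining_sequential = doaj_listed * (1 - inactive_share) * (1 - suspect_share)

print(f"Observed inactive share: {observed_inactive:.2%}")
print(f"Remaining (additive):    {remaining_additive:.0f}")
print(f"Remaining (sequential):  {remaining_sequential:.0f}")
```

On either reading the result stays above 2,300, so the comparison with Science Direct does not depend on which interpretation is intended.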

One of the problems is that we still don't have a citation index that covers all OA journals - should SPARC and DOAJ look at that possibility as a further development of the already excellent service?

Universal Digital Library

Carnegie Mellon and the other universities, world-wide, are rightly receiving accolades in the press for the fact that the Universal Digital Library has exceeded its target by having digitised 1,500,000 volumes.

However, the UDL is not without its problems. For example, you need to download and install a viewer - either DjVu (of which I've heard) or the Tiff viewer - of which I'd never heard. Still, I downloaded the latter and then found that it appears to call up Quicktime actually to view the pages - all of them being images, rather than transcriptions. Needless to say, this is rather slow.

Also, I found that not all of the items are 100% open access. For example, I went looking for something on the history of Alsace (having a friend there who grows some excellent wines :-) and found that only 15% of Townroe's A Wayfarer in Alsace is actually available - and it ends in the middle of a chapter; in fact in the middle of a sentence!

Then there's the problem of blank pages being digitised. I located Hazen's Alsace-Lorraine under German Rule and found that the first seven pages were blank, so I skipped to the end and found blank pages from page 262 to 268. However, I persevered, and found text on page 261, so I skipped back to the start and found the start of the text on page 8. I imagine that others, not as determined as I, might give up!

As for printing, you have a choice - you can print either the whole file or the current page and, because the files are all pictures, you can't select text for quoting. And there's no search function within a file.

Clearly digital library technology has a long way to go before it becomes user-friendly and the Universal Digital Library seems to have further to go than many.

07 December 2007

Knowledge management?

I never cease to be amazed (and amused) by the lengths folks go to to justify the use of the term 'knowledge management'. The latest is on the Science Commons blog where D. Wentworth seeks to answer the question, What’s “open source knowledge management”?:

'Knowledge management,' or KM, is a term often used by businesses to describe the systems they have for organizing, accessing and using information — everything from the data in personnel files to the number of products on store shelves.

Fair enough - that's what we've been calling 'information management' for about the past 40 years. But no!:

One reason that it’s “knowledge” management rather than “information” management is that the word knowledge connotes use of information, not just its availability. Having the ability to use information is what makes it valuable. One classic example is Wal-Mart, which used real-time data about its inventory to realize tremendous, game-changing efficiency gains and cost-savings.

Now what is it that 'information' does? Information - 'informs', in other words the notion of its use is implicit in the definition, and it's curious how no definition of 'km' can do without the notion of information. The only reason for the existence of information is that it should be used - calling information that is used, 'knowledge' is simply silly. As Peter Drucker famously said, 'Knowledge exists between two ears, and only between two ears.'

The blogger's ideas also ignore the fact that there are at least two other communities that use the term 'knowledge management': those building 'knowledge-based systems' in the AI fraternity; and those concerned with the more effective management of organizational communications through the creation of 'communities of practice' and similar ideas. When a term has such competing demands from totally different use communities it becomes worthless. I suggest that Science Commons should exercise a little 'scientific' commonsense and stick to 'information management'. When we look at the Neurocommons site (which is being blessed with this composite term), what do we find?

With this system, scientists will be able to load in lists of genes that come off the lab robots, and get back those lists of genes with relevant information around them based on the public knowledge. They’ll be able to find the papers and pieces of data where that information came from, much faster and more relevant than Google or a full text literature search, because for all the content in our system, we’ve got links back to the underlying sources. And they’ve each got an incentive to put their own papers into the system, or to make their corner of the system more accurate for the better the system models their research, the better results they’ll get.

In other words it's a database, constructed, it seems, using information extraction methods, which will deliver information items to the searcher.
It's a little difficult to understand what is meant by the following:

They’ll be able to find the papers and pieces of data where that information came from, much faster and more relevant than Google or a full text literature search, because for all the content in our system, we’ve got links back to the underlying sources.

What are those urls for each item retrieved by Google other than 'links back to the underlying sources'? And quite what 'the papers and pieces of data where that information came from' means is anyone's guess. It seems that once anyone gets into the mire of language associated with 'km' the critical faculty disappears altogether and hype prevails.