This article first appeared in SIGGNL 15, pages 14 to 17.

Why the Internet needs input from Librarians

I have a confession to make: I'm a librarian. Now, I've never actually worked in a library and I don't wear my hair in a bun, but I do have a Master's degree in Information and Library Studies from the University of Michigan's School of Information. In addition, many of my friends are librarians and I'm getting married to a librarian in a couple of months. So when I proclaim that librarians are destined for greatness on the Internet, you'll understand that I'm a little biased.

The greatness of which I'm speaking relates to the development of tools for finding information on the Internet. To date, most of the information retrieval tools on the Net have been designed by people with a computer science background. Don't get me wrong; I don't want to offend any programmers. First of all, the Internet itself wouldn't even exist if it weren't for the creative talents of programmers from around the world. Second, and perhaps most important, librarians and other non-programmers such as myself depend heavily upon the skills and good will of programmers to carry out our grand schemes in this highly technical environment. We need programmers!

However, anyone who has spent much time looking for information in this chaotic and distributed environment will agree that finding useful information on the Internet is no easy task. Phrases such as "drowning in information" and "finding a needle in a haystack" come to mind. Searching on the Net can be difficult, frustrating, and very time-consuming.

You've probably heard the saying that to someone with a hammer in his hand, all problems look like nails. Well, on the Internet, computer science folks tend to hit information retrieval problems over the head with relevance ranking algorithms, intelligent agents, and lightning-quick processors. AltaVista is a perfect example of a fast, powerful, and highly automated search tool, and for many types of searches it works wonders. However, the complexity of the query language and the sheer volume of information can be overwhelming. A search on WebReview, for example, returns 900,000 documents. That's a few too many for me to start sifting through.

There are ways to refine a query, and I'll get to those in a moment, but even the expert searcher can't get around the ambiguity of language. Without knowing the context of a query, a search tool has a hard time working out exactly what you're looking for. Do you want shoe polish or Polish shoes? Are you looking for biology resources for the professional researcher or for sixth graders? Do you want only good information? How do you define "good" information?

I happen to believe that the people best equipped to solve some of these information retrieval problems are librarians. We might not be very strong in the marketing department, and we're certainly not very good at turning great ideas into revenue-generating products and services, but we have invested a substantial amount of time and energy in studying the information-seeking behavior of real people in the real world and in developing skills and tools to meet those information needs. To best state my case, I'd like to show you a loose categorization scheme I've developed for information retrieval tools on the Internet. It divides them into automated search tools, Internet directories, and virtual libraries.

For reference purposes, you might want to take a look at the Internet Searching Center, a resource that collocates some of the Internet's most useful and popular information retrieval tools.

Automated search tools comprise the richest and most varied category of tools. These search tools employ software robots and spiders that crawl the Web, indexing everything they find. Examples include AltaVista and Lycos. The highly automated nature of these tools allows them to provide access to the most comprehensive indices of Internet resources available. AltaVista's database, for instance, contains 15 billion words indexed from over 30 million Web pages.

However, the weaknesses of search tools also derive from their automated nature. First, since search tools do not employ organization schemes, the onus for sorting the information falls to the user. To refine our search for Web Review magazine using AltaVista, we might enter title: "web review" as our query phrase. This requires a knowledge of the query language and an understanding of the principles of online searching that many novice users lack. Second, because search tools exercise no editorial control over the resources they index, the quality of information varies widely. Search tools make no distinction between "good" and "bad" information. As the volume of information on the Net continues to grow exponentially, both of these problems will become increasingly troublesome. General-purpose search tools will not scale well at all.

Internet directories or resource collections are fairly comprehensive and easy to use. Examples include Yahoo and Open Market's Commercial Sites Index. Anyone can add resources to a directory. With Yahoo, for instance, you can submit a resource with a brief description. Yahoo's staff then integrates it into a subject hierarchy. In this way, Internet directories balance central control with distributed independence, thereby melding the efforts of human and machine.

With several million potential contributors, the strength of directories clearly lies in their ability to be comprehensive and current. On the other hand, the weaknesses of directories also derive from their distributed independence. Because anyone can add resources, everyone does. This results in information of varying quality. In addition, the self-submission of resource descriptions and evaluations can be problematic. Who isn't going to hype up their resource, at least a little? A search in Yahoo on the word "great" returns 523 hits, "excellent" returns 306 hits, and "lousy" only 6 hits. Either Yahoo is packed with amazing, high-quality resources or we've got a problem with objectivity.

As with search tools, it is difficult to see how these general-purpose directories are going to scale over time. Already, Yahoo's subject hierarchy is close to collapsing under its own weight. There are just too many menus and levels in the hierarchy. As Yahoo doubles and quadruples in size, even the search results screens will become unwieldy. With a billion dollars in market capitalization, give or take a few million, I'm sure the folks at Yahoo will come up with a few creative solutions to these problems, but I doubt they'll solve the underlying issues caused by the ambiguity of language. That's where we librarians come in.

Virtual libraries or value-added collections of Internet resources are among the more civilized areas of an otherwise chaotic and unruly cyberspace. Although a far cry from the order and stability of traditional libraries, virtual libraries do provide a taste of the value that librarians can add to the Internet through the application of traditional skills in a vastly non-traditional environment. Through the identification, selection, organization, description, and evaluation of Internet information resources, digital librarians or cybrarians create virtual libraries which help people to find "good" information.

My favorite virtual library, not surprisingly, is the Clearinghouse for Subject Oriented Internet Resource Guides [located at www.clearinghouse.net; thank you to Kath Willans for this information - Ed]. While neither the name nor the search engine is as zippy as Yahoo's, the topical guides within the Clearinghouse can serve as an excellent tool for finding useful information. Take a look at the biotechnology guide, for example. The author has identified, described, evaluated, and organized many of the best biotechnology resources on the Internet. All the hard work has been done for you. And it's free!

The strength of virtual libraries clearly derives from the process through which real people add value to the raw information environment of the Net. The weaknesses stem from the limited number of guide authors, many of whom have day jobs and maintain their guides as a free service to the Internet community. They simply can't keep up with the volume of information and rapid speed of change. For these reasons, virtual libraries tend to be less comprehensive and less current than directories and automated search tools. Now, if we could find a way to compensate guide authors for their efforts, perhaps we could dramatically improve the quality of guides and subsequently the effectiveness of virtual libraries as Internet information retrieval tools.

So what does all this mean? What is the future of searching on the Internet? Why are librarians poised for global domination? Well, there are a few clear trends that we can see. First, the volume of information on the Internet is increasing exponentially, doubling every six to eight months. Second, the sheer volume of information is leading people to spend more and more time using information retrieval tools on the Net. They desperately need these tools to find what they're looking for. Third, entrepreneurs and investors alike have recognized these trends. The initial public offerings of companies like Yahoo and Lycos testify to the gold rush fever in this area.

In my opinion, the highly automated "one size fits all" approaches to indexing the Internet are doomed to failure as the volume and diversity of resources continue to expand. Instead, people will come to depend upon the audience- or subject-specific value-added guides to be found in virtual libraries. These guides will cut through the clutter and provide a sense of organization and perspective in a way that no automated tools or intelligent agents ever will.

Since librarians are well suited to the development of these guides, we are positioned to capitalize on the growing demand for value-added information services. Maybe on the Internet, librarians will finally figure out how to make some money while helping to make the world a better place. Then again, maybe not, but at least it's worth a try.

Page updated 16 February 2005