Communications of the ACM


April 1995 Introduction


April CACM Introduction:
Digital Libraries

Guest Editors:
Edward A. Fox, Robert M. Akscyn, Richard K. Furuta, John J. Leggett



This year marks the 50th anniversary of Vannevar Bush's seminal article (3) that paved the way for fields like information retrieval and hypertext. Bush's efforts helped raise the level of public support for scientific research at the end of World War II. Today we have a similar opportunity to invest in research and education to bring the world closer together, improve our environment, and fulfill the age-old dream of every human being: gaining ready access to humanity's store of information. This era and what we are building go by many names, including Cyberspace, Global Information Infrastructure, Infobahn, Information Age, Information (Super)Highway, Interspace, and Paperless Society. They are all supported by networking (e.g., the Internet). However, their essence is information. Information is what flows over the networks, what is presented to us by our consumer electronics devices, what is manipulated by our computers, and what is stored in our libraries.

Libraries exist in many forms and are of many types. In computing, code libraries have been a part of the world of software engineering. Object libraries are part of object-oriented programming efforts. With multimedia technology we now have image libraries, audio libraries, and even digital video libraries. We might also think of libraries when we refer to collections that now reside in databases, knowledge bases, text bases, gopherspace, or the World-Wide Web (WWW).

At the same time, traditional libraries devote an ever-increasing share of their budgets to electronic services, whether in the form of CD-ROMs, online public access catalogs, or online databases. This trend will continue, as digital storage costs go down relative to the cost of library shelf-space, and as electronic services become more useful, affordable, available, and usable. Other forces are encouraging the demand for and supply of electronically accessible information, like the hunger for news and learning, the pride of authoring as evidenced by local or vanity publishing, the desire to collaborate or at least share with colleagues, the pressures of reorganization and restructuring, the ubiquitous presence of electronic publishing, the excitement of exploring in an expanding sea of information, and the push to use new technological tools. These are some of the many reasons that humanity is now working toward the grand challenge of a World Digital Library System. Here and throughout this special issue, we present a snapshot of that process.

History, Extreme Perspectives, and Definitions

For various reasons, digital library has stuck as the term to use for this field. Indeed, the majority of articles presented in this issue mention the phrase prominently. As we consider many of the discussions and activities in this area over the period 1991-1993 (5), we note a shift from electronic library to digital library as the preferred term, perhaps following the growing interest in digital networks, digital audio, and digital video relative to electronic publishing. In addition to a variety of other activities around the world, U.S. government legislation and a number of funding initiatives were launched in 1993 with the digital library as a prominent theme, and journal special issues began to appear on the topic (7).

In 1994 there were numerous talks, panels, tutorials, workshops (e.g., 10), and conferences (e.g., 9) on digital libraries. Discussions of digital libraries often begin by picking an extreme point on a spectrum or scale (see also Levy and Marshall in this issue). Yet digital libraries will simply shift the point of equilibrium in each of these scales. For example, some claim we are finally "Beyond Paper," which is the title of a recent work on the Adobe Acrobat product family (2). Unhappily (from an ecological perspective), increased use of computers has contributed to increased use of paper; but tools are finally emerging that may ameliorate that trend. Thus, CD-ROMs have eliminated much of the demand for printed encyclopedias, large reference works, and computer manuals; in one application of digital libraries to education there are already a number of paperless courses (see http://ei.cs.vt.edu/EIproj.html).

We hope use of paper will decrease but doubt it will disappear. Others claim we will no longer need printed books or journals (see also articles by Levy and Marshall and by Marchionini and Maurer). In spite of jokes about people being unwilling to curl up in bed with computers to read novels, Voyager does sell individual books on diskette for PowerBook computers, Project Gutenberg makes available a large number of out-of-copyright volumes, and many CD-ROMs contain large collections of books. More progress has been made with electronic journals, however. Initial and individual efforts to launch new journals as electronic services (e.g., the Online Journal of Clinical Trials by AAAS and OCLC) are giving way to large commercial ventures. For example, Elsevier's TULIP project, with some 40 journals about materials science and engineering available as page images, will soon lead to electronic access to over a thousand of their journals.

Another extreme statement calls for elimination of intermediaries (search intermediaries, librarians, retailers, distributors, and others) who interfere in the process of interchange between authors and readers (whose roles will also be blurred as a universal hypertext library system evolves), as is considered by Wiederhold in this issue. Though the co-authors of this article are extensively involved in electronic and other publishing activities, and make significant personal use of hypertext systems (e.g., KMS (1)), they all have a growing appreciation of the value of talented intermediaries and believe that considerable research is required to incorporate their knowledge into expert library systems.

Finally, some assume that with digital libraries everything will be in digital form (see Levy and Marshall). As in all of the cases mentioned here, we see a shift in the indicated direction that will certainly lead to dramatic changes, but we must heed the lessons of history, which show us that new technologies rarely completely supplant the old, and that new points of balance eventually are achieved. The phrase "digital library" evokes a different impression in each reader. To some it simply suggests computerization of traditional libraries. To others, who have studied library science, it calls for carrying out the functions of libraries in a new way, encompassing new types of information resources; new approaches to acquisition (especially with more sharing and subscription services); new methods of storage and preservation; new approaches to classification and cataloging; new modes of interaction with and for patrons; more reliance on electronic systems and networks; and dramatic shifts in intellectual, organizational, and economic practices.

To many computer professionals, a digital library is simply a distributed text-based information system (see article by Croft in this issue), a collection of distributed information services (see article by Wilensky), a distributed space of interlinked information (see Schatz et al.), or a networked multimedia information system. It may have materials that are mostly from outside an organization, that are generally of high value, and that have had special electronic services added to their quality during creation, collection, organization, and/or use (10: "Networked information systems as digital libraries").

To modern-day users of the WWW it suggests more of the same, with sure-to-come improvements in performance, organization, functionality, and usability. Hypertext researchers recall Bush's vision of linked multimedia objects that encompass humanity's store of information (3). Those studying collaboration technologies see digital libraries as the space in which people communicate, share, and produce new knowledge and knowledge products. Those working on education technology see digital libraries as support for learning, whether formal or informal (see Marchionini and Maurer).

The metaphor of the traditional library is both empowering and constraining. We have acknowledged the value of talented intermediaries and recognize the importance of the knowledge systems they have evolved over centuries of handling and managing traditional collections. Much of the power of the digital library is the flexibility it permits in allowing processing of our collections of tangible objects and their electronic representations. However, the knowledge developed over the years is quite flexible too, and it is feasible, perhaps even desirable, to apply it also to the collection of things without direct physical analogs, for example, algorithms, real-time data feeds, computational states, relationships among versions of a physical object showing the historical progression of an idea, multimedia annotations, and tours.

In this issue we embrace all of these perspectives. We adopt the pragmatic approach of letting exemplary pilot, research, and development projects provide an operational definition; having descriptions of key supporting technologies provide insight into future trends; and offering a set of in-depth feature articles that attempt to explain digital libraries in the light of interface and retrieval techniques, education, needs of information analysts, and the world of scholarly publishing.

In this Issue

We have developed this special issue as a nourishing meal, with appetizers, main courses, and desserts aplenty. We invite you to indulge in all of the appealing works, to think deeply about their implications, to connect them with your own plans and activities, and to build your own view of digital libraries.

We are pleased that toward the end of this issue the ACM Publications Board has laid out its plans for electronic publishing (Denning and Rous), along with interim statements about copyright policies and guidelines for authors who submit works for ACM to publish. These statements expand on earlier discussions (e.g., 4) and will have a direct impact on you, the computing profession, and the world of electronic publishing. What is particularly exciting in light of this Special Issue is that the cornerstone of ACM Publications is its emerging digital library, which will support a wide range of new as well as replacement services. We hope that this issue will help prepare you for the new world of ACM publishing!

We open with our most visually interesting article, which shifts the focus to users and visualizing information. Rao et al. discuss a variety of tools, systems, and studies at Xerox PARC that illustrate the future style of rich interaction users will have with digital libraries. They provide a human-computer interaction perspective on the field of information retrieval. As in the Envision project (see Heath et al.), they strive to empower users to manage the vast amount of information that will be available in digital libraries, but go further in applying visualization to complex processes like clustering and in incorporating interaction into more aspects of user sessions.

Having introduced a number of approaches to information retrieval and visualization, we continue with a selection of short pieces about supporting technologies for digital libraries. Bell et al. discuss an information retrieval system available by anonymous FTP and how compression techniques make it suitable for handling large text collections. Croft describes how information retrieval methods, enhanced with artificial intelligence techniques, can be used in digital libraries. Fox summarizes a range of efforts to make computer science reports readily available, leading into French et al.'s discussion of the WATERS project and Lagoze and Davis's explanation of the Dienst system. Please avail yourselves of these computer science report services, and help build and use an integrated worldwide virtual computer science report library. Finally, concluding this section is a short piece on a powerful approach to handling spatial (e.g., GIS) information (by Kacmar and Jue).

The section on projects spans the range of pilot digital library efforts from small to large. Huser et al. describe technologies and approaches to constructing a large encyclopedia. These will be a crucial part of future digital libraries: developing a knowledge base and object database with the aid of text analysis and parsing, network editing and enrichment, and automatic generation of presentations on demand. Merrill et al. explain a 135-GB university information system exploiting CD-ROM technology. Heath et al. describe a user-centered discipline-level research project to build a digital library prototype from the ACM literature, and present a new style of interface for managing search results (see also 7). Entlich et al. give an overview of the CORE project to build a digital library in chemistry from American Chemical Society publications.

Continuing this section is a collection of pieces regarding the NSF/ARPA/NASA $24.4-million Digital Library Initiative. First is Olson's "An Appreciation of Laurence Rosenberg," whose many years of devoted service to NSF ended while he was working on this initiative. Short overviews of the six projects funded through this initiative follow, listed in alphabetical order by institution name. These cover a broad range of digital library content areas and types, technical approaches, system architectures, and research problems. For each, pointers are given to project home pages on the WWW, so readers can track progress over the four-year period of each award. The last discussions in this section deal with national libraries. Purday details the efforts of the British Library to apply digital technology. The several pilot initiatives described will pave the way for broader access to the vast holdings of this key institution. Becker then briefly summarizes efforts under way at the U.S. Library of Congress. The activities at these two libraries are illustrative of national efforts in France, Japan, Singapore, and other countries.

The feature articles presented here give definition and perspective to the field of digital libraries. Since most people think of education when libraries are discussed, we begin with Marchionini and Maurer's coverage of teaching and learning with digital libraries. These authors are experienced in library and information science, electronic publishing and journals, hypertext systems and collections, information resources, evaluation, and using scientific data in education. They provide a rich insight not only into digital libraries but also into the future of learning.

The next two articles force us to think deeply about digital libraries, how they relate to current work practices, traditional libraries, and the revolution now under way in electronic publishing. Levy and Marshall focus our attention on information workers and their needs, making it clear what support digital libraries must provide and why we will need ongoing bridges among digital libraries, objects in the real world, and people's communicative processes. They help us enlarge our view of digital libraries to include facilitation of richer collaboration, varied media, and works that are more transitory than archival publications. Wiederhold pushes us even further, to consider the whole enterprise of scholarly and electronic publishing. He points out the new capabilities that digital libraries will provide and the implications of networked information interchange for the world of publishing, and predicts shifts in roles, responsibilities, revenues, and services.

Concluding the issue are the ACM Electronic Publishing articles, which illustrate many of the points raised earlier. Denning and Rous continue Wiederhold's analysis of shifts in the role of publishers. They summarize current practice in scientific publishing, list the many breakdowns now visible, and lay out ACM's response to these challenges: a digital library, two tracks of publications, experiments, new services, and careful development of policies and guidelines.

