On being in bed with Google

One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason a university library would be involved with Google is that, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the contrary, although I may be proved wrong, I believe that the University of Michigan (and the other partner libraries) and Google are changing the world for the better. Four years from now, all seven million volumes in the University of Michigan Libraries will have been digitized – the largest such library digitization project in history. Google Book Search and our own MBooks collection already provide full-text access to well over a hundred thousand public domain works, and make it possible to search for keywords and phrases within hundreds of thousands more in-copyright materials. This access is altering the way that we do research. At least as important, the project is itself an experiment in the provision and use of digitized print collections in large research libraries. I do not see how we can discover the best ways to use such collections without experiments at this scale. In sum, I believe that our library is doing exactly what it should do in the best interests of scholarship and our users, now and in the future.

So I’m puzzled when people ask, “How could serious libraries be doing this? How could they abdicate their responsibilities as custodians of the world’s knowledge by offering their collections up as a sacrifice on the altar of corporate power? Why don’t they join the virtuous ranks of the Open Content Alliance partners, who pay thousands of dollars to digitize books at a rate of tens of thousands of volumes a year?” It seems like those who ask such questions have little appreciation of what Michigan and the other Google partners are actually up to.

Google is on pace to scan over 7 million volumes from U-M libraries in six years at no cost to the University. As part of our arrangement with Google, they give us copies of all the digital files, and we can keep them forever. Our only financial outlay is for storage and the cost of providing library services to our users. Anyone who searches U-M’s library catalog, Mirlyn, can access the scanned files via our MBooks interface. That’s right, anyone. (Copyright law constrains what we can display in full text, and what we can offer only for searching, but we share as much as we can consistent with prudent interpretations of the law.) For an example of an MBook, take a look at The Acquisitive Society by R. H. Tawney.

In a recent New York Times article about mass digitization projects, Brewster Kahle was quoted as saying: “Scanning the great libraries is a wonderful idea, but if only one corporation controls access to this digital collection, we’ll have handed too much control to a private entity.”

I agree with him. I’m an economist with a particular interest in public goods, which is how I came to be involved with libraries in the first place. Libraries have a long and honorable history of preserving information and making it accessible. Moreover, even at their best, for-profit institutions cannot be expected to serve general public interests when those interests run counter to those of their shareholders. So I would be distressed if a single corporation controlled access to the collections of the great academic libraries, just as I find it troubling, on a smaller scale, that a handful of publishers control access to much of the current scientific literature.

But Google has no such control. After Google scans a book, they return the book to the library (like any other user), and they give us a copy of the digital file. Google is not the only entity controlling access to the collection – the University of Michigan and other partner libraries control access as well. Except we don’t think of it as controlling access so much as providing it.

Since 2005, Siva Vaidhyanathan has been making and refining the argument that libraries should be digitizing their collections independently, without corporate financing or participation, and that those who don’t are failing to uphold their responsibility to the public. “Libraries should not be relinquishing their core duties to private corporations for the sake of expediency.”

“Expediency” is a bit of a dirty word. Vaidhyanathan’s phrase suggests that good people don’t do things simply because they are “expedient.” But I view large-scale digitization as expeditious. We have a generation of students who will not find valuable scholarly works unless they can find them electronically. At the rate that OCA is digitizing things (and I say the more the merrier and the faster the better) that generation will be dandling great-grandchildren on its knees before these great collections can be found electronically. At Michigan, the entire collection of bound print will be searchable, by anyone in the world, about when children born today start kindergarten.

Google brings to us extraordinary technical and computing power and tremendous financial resources. The libraries bring an understanding of our collections and our users, and a profound commitment to public access. We are not relinquishing our duties in the name of expediency; we are working with a capable partner to create a far more useful resource than we could create on our own. (Would I prefer that a charitable foundation would support this work on the same schedule as Google, and make everything available to everyone, subject only to copyright restrictions? You bet. I would prefer it even more if that foundation would buy out all of the rights holders for all out of print works. Can someone tell me the name of the foundation, please? In the meantime, it seems to me that being in bed with Google is way better than sleeping alone.)

It’s true that the digitized files from Google’s scans are often far from perfect. Historian Robert Townsend, Paul Duguid, and others have raised technical questions about the quality of Google’s scans, and their appropriateness for preservation. Those are important questions, and there is a great deal of work to be done, both by Google and by the libraries, before we consistently achieve the level of quality and bibliographic reliability that are essential to successful scholarly practice. I will discuss some of the specific steps we are taking to address quality in a future post, but for now I will just say that the solution of these problems will require the serious engagement of academic libraries, and that the visibility of the problems is essential to their solution. Mass digitization on the scale of the Google library project was unimaginable five years ago, and it comes as no surprise to me that we are learning a lot as we go along. We are learning in the tradition of serious academic work, by putting our ideas and our resources in the public eye, where they can be seen, and criticized, and improved.

Published November 4, 2007 & Filed in Google, Libraries, Mass digitization, Michigan



Au Courant