In a recent interview at PublishersWeekly.com, my friend Siva Vaidhyanathan characterized my support of the Google Books Project in ways that I must take issue with. (He also said many things that are insightful, wise, and witty, and the whole interview is worth reading.)
Here’s the part that motivates this post:
PW: But Michigan librarian Paul Courant, for example, has argued passionately that Google’s books project offers great public benefits, making millions of long-lost books discoverable and accessible, work libraries could never have done so expeditiously. Doesn’t he have a point?
SV: I’m sympathetic to the expediency argument, but I’m also impatient with it. Courant’s argument is based on two assumptions that I take issue with. First is the assumption that the cost to university libraries would be low. We know now that the cost to libraries has actually been significant, and the benefit has been overstated. We also know now that Google wants to be a bookstore, not a library.
Second, the premise that no one else was ever going to do this is an argument by fiat, a classic fallacy. If we, the people of the world, the librarians of the world, the scholars of the world, the publishers of the world, decide that we should have a universal digital library, then let’s write a plan, change the laws, raise money, and do it right. If we’re going to create this with public resources, let’s do it in the public interest, not corporate interest. There’s nothing wrong with Google pursuing a books project, of course, and, yes, there are benefits. But we have to understand that what Google has created is first and foremost for Google, and I think a lot of people have fooled themselves about that.
I respond to Siva’s two points in order. First, my support for the project is not based on an assumption that the cost to university libraries would be low, but rather on a calculation that the costs are and have been substantially less than the benefits. The leadership of two dozen or so major research libraries seems to agree with me. The unsupported assertion that “we know now that the cost to libraries has actually been significant, and the benefit has been overstated” is, well, an unsupported assertion. For the University of Michigan Library, the cost has been material but not overwhelming. We have expended staff time and organizational effort, and there has been some disruption of our activities. But the benefit has been far larger: making the content of our collections widely searchable, making the public domain content readable by anyone, giving us a backup copy of our collections, and seeding the HathiTrust, a cooperative digital library (not a bookstore) with some fifty-two academic libraries as members and a collection of over eight million volumes that is growing by tens of thousands of volumes a week. (Check out HathiTrust.org.) Siva is welcome to discount these benefits, but first he should count them.
Second, although it’s an honor to be charged with a “classic fallacy” after all these years as a college professor, I don’t quite see the fallacy. Just who, other than Google, has been willing to step up and do the job? And in what pre-Google fantasyland might we have expected the publishers of the world to show interest in making their backlists part of a universal digital library? And why would we believe that we can go to Congress and get improvements in copyright law, when every time Congress touches copyright law it gets worse? (Indeed, I would suggest that the Google Books project, by showing millions of citizens the value of digital access to a large corpus of published work, is more likely to move Congress than the excellent public policy arguments that have been adduced, to deaf ears, by librarians, by Siva, and by me.)
I continue to believe that had Google not embarked on this project and shown the world that mass digitization of library collections could actually be done, we would still be counting the corpus of digitized work in the tens of thousands instead of the millions. To be sure, my assertion here is not subject to proof. We cannot know what would have happened had Google not gone into the scanning business. What we do know is that no one else was doing it. And no one else is doing it. That’s not a fallacy, but a fact. Actually, two facts. Maybe someone else would have done it. But when? I’m a lot older than Siva, so I’m the one who gets to be impatient when it comes to providing the riches of the world’s academic libraries to the people of the world.
Amen, Paul! I would add one more benefit: Google is teaching libraries to work at scale. HathiTrust is not only an impressive collection; it builds the muscles libraries will need to cope with a digital age that is only in its infancy. We don’t trust Google to “not be evil.” We have worked hard to provide an alternate path for access to the material being generated. That work makes us stronger.
February 16, 2011 @ 8:04 am
I agree that only Google’s deep pockets could have accomplished this result. I even debated this topic in front of hundreds of librarians, saying that there was no way we could possibly digitize the Library of Congress, let alone all the libraries that Google has since scanned. I was of course completely wrong, because I did not foresee Google, nor that it would care to invest in this. But now we are left with how to proceed in this brave new world, and I think the HathiTrust is exactly the way forward for libraries and for all those who care deeply about the public trust.
February 16, 2011 @ 10:26 am
I do wonder about the costs, and how they are perceived by different staff. I know of some places where there’s at least a bit of a disconnect: I’ve spoken to staff who describe in less than positive terms the project’s treatment of books and its impact on service at the time (although presumably that’s a short-term issue). I also wonder how the lawsuits and settlement are going to turn out, and how that will affect the partner libraries.
In some ways I’m even more sympathetic to the expediency argument than Siva is, particularly in light of just how shortsightedly library funding is being cut across the country.
There are a lot of things that are being lost right now due to preservation-related issues, and Google isn’t touching a lot of those yet (although I’ve heard they’re starting to contact other types of institutions about some of those cases). I’m interested in looking at which materials Google chose to digitize at different institutions.
February 16, 2011 @ 11:50 am
I think we might want to wait until Google hoists the Jolly Roger before worrying about alternatives. Until then, it’s not as if Google is even in the running for top online bogeyman. There’s just so much competition.
February 16, 2011 @ 12:06 pm
A cost factor that seems to be ignored in arguments surrounding book scanning is the cost of repairing damage caused in the scanning process. As a person who is passionate about the care, conservation, and preservation of books, I feel the cost of digitization certainly has to include the cost of damage. In the Google project this seems to be a highly secret area, with precious little information made public, and that casts a suspicious light on the whole undertaking. Other bookbinders and conservators often ask me about the level of damage to the physical books and about the quality of the scans being produced. Rumors of extensive damage and poor scans abound, but no one seems to have any real data. I know the UM Library has examined and is continuing to examine this topic. Will any statistically valid data ever surface?
For all that, a single bad flood undoubtedly does more damage than all of the scanning done to date. But that does not eliminate the damage-related cost of scanning. Scanning could yield a positive ROI, but only if a monetary value can be placed on the benefit to researchers and others, and only if the true, total cost of scanning, along with the business impact on publishers and authors, is weighed in the balance as well. I personally find it a shame that Google did not approach this area with a bit less emphasis on a business model and a bit more on offering a wonderful opportunity and service to the world. Perhaps Judge Chin would have ruled differently…
April 5, 2011 @ 1:27 pm
John,
Your point is of course valid, but it turns out that scanning is no harder on the books than reading, and it has the advantage that we get a new copy that can be read repeatedly (subject to legal considerations) without damage, and that can also be used to make a new copy if the original is no longer available. Here’s a comment on your comment, vetted for accuracy by our head of preservation and conservation:
Michigan undertook a number of periodic and focused analyses of possible damage to materials being digitized. These pre- and post-condition surveys pulled samples from the flow of scanned material without telling Google which volumes were being reviewed or when reviews were taking place. Each survey found, conclusively, that damage was infrequent and certainly no more frequent than would be consistent with ordinary use of the materials. Results over time were also consistent. Later, when Michigan decided to begin digitization of special collections materials, we gave renewed and intensified scrutiny to handling and possible damage. We found that Google’s special handling of these materials limited damage and, again, that such damage as did occur was consistent with what would be expected from ordinary use.
Paul
April 10, 2011 @ 1:06 pm
Thanks, Paul, for the response. Perhaps a formal write-up of the testing that has been done, placed in one of the book conservation publications, would be useful in clarifying this. Participation by conservators at one or more of the other Google Books project sites would provide welcome confirmation and, perhaps, set some inquiring minds at ease.
April 10, 2011 @ 1:21 pm