[This is a reposting of a comment I made in response to Siva Vaidhyanathan's questions about my previous post. I am traveling, and can only produce brief answers to his questions now. Later this week I'll get to most of the issues in more detail here.]
Let me start by reminding everyone that I do not speak for Google, nor am I engaged in generalized cheerleading on Google’s behalf. Rather, I am arguing that the University of Michigan Library is doing a Good Thing in its digitization project with Google.
Below are Siva’s questions, and my responses:
He dismisses serious search problems as temporary, yet fails to confront the problem that Google cannot and will not explain the factors and standards that put one book above another in search results.
Actually, I don’t mention search at all in my post. Nor (see above) do I speak for Google.
As users discover poorly-scanned files on the Google index, how can they alert Google to the problem? Why does nothing in the contract between Michigan and Google include quality-control standards or methods?
Please see Michigan’s agreement with Google, clause 2.4, the relevant part of which reads: “U of M will engage in ongoing review (through sampling) of the resulting digital files, and shall inform Google of files that do not meet benchmarking guidelines or do not comply with the agreed-upon format. Should U of M encounter a persistent failure by Google to meet these guidelines or supply the agreed-upon format, U of M may stop new work until this failure can be rectified.” The agreement is online at: http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html
How do we know this index will last for decades? What image file system is Google using and what ensures its preservation?
I believe that in my post I said that the UM library (like other partner libraries) is also storing and preserving the files that Google scans. Maybe Google won’t last for decades, but the libraries will, and the libraries are pretty serious about preservation.
How is the “library copy,” that electronic file that Michigan and others receive as payment for allowing Google to exploit their treasures, NOT an audacious infringement of copyright? It violates both the copyright holder’s right to copy and right to distribute. Doesn’t a university library have an obligation to explain this?
It’s hard to get past the first premise of this set of questions. One literal answer is that no such electronic file exists, because Google is not obtaining anything by means of exploitation.
I must say that I am troubled that the author of a very sensible book about copyright is so enthusiastic about trashing Google that he is willing to give up on the uses, notably scholarly uses, that are permitted in the higher-numbered sections of the Copyright Act. As my institution’s copyright lawyer says: “FAIR USE, it’s the law.” And my institution believes that when we have Google digitize our holdings we do so under the law and in order to make uses that are not only lawful, but that are completely consistent with the undergirding purpose of copyright law.
Siva is much younger than I am, so he may be willing to wait decades before finding out how scholarship and society can benefit from digitized and searchable collections from some of the world’s great libraries. For myself, I’d like to unleash my colleagues and our students on this remarkable resource while I’m still around to see what happens.
Finally, re Ryan Shaw’s post, yes, we receive the OCR.