In a recent paper prepared for the Boston Library Consortium, Richard Johnson decries the fact that some mass digitization arrangements between libraries and corporations have been “less than perfect.”
The choices that we face are indeed less than perfect. We can choose purity and perfection, and not permit any restrictions on the use of scans of public domain material, with the result that the rate of scanning and consequent display will be pitifully slow. Or we can permit corporate entities, including the dreaded Google, to scan our works, enabling millions of public domain works to be made available to readers online, at no cost to the readers, in a relatively short period of time. I am on record, by word and deed, as preferring the second choice.
In his paper, Johnson notes that the original works are retained by the libraries and could be scanned again. He fails to note that libraries whose public domain works are scanned by Google get to keep a copy of the scans and are free to display them online, independent of Google Book Search. Over 300,000 public domain works can be found in the University of Michigan catalog and read online. The number grows by thousands per week. Of course I would prefer it if the digital files could be used without restriction. Would someone please tell me the name of the entity that stands ready to digitize our collections, for free, without restriction on the use of the digital files? In the meantime, it seems to me that making the books available to readers online makes for a better world, albeit, sadly, not a perfect one.
And, this just in: an article by Kalev Leetaru in First Monday compares Google Book Search and the Open Content Alliance and finds much that is good, and much that is less than perfect, in both.
It would be a real shame if people begin to take the Leetaru piece in First Monday as authoritative. It is riddled with errors, outdated information, and misinterpretations. I am a Berkeley anthropology PhD student doing research at the Internet Archive. I have studied the Internet Archive’s book project carefully, and I am aware of much of the public information about Google’s project. Leetaru is wrong about how many books have been scanned by the Archive, the accessibility of books on its site, the cameras and software used in scanning, the supposed restrictions on use of scanned works, metadata practices, site searchability, and more. It would take a very long document to adequately rebut the piece.
Finally, his overall interpretation that somehow Google’s practices are more transparent than those of the Internet Archive is intriguingly counterintuitive, but it doesn’t jibe with the facts (as I know them, at least). What’s worst about the piece is that it is essentially unfair to a small organization that is doing some pretty extraordinary things under very constrained circumstances. I wish he’d taken more time to understand the Internet Archive/OCA side of things. Had he done so, I think his conclusions would have been different.
October 21, 2008 @ 3:08 pm
[…] Books and the Open Content Alliance comparing Google Book Search and the Open Content Alliance on Au Courant, criticized in […]
October 22, 2008 @ 4:44 am
[…] Au Courant and Chronicle of Higher […]
October 22, 2008 @ 11:58 am
> We can choose purity and perfection,
> and not permit any restrictions on
> the use of scans of public domain material
i’m surprised you would mention “restrictions”
concerning scans of public-domain material…
there are none. a scan of a public-domain book
is public-domain as well, per bridgeman v corel…
your lawyers may disagree. google’s too. fine.
sue me. it will tell the world where you stand…
but that’s all beside the point… because — a la
the long-time wisdom of michael hart — a scan
of a book is not an e-book. we want digital text.
digital text is far superior — in _every_ way — to a
scan-set; it’s more flexible and more informative,
and uses less bandwidth. the important question
facing digitization projects at this point in time is
how to correct the o.c.r. to get perfect digital text.
once we have that, the scans will just collect dust.
as head of one of the main libraries in the world
— a big leader in the move toward cyberlibraries —
you should know this, so it is astonishing when
you blog and fail to manifest such knowledge…
-bowerbird
October 22, 2008 @ 2:05 pm
Paul’s post points out the now undeniable benefits of the Google Books effort and other mass digitization projects. To position themselves to best exploit those benefits, libraries need as much information about the projects as they can get. Leetaru’s comparative study, though dated (isn’t everything?), is extremely useful in this regard.
November 13, 2008 @ 6:50 pm