<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Au Courant &#187; Google</title>
	<atom:link href="http://paulcourant.net/category/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://paulcourant.net</link>
	<description>Paul Courant's blog about libraries, economics, public policy, and other stuff</description>
	<lastBuildDate>Sat, 24 Apr 2010 22:30:27 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Digitization and accessibility</title>
		<link>http://paulcourant.net/2009/11/02/digitization-and-accessibility/</link>
		<comments>http://paulcourant.net/2009/11/02/digitization-and-accessibility/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 15:49:21 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=73</guid>
		<description><![CDATA[From the very beginning, one of the most exciting possibilities of the Google Digitization Project was its potential to open up vast stores of text to a group of users to whom it had previously been inaccessible: people with visual impairments and print disabilities. Before Google (B.G.), students and scholars who wanted access to the [...]]]></description>
			<content:encoded><![CDATA[<p>From the very beginning, one of the most exciting possibilities of the Google Digitization Project was its potential to open up vast stores of text to a group of users to whom it had previously been inaccessible: people with visual impairments and print disabilities. Before Google (B.G.), students and scholars who wanted access to the contents of a print book had to request that the book be converted to braille, or digitized and OCRed, a special one-at-a-time process that took several weeks. This required lots of advance planning and significantly slowed the pace of study and research for these users. After Google (A.G.), with an increasing amount of the total published content in the world available digitally, that tedious process is no longer necessary. Students and scholars with print disabilities can experience the flow of moving from resource to resource without impediment for the first time ever. Here&#8217;s how.</p>
<p>Over the past couple of years, The University of Michigan Library has been working on a project to improve the accessibility of our digitized texts for visually impaired UM students, staff, and faculty. First, the team made accessibility improvements to the standard public interface for the <a href="http://hathitrust.org" target="_blank">HathiTrust Digital Library</a> (formerly known as MBooks) and developed a text-only interface geared toward people with print disabilities that is optimized for screen reading software. Next, and most important, the Library figured out how to grant access to the full text of digitized books for qualified patrons, regardless of the book&#8217;s copyright status.</p>
<p>Like many other universities, the UM Services for Students with Disabilities (SSwD) has long offered book digitization service to students with disabilities upon request. This is explicitly allowed under <a href="http://www.copyright.gov/title17/92chap1.html#121" target="_blank">section 121 of U.S. Copyright law</a>.</p>
<p>Our new system basically does the same thing but on a much larger scale. The HathiTrust Digital Library currently provides access to over 4 million digitized volumes and will grow to over 10 million &#8211; visually impaired students will have full-text access to all of these volumes. We consider this just the beginning. Over the next year, we will continue to work on improvements to the interface and conduct more user assessments, and our HathiTrust partners are working together to create a framework through which we can offer this service to users at their institutions.</p>
<p>Once a University of Michigan student registers with the UM Services for Students with Disabilities, any time she checks out a book that has been digitized she will automatically receive an email with a URL. Once the student selects the link, she will be asked to log in. The system will check whether the student is registered with SSwD as part of this program, and confirm that she has checked out this particular book. If the student passes both of those tests, she will get access to the full text of the book, whether it is in copyright or not, in an interface that is optimized for use with screen readers. The Library’s Blog for Library Technology has <a href="http://mblog.lib.umich.edu/blt/archives/2009/10/hathitrust_acce.html">more details on the technical elements</a> for those who are interested.</p>
<p>Our system was endorsed by the National Federation of the Blind as a model for how libraries can serve visually impaired patrons in the digital age. It’s a great example of how digital technologies can extend the ability of libraries to serve their clients, and to extend learning and teaching beyond traditional populations.</p>
<p>As with the production of so many things that are of broad public value, this work could not have happened without the efforts of a committed champion.  I am pleased to recognize the commitment and skill of Jack Bernard, an attorney in Michigan’s Office of the General Counsel, who has provided substantive and legal leadership in our efforts to make our collections accessible to our students with print disabilities.  Jack’s efforts were recognized by the American Library Association, which gave him the L. Ray Patterson Copyright Award in 2009.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2009/11/02/digitization-and-accessibility/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Economist and the librarian-economist on the Google settlement</title>
		<link>http://paulcourant.net/2009/09/07/the-economist-and-the-librarian-economist-on-the-google-settlement/</link>
		<comments>http://paulcourant.net/2009/09/07/the-economist-and-the-librarian-economist-on-the-google-settlement/#comments</comments>
		<pubDate>Mon, 07 Sep 2009 12:53:40 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Economics]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=57</guid>
		<description><![CDATA[The current issue of The Economist has a leader supporting the Google settlement and an article in the business section that quotes me in the course of discussing the issue.  I am described, with my enthusiastic consent, as running an orphanage.  The more I think of it the better the orphan metaphor works. [...]]]></description>
			<content:encoded><![CDATA[<p>The current issue of <em>The Economist</em> has a <a href="http://www.economist.com/opinion/displaystory.cfm?story_id=14363287">leader</a> supporting the Google settlement and an <a href="http://www.economist.com/businessfinance/displayStory.cfm?story_id=14391317">article</a> in the business section that quotes me in the course of discussing the issue.  I am described, with my enthusiastic consent, as running an orphanage.  The more I think of it the better the orphan metaphor works.  Orphan works are orphans of a particular type &#8212; foundlings.  They are not orphaned by a premature loss of their parents.  They are left on the doorstep, taken in (by the library, of course, in the role of the tough but kind orphanage staff), nurtured and kept for as long as care is needed. They may have parents out there and they may not, no one knows.  And now there is some hope that they will  be invited to the dance, and we shall see how the story plays out.</p>
<p>The Economist interviewed me about the settlement at some length, and made a <a href="http://audiovideo.economist.com/?fr_story=d3ce48202fea23fe7595380f38e7914547ad0b45&amp;rf=bm">podcast</a> that I quite like.  It recapitulates fairly painlessly (it&#8217;s 13 minutes) some of the things that I&#8217;ve been saying about the Google lawsuit and settlement for some time.</p>
<p>And, for something completely different and arguably more important, Paul Krugman has a superb piece entitled <a href="http://www.nytimes.com/2009/09/06/magazine/06Economic-t.html?_r=1&amp;scp=1&amp;sq=Krugman%20magazine&amp;st=cse">&#8220;How Did Economists Get It So Wrong&#8221;</a> in the New York Times Magazine of September 6.  What&#8217;s remarkable is how economists got it so wrong 70 years after Keynes got it so right.  Anyhow, this is a testimonial for Krugman&#8217;s piece from an admiring economist.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2009/09/07/the-economist-and-the-librarian-economist-on-the-google-settlement/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Orphan Works Legislation and the Google Settlement</title>
		<link>http://paulcourant.net/2009/03/15/orphan-works-legislation-and-the-google-settlement/</link>
		<comments>http://paulcourant.net/2009/03/15/orphan-works-legislation-and-the-google-settlement/#comments</comments>
		<pubDate>Sun, 15 Mar 2009 23:59:10 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>
		<category><![CDATA[Publishing]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=51</guid>
		<description><![CDATA[I spent Friday at a fascinating conference at the Columbia University Law School, on the subject of (what else?) the Google settlement.  Lead counsel from all three parties, lots of other lawyers, several principals, publishers, authors and librarians were there.
I learned something important that at some level I already knew.
The most important single [...]]]></description>
			<content:encoded><![CDATA[<p>I spent Friday at a fascinating conference at the Columbia University Law School, on the subject of (what else?) the Google settlement.  Lead counsel from all three parties, lots of other lawyers, several principals, publishers, authors and librarians were there.</p>
<p>I learned something important that at some level I already knew.</p>
<p>The most important single thing about the Google settlement, simultaneously its greatest achievement and among its most vexing features, is the treatment of orphaned works (in James Grimmelmann’s witticism, “zombie” works).  The problem, as we all know, is that there are millions – no one quite knows how many – of works that may or may not be in copyright and for which the rightsholder(s) may or may not exist and may or may not be aware of their rights.  Our ability to use these works is thus much compromised: we run the risk that a copyright holder will appear and claim damages.  As we all know, Congress’s efforts to make it easier and safer to use orphaned works have failed.  Moreover, the most recent draft legislation would have imposed difficult and costly burdens on a potential user by requiring the would-be user to make substantial efforts to find any potential but unknown rightsholder.</p>
<p>Along comes the Google settlement, which solves at least part of the problem, for Google and the Book Rights Registry, at one fell swoop.  (Only part of the problem, because works that were not registered with the copyright office will likely not be in the settlement and yet may be just as orphaned as those that are registered.)  Under the settlement, revenues generated by orphaned works will be held in escrow for five years, allowing time for a rightsholder to come forward.  It’s a moving window; if the rightsholder comes forward in year 22, she gets revenues from year 17 on.  Thus the products that Google sells to individuals and institutions can include, among other works, millions of orphans (zombies).  Without the orphans, the great public benefit of the settlement – the ability to find and use much of the literature of the 20th century in digital form – would be much diminished.</p>
<p>At the same time, the disposition of the revenues attributed to orphaned works is one of my least favorite parts of the settlement.  The unclaimed revenues go first to support the operations of the BRR, and then, after that, will be used for charitable purposes consistent with the interests of publishers and authors.  As the head of a library that has lovingly cared for these works for decades, the notion that the fruits of our labors (and those of many others in many libraries) redound to the benefit of entities that did not write, publish, or curate these works sticks a bit in my craw.  So I hope that authors, publishers, the court, and the public will be vigilant in making sure the BRR does not squander the unclaimed revenues on mismanagement, high salaries, and the like.   The “charitable purposes” should be an objective, not a remainder for unclaimed funds.</p>
<p>The settlement also gives Google and the BRR, and no one else, the right to use the orphaned works in this way.  A number of commentators have noted problems that may arise from Google’s privileged position in this regard.  But there is an obvious solution, one that was endorsed at the Columbia meeting by counsel for the Authors Guild, the AAP, and Google:  Congress could pass a law giving anyone access to the same sort of scheme that Google and the BRR have under the Google Settlement.  And they could pass some other law that makes it possible for people to responsibly use orphaned works, while preserving interests for the missing “parents” should they materialize.  Jack Bernard and Susan Kornfield have proposed <a href="http://www.copyright.gov/orphan/comments/OW0613-Kornfield.pdf">just such an architecture</a> to “foster” these orphans. Google has also made a <a href="http://www.copyright.gov/orphan/comments/OW0681-Google.pdf">proposal</a> that would be a huge improvement.</p>
<p>Given that the parties to the suit, libraries, and the public would all benefit from such legislation, it should be a societal imperative to pass it.  I look forward to AAP, the Authors Guild, and Google lobbying and testifying in favor of such legislation.  I’d be happy to be there, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2009/03/15/orphan-works-legislation-and-the-google-settlement/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Google, Robert Darnton, and the Digital Republic of Letters</title>
		<link>http://paulcourant.net/2009/02/04/google-robert-darnton-and-the-digital-republic-of-letters/</link>
		<comments>http://paulcourant.net/2009/02/04/google-robert-darnton-and-the-digital-republic-of-letters/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 13:11:51 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=40</guid>
		<description><![CDATA[Robert Darnton recently published an essay in the New York Review of Books on the Google settlement.  There has been much commentary in blogs, listserves, and print media.  Below I reproduce a letter that I sent to the New York Review of Books, that they found to be far too long to publish. [...]]]></description>
			<content:encoded><![CDATA[<p>Robert Darnton recently published an <a href="http://www.nybooks.com/articles/22281">essay in the New York Review of Books</a> on the Google settlement.  There has been much commentary in blogs, listserves, and print media.  Below I reproduce a letter that I sent to the New York Review of Books, that they found to be far too long to publish. It is my understanding that they expect to publish a much-shortened revision.  In any case, here&#8217;s what I had to say.</p>
<p>&#8212;&#8211;</p>
<p>To the editors:</p>
<p>My colleague and friend Robert Darnton is a marvelous historian and an elegant writer. His utopian vision of a digital infrastructure for a new Republic of Letters (Google and the Future of Books, NYRB Feb. 12) makes the spirit soar.  But his idea that there was any possibility that Congress and the Library of Congress might have implemented that vision in the 1990s is a utopian fantasy.  At the same time, his view of the world that will likely emerge as a result of Google’s scanning of copyrighted works is a dystopian fantasy.</p>
<p>The Congress that Darnton imagines providing both money and changes in law that would have made out-of-print but in-copyright works (the great majority of print works published in the 20th century) digitally available on reasonable terms showed no interest in doing anything of the kind.  Rather, it passed the Digital Millennium Copyright Act and the Sonny Bono Copyright Term Extension Act. (More recently, Congress passed the Higher Education Opportunity Act, which compels academic institutions to police the electronic environment for copyright infringement). This record is unsurprising; the committees that write copyright law are dominated by representatives who are beholden to Hollywood and other rights holders.  Their idea of the Republic of Letters is one in which everyone who ever reads, listens, or views pretty much anything should pay to do so, every time.</p>
<p>The Supreme Court, which was given the opportunity to limit the extension of the term of copyright, which was already far too long (like Darnton, I think that 14 years renewable once is more than enough to achieve the purposes of copyright) refused to do so (with only two dissenters) in Eldred v. Ashcroft, decided in 2003. Instead, it upheld legislation that, contrary to the fundamental principles of copyright, provided rewards to authors who are long dead, preventing our cultural heritage from rising into the public domain.</p>
<p>In short, over the last decade and more, public policy has been consistently worse than useless in helping to make most of the works of the 20th century searchable and usable in digital form.   This is the alternative against which we should evaluate Google Book Search and Google’s settlement with publishers and authors.</p>
<p>First, we should remember that until Google announced in 2004 that it was going to digitize the collections of a number of the world’s largest academic libraries, absolutely no one had a plan for mass digitization at the requisite scale. Well-endowed libraries, including Harvard and the University of Michigan, were embarked on digitization efforts at rates of less than ten thousand volumes per year.  Google completely shifted the discussion to tens of thousands of volumes per week, with the result that overnight the impossible goal of digitizing (almost) everything became possible.  We tend to think now that mass digitization is easy.  Less than five years ago we thought it was impossibly expensive.</p>
<p>The heart of Darnton’s dystopian fantasy about the Google settlement follows directly from his view that “Google will enjoy what can only be called a monopoly … of access to information.”  But Google doesn’t have anything like a monopoly over access to information in general, nor to the information in the books that are subject to the terms of the settlement. For a start (and of stunning public benefit in itself) up to 20% of the content of the books will be openly readable by anybody with an Internet connection, and all of the content will be indexed and searchable.  Moreover, Google is required to provide the familiar “find it in a library” link for all books offered in the commercial product.  That is, if after reading 20 percent of a book a user wants more and finds the price of on-line access to be too high, the reader will be shown a list of libraries that have the book, and can go to one of those libraries or employ inter-library loan.  This greatly weakens the market power of Google’s product.  Indeed, it is much better than the current state of affairs, in which users of Google Book Search can read only snippets, not 20% of a book, when deciding whether what they’ve found is what they seek.</p>
<p>Darnton is also concerned that Google will employ the rapacious pricing strategies used by many publishers of current scientific literature, to the great cost of academic libraries, their universities, and, at least as important, potential users who are simply without access.  But the market characteristics of current articles in science and technology are fundamentally different from those of the vast corpus of out-of-print literature that is held in university libraries and that will constitute the bulk of the works that Google will sell for the rights holders under the settlement agreement.   The production of current scholarship in the sciences requires reliable and immediate access to the current literature.  One cannot publish, nor get grants, without such access.  The publishers know it, and they price accordingly.  In particular the prices of individual articles are very high, supporting the outrageously expensive site licenses that are paid by universities.  In contrast, because there are many ways of getting access to most of the books that Google will sell under the settlement, the consumer price will almost surely be fairly low, which will in turn lead to low prices for the site licenses.  Again, “find it in a library,” coupled with extensive free preview, could not be more different than the business practices employed by many publishers of scientific, technical and medical journals.</p>
<p>There is another reason to believe that prices will not be “unfair”, which is that Google is far more interested in getting people to “google” pretty much everything than it is in making money through direct sales.  The way to get people to come to the literature through Google is to make it easy and rewarding to do so.  For works in the public domain, Google already provides free access and will continue to do so. For works in the settlement, a well-designed interface, 20 percent preview, and reasonable prices are all likely to be part of the package. Additionally, libraries that don’t subscribe to the product will have a free public terminal accessible to their users.  This increases the public good deriving from the settlement both directly and by providing yet another distribution channel that does not require payment to Google or the rightsholders.</p>
<p>The settlement is far from perfect.  The American practice of making public policy by private lawsuit is very far from perfect.  But in the absence of the settlement – even if Google had prevailed against the suits by the publishers and authors – we would not have the digitized infrastructure to support the 21st century Republic of Letters.  We would have indexes and snippets and no way to read any substantial amount of any of the millions of works at stake on line.  The settlement gives us free preview of an enormous amount of content, and the promise of easy access to the rest, thereby greatly advancing the public good.</p>
<p>Of course I would prefer the universal library, but I am pretty happy about the universal bookstore. After all, bookstores are fine places to read books, and then to decide whether to buy them or go to the library to read some more.</p>
<p>Paul N. Courant</p>
<p>Note: This letter represents my personal views and not those of the University of Michigan, nor any of its libraries or departments.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2009/02/04/google-robert-darnton-and-the-digital-republic-of-letters/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>The Google Settlement &#8211; From the Universal Library to the Universal Bookstore</title>
		<link>http://paulcourant.net/2008/10/28/the-google-settlement-from-the-universal-library-to-the-universal-bookstore/</link>
		<comments>http://paulcourant.net/2008/10/28/the-google-settlement-from-the-universal-library-to-the-universal-bookstore/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 15:52:13 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=39</guid>
		<description><![CDATA[If you think about it, a universal bookstore is a pretty cool idea.  Bookstores are wonderful things. Anyone can walk into a bookstore, take a book off a shelf, read in it, decide whether to buy it or forget about it, or get it from the library.  The settlement announced today by Google, the [...]]]></description>
			<content:encoded><![CDATA[<p>If you think about it, a universal bookstore is a pretty cool idea.  Bookstores are wonderful things. Anyone can walk into a bookstore, take a book off a shelf, read in it, decide whether to buy it or forget about it, or get it from the library.  The <a href="http://books.google.com/googlebooks/agreement/">settlement announced today</a> by Google, the Association of American Publishers, and the Authors Guild will in time make it possible for millions of books, currently out of print and in-copyright, to be perused, searched and purchased (or not) in an electronic bookstore that will be operated by Google.</p>
<p>The books will come from a number of academic libraries, including the University of Michigan, the University of California, and Stanford University, which have been participants in Google Book Search from the beginning. These three worked with Google during the settlement negotiations in an effort to shape the settlement to serve the interests of research libraries and the public, as discussed in a <a href="http://www.ns.umich.edu/htdocs/releases/story.php?id=6807">joint press release.</a></p>
<p>The settlement is complicated, and as people work through it I expect a lively set of discussions and I invite comment on this blog and elsewhere.  I’d like to start with what I see as a couple of key points.</p>
<p>First, and foremost, the settlement continues to allow the libraries to retain control of digital copies of works that Google has scanned in connection with the digitization projects.  We continue to be responsible for our own collections.   Moreover, we will be able to make research uses of our own collections.  The huge investments that universities have made in their libraries over a century and more will continue to benefit those universities and the academy more broadly.</p>
<p>Second, the settlement provides a mechanism that will make these collections widely available.  Many, including me, would have been delighted if the outcome of the lawsuit had been a ringing affirmation of the fair use rights that Google had asserted as a defense. (My inexpert opinion is that Google’s position would and should have prevailed.)  But even a win for Google would have left the libraries unable to have full use of their digitized collections of in-copyright materials on behalf of their own campuses or the broader public.  We would have been able, perhaps, to show snippets, as Google has been doing, but it would have been a plain violation of copyright law to allow our users full access to the digitized texts.  Making the digitized collections broadly usable would have required negotiations with rightsholders, in some cases book by book, and publisher by publisher.  I’m confident that we would have gotten there in time, serving the interests of all parties.  But “in time” would surely have been many years, and the clock would have started only at the end of a lawsuit that had many years left to run.  Moreover, each library would have had to negotiate use rights to its own collection, still leaving us a long way from a collection of digitized collections that we could all share.</p>
<p>The settlement cuts through this morass.  As the product develops, academic libraries will be able to license not only their own digitized works but everyone else’s.  Michigan’s faculty and students will be able to read Stanford and California’s digitized books, as well as Michigan’s own.   I never doubted that we were going to have to pay rightsholders in order to have reading access to digitized copies of works that are in-copyright.  Under the settlement, academic libraries will pay, but will do so without having to bear large and repeated transaction costs.  (Of course, saving on transaction costs won’t be of much value if the basic price is too high, but I expect that the prices will be reasonable, both because there is helpful language in the settlement and because of my reading of the relevant markets.)</p>
<p>The settlement is not perfect, of course.  It is reminiscent, however, of the original promise of the Google Book project: what once looked impossible or impossibly distant now looks possible in a relatively short period of time.  Faculty, students, and other readers will be able to browse the collections of the world’s great libraries from their desks and from their breakfast tables.  That’s pretty cool.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2008/10/28/the-google-settlement-from-the-universal-library-to-the-universal-bookstore/feed/</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
		<item>
		<title>&#8220;Less than perfect&#8221; is not always bad</title>
		<link>http://paulcourant.net/2008/10/21/less-than-perfect-is-not-always-bad/</link>
		<comments>http://paulcourant.net/2008/10/21/less-than-perfect-is-not-always-bad/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 12:03:13 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=38</guid>
		<description><![CDATA[In a recent paper prepared for the Boston Library Consortium, Richard Johnson decries the fact that some mass digitization arrangements between libraries and corporations have been &#8220;less than perfect.&#8221;
The choices that we face are indeed less than perfect.  We can choose purity and perfection, and not permit any restrictions on the use of scans [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://www.blc.org/news/BLC_summit_white_paper_9-29-08.pdf">recent paper</a> prepared for the Boston Library Consortium, Richard Johnson decries the fact that some mass digitization arrangements between libraries and corporations have been &#8220;less than perfect.&#8221;</p>
<p>The choices that we face are indeed less than perfect.  We can choose purity and perfection, and not permit any restrictions on the use of scans of public domain material, with the result that the rate of scanning and consequent display will be pitifully slow. Or we can permit corporate entities, including dreaded Google, to scan our works, enabling millions of public domain works to be made available to readers on line, at no cost to the readers, in a relatively short period of time. I am on record by word and deed as preferring the second choice.</p>
<p>In his paper, Johnson notes that the original works are retained by the libraries and could be scanned again.  He fails to note that libraries whose PD works are scanned by Google get to keep a copy of the scans and are free to display them on line, independent of Google Book Search. Over 300,000 public domain works can be found in the University of Michigan catalog and read on line.  The number grows by thousands per week.  Of course I would prefer it if the digital files could be used without restriction. Would someone please tell me the name of the entity that stands ready to digitize our collections, for free, without restriction on the use of the digital files?  In the meantime, it seems to me that making the books available to readers online makes for a better world, albeit, sadly, not a perfect one.</p>
<p>And, this just in, an <a href="http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2101/2037">article by Kalev Leetaru in First Monday</a> that compares Google Book Search and the Open Content Alliance and finds much that is both good and less than perfect in both.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2008/10/21/less-than-perfect-is-not-always-bad/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Microsoft Exits the Mass Digitization Business</title>
		<link>http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/</link>
		<comments>http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/#comments</comments>
		<pubDate>Sat, 31 May 2008 15:30:21 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>
		<category><![CDATA[Michigan]]></category>

		<guid isPermaLink="false">http://paulcourant.net/?p=33</guid>
		<description><![CDATA[Last week Microsoft announced that it will cease its Live Search program and the associated programs of mass digitization that it has been undertaking with many libraries.  The response in the library world has generally been one of resigned sadness that the only big player other than Google is getting out of the free [...]]]></description>
			<content:encoded><![CDATA[<p>Last week Microsoft <a href="http://blogs.msdn.com/livesearch/archive/2008/05/23/book-search-winding-down.aspx">announced</a> that it will cease its Live Search program and the associated programs of mass digitization that it has been undertaking with many libraries.  The response in the library world has generally been one of resigned sadness that the only big player other than Google is getting out of the free (to the libraries) mass digitization business. From an article in the <a href="http://chronicle.com/free/2008/05/3022n.htm?utm_source=at&amp;utm_medium=en  ">Chronicle of Higher Education</a>:</p>
<blockquote><p>&#8220;Microsoft was a little slower off the mark than Google,&#8221; says Anne R. Kenney, university librarian at Cornell University. Her library has supplied both Microsoft and Google with books and articles for digitization. &#8220;It would have meant an awful lot of additional investment in this area for Microsoft to be a real competitor.&#8221;</p></blockquote>
<p>In the same article, I am quoted as saying &#8220;The more the merrier. I don&#8217;t like a monopoly, and I like it when there&#8217;s lots of money behind an extremely important project.&#8221;   I continue to wish that there were folks with deep pockets lining up to provide free digitization of the world’s library collections. Alas, there is no one in line that I know of, and with Microsoft&#8217;s departure, the only serious player is Google.</p>
<p>Speaking of Google (as I find myself doing rather frequently), a recent posting on <a href="http://arstechnica.com/news.ars/post/20080526-why-killing-live-book-search-is-the-right-thing-for-ms.html">Ars Technica</a> includes the following remark, which is misleading in several ways:</p>
<blockquote><p>If people think that corporations are the right way to access the history of human discourse, [Brewster] Kahle says they&#8217;re in for &#8220;a series of very rude shocks.&#8221; (The University of Michiagn (sic), which has thrown in its lot with Google, does not agree.)</p></blockquote>
<p>I want to <a href="http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/">emphasize, yet again</a>, that I completely agree with Brewster Kahle that it would be a very bad thing if a single corporation were in control of the cultural record.  Indeed, it would be bad if, as is the case with much of audio and video, the control were divided up amongst several corporations.  Nonprofit organizations, emphatically including research libraries, are the natural stewards of information that will be of value to society for the indefinite future, precisely because we are driven by a mission of preservation and access, rather than by profit.  Good thing, then, that the University of Michigan and other universities whose collections are being digitized by Google continue to hold the original copies of their print works, and also receive and preserve copies of the image files and associated text files that are produced by Google’s nondestructive scanning of these works.</p>
<p>I will miss Microsoft, and I hope that others will take its place – again, the more the merrier. In the meantime, the University of Michigan Library now has well <a href="http://www.lib.umich.edu/news/millionth.html">over a million digitized books</a> in its catalogue, with the number growing by thousands every day. Visit us online at www.lib.umich.edu.  Our catalog will allow search of all of the digitized works, and full view of those that are in the public domain.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Teaching School</title>
		<link>http://paulcourant.net/2007/11/12/teaching-school/</link>
		<comments>http://paulcourant.net/2007/11/12/teaching-school/#comments</comments>
		<pubDate>Tue, 13 Nov 2007 01:52:52 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>

		<guid isPermaLink="false">http://paulcourant.net/2007/11/12/teaching-school/</guid>
		<description><![CDATA[Paul Duguid’s comment on an earlier post of mine gets to important issues that I expect to discuss repeatedly (although not repetitiously) in this space. Among the big questions that he raises are these two: (1) How good a job will Google Book Search do? (2) What are the consequences that flow from the answer [...]]]></description>
			<content:encoded><![CDATA[<p>Paul Duguid’s <a href="http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/#comment-16">comment</a> on an earlier post of mine gets to important issues that I expect to discuss repeatedly (although not repetitiously) in this space. Among the big questions that he raises are these two: (1) How good a job will Google Book Search do? (2) What are the consequences that flow from the answer to (1)?</p>
<p>I can’t answer the first question. Thus far GBS has not done well with multivolume works, sort of like iTunes with classical music. In both cases, metadata is thrown away, and the results are often more amusing than useful. Library partners, including Michigan, have been on Google’s case about this for some time. Duguid asks whether Google will learn from Michigan. My experience is that in general Google is very good at learning.</p>
<p>I have more to say about the second question. Here I am optimistic. Suppose Google never gets good at multivolume works, and in this and possibly other domains falls well short of good performance in delivering to users what they are looking for. I find it very unlikely that such a circumstance would be sustained, because Google has a strong interest in being responsive to its users. So the outcome will turn on how discerning the users will be, and on that subject colleges and universities and their libraries should have a great deal to say. What matters is whether academic libraries and their associated colleges and universities are able to teach their students well enough so that students can tell the difference between good search outcomes and misguided ones. (We also need to teach our students how to recognize sources with reliable provenance, and how to use such sources in order to make sense of their own and others’ arguments, but that is a longer discussion for another time.)</p>
<p>If we (academic institutions) do our job well, users will not tolerate unreliable search outcomes, and in that case I would expect Google to be responsive, not because libraries have told them how to catalog books, but because users will find books that are ill-cataloged to be less useful than books that are well-cataloged. By using the Google-scanned works well in our teaching and research, we can develop practices of scholarly literacy that use authenticated and reliable digital sources. GBS may be the direct source of the works, or we may rely on the library copies. Either way, the important job for academic institutions is to teach well (or, more precisely, to assure that their students learn well), and that is exactly as it should be.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2007/11/12/teaching-school/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Quick response to Siva Vaidhyanathan</title>
		<link>http://paulcourant.net/2007/11/06/quick-response-to-siva-vaidhyanathan/</link>
		<comments>http://paulcourant.net/2007/11/06/quick-response-to-siva-vaidhyanathan/#comments</comments>
		<pubDate>Tue, 06 Nov 2007 23:04:23 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>
		<category><![CDATA[Michigan]]></category>

		<guid isPermaLink="false">http://paulcourant.net/2007/11/06/quick-response-to-siva-vaidhyanathan/</guid>
		<description><![CDATA[[This is a reposting of a comment I made in response to Siva Vaidhyanathan's questions about my previous post. I am traveling, and can only produce brief answers to his questions now. Later this week I'll get to most of the issues in more detail here.]
Let me start by reminding everyone that I do not [...]]]></description>
			<content:encoded><![CDATA[<p>[This is a reposting of a comment I made in response to <a href="http://www.googlizationofeverything.com/2007/11/paul_courant_of_michigan_addre.php">Siva Vaidhyanathan's questions about my previous post</a>. I am traveling, and can only produce brief answers to his questions now. Later this week I'll get to most of the issues in more detail here.]</p>
<p>Let me start by reminding everyone that I do not speak for Google, nor am I engaged in generalized cheerleading on Google&#8217;s behalf. Rather, I am arguing that the University of Michigan Library is doing a Good Thing in its digitization project with Google.</p>
<p>Below are Siva&#8217;s questions, and my responses:</p>
<blockquote><p>He dismisses serious search problems as temporary, yet fails to confront the problem that Google cannot and will not explain the factors and standards that put one book above another in search results.</p></blockquote>
<p>Actually, I don&#8217;t mention search at all in my post. Nor (see above) do I speak for Google.</p>
<blockquote><p>As users discover poorly-scanned files on the Google index, how can they alert Google to the problem? Why does nothing in the contract between Michigan and Google include quality-control standards or methods?</p></blockquote>
<p>Please see Michigan&#8217;s agreement with Google, clause 2.4, the relevant part of which reads: &#8220;U of M will engage in ongoing review (through sampling) of the resulting digital files, and shall inform Google of files that do not meet benchmarking guidelines or do not comply with the agreed-upon format. Should U of M encounter a persistent failure by Google to meet these guidelines or supply the agreed-upon format, U of M may stop new work until this failure can be rectified.&#8221; The agreement is online at: <a href="http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html" rel="nofollow">http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html</a></p>
<blockquote><p>How do we know this index will last for decades? What image file system is Google using and what ensures its preservation?</p></blockquote>
<p>I believe that in my post I said that the UM library (like other partner libraries) is also storing and preserving the files that Google scans. Maybe Google won’t last for decades, but the libraries will, and the libraries are pretty serious about preservation.</p>
<blockquote><p>How is the &#8220;library copy,&#8221; that electronic file that Michigan and others receive as payment for allowing Google to exploit their treasures, NOT an audacious infringement of copyright? It violates both the copyright holder&#8217;s right to copy and right to distribute. Doesn&#8217;t a university library have an obligation to explain this?</p></blockquote>
<p>It&#8217;s hard to get past the first premise of this set of questions. One literal answer would be to say that there is no such electronic file, because Google is not obtaining anything by means of exploitation.</p>
<p>I must say that I am troubled that the author of a very sensible book about copyright is so enthusiastic about trashing Google that he is willing to give up on the uses, notably scholarly uses, that are permitted in the higher-numbered sections of the Copyright Act. As my institution&#8217;s copyright lawyer says: &#8220;FAIR USE, it&#8217;s the law.&#8221; And my institution believes that when we have Google digitize our holdings we do so under the law and in order to make uses that are not only lawful, but that are completely consistent with the undergirding purpose of copyright law.</p>
<p>Siva is much younger than I am, so he may be willing to wait decades before finding out how scholarship and society can benefit from digitized and searchable collections from some of the world&#8217;s great libraries. For myself, I&#8217;d like to unleash my colleagues and our students on this remarkable resource while I&#8217;m still around to see what happens.</p>
<p>Finally, re Ryan Shaw&#8217;s post, yes, we receive the OCR.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2007/11/06/quick-response-to-siva-vaidhyanathan/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>On being in bed with Google</title>
		<link>http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/</link>
		<comments>http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/#comments</comments>
		<pubDate>Sun, 04 Nov 2007 19:08:52 +0000</pubDate>
		<dc:creator>pnc</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Mass digitization]]></category>
		<category><![CDATA[Michigan]]></category>

		<guid isPermaLink="false">http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/</guid>
		<description><![CDATA[One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason that a university library would be involved with Google is because, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason that a university library would be involved with Google is because, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the contrary, although I may be proved wrong,  I believe that the University of Michigan (and the other partner libraries)  and Google are changing the world for the better. Four years from now, all seven million volumes in the University of Michigan Libraries will have been digitized – the largest such library digitization project in history. Google Book Search and our own MBooks collection already provide full-text access to well over a hundred thousand  public domain works, and make it possible to search for keywords and phrases within hundreds of thousands more in-copyright materials. This access is altering the way that we do research. At least as important, the project is itself an experiment in  the provision and use of digitized print collections in large research libraries.   I do not see how we can discover the best ways to use such collections without experiments at this scale. In sum, I believe that our library is doing exactly what it should do in the best interests of scholarship and our users, now and in the future.</p>
<p>So I’m puzzled when people ask, “How could serious libraries be doing this? How could they abdicate their responsibilities as custodians of the world’s knowledge by offering their collections up as a sacrifice on the altar of corporate power? Why don’t they join the virtuous ranks of the Open Content Alliance partners, who pay thousands of dollars to digitize books at a rate of tens of thousands of volumes a year?”  It seems like those who ask such questions have little appreciation of what Michigan and the other Google partners are actually up to.</p>
<p>Google is on pace to scan over 7 million volumes from U-M libraries in six years at no cost to the University. As part of our arrangement with Google, they give us copies of all the digital files, and we can keep them forever. Our only financial outlay is for storage and the cost of <a href="http://www.lib.umich.edu/mdp/">providing library services</a> to our users. Anyone who searches U-M’s library catalog, <a href="http://mirlyn.lib.umich.edu">Mirlyn</a>, can access the scanned files via our MBooks interface. That’s right, anyone. (Copyright law constrains what we can display in full text, and what we can offer only for searching, but we share as much as we can consistent with prudent interpretations of the law.)  For an example of an MBook, take a look at <a href="http://mdp.lib.umich.edu/cgi/pt?id=mdp.39015009366819" title="The Acquisitive Society by R. H. Tawney."><em>The Acquisitive Society</em> by R. H. Tawney</a>.</p>
<p>In a recent <a href="http://www.nytimes.com/2007/10/22/technology/22library.html?em&amp;ex=1193544000&amp;en=ce927953c53a4745&amp;ei=5087%0A">New York Times article</a> about mass digitization projects, Brewster Kahle was quoted as saying: “Scanning the great libraries is a wonderful idea, but if only one corporation controls access to this digital collection, we’ll have handed too much control to a private entity.”</p>
<p>I agree with him. I’m an economist with a particular interest in public goods, which is how I came to be involved with libraries in the first place.  Libraries have a long and honorable history of preserving information and making it accessible.  Moreover, even at their best, for-profit institutions cannot be expected to serve general public interests when those interests run counter to those of their shareholders.  So I would be distressed if a single corporation controlled access to the collections of the great academic libraries, just as I find it troubling, on a smaller scale, that a handful of publishers control access to much of the current scientific literature.</p>
<p>But Google has no such control.  After Google scans a book, they return the book to the library (like any other user), and they give us a copy of the digital file. Google is not the only entity controlling access to the collection – the University of Michigan and other partner libraries control access as well. Except we don’t think of it as controlling access so much as providing it.</p>
<p>Since 2005, Siva Vaidhyanathan has been making and refining <a href="http://www.sivacracy.net/2005/11/siva_in_chronicle_of_higher_ed.html">the argument</a> that libraries should be digitizing their collections independently, without corporate financing or participation, and that those who don’t are failing to uphold their responsibility to the public. “Libraries should not be relinquishing their core duties to private corporations for the sake of expediency.”</p>
<p>“Expediency” is a bit of a dirty word.  Vaidhyanathan’s phrase suggests that good people don’t do things simply because they are “expedient.” But I view large-scale digitization as expeditious. We have a generation of students who will not find valuable scholarly works unless they can find them electronically.  At the rate that OCA is digitizing things (and I say the more the merrier and the faster the better) that generation will be dandling great-grandchildren on its knees before these great collections can be found electronically.  At Michigan, the entire collection of bound print will be searchable, by anyone in the world, about when children born today start kindergarten.</p>
<p>Google brings to us extraordinary technical and computing power and tremendous financial resources. The libraries bring an understanding of our collections and our users, and a profound commitment to public access. We are not relinquishing our duties in the name of expediency; we are working with a capable partner to create a far more useful resource than we could create on our own. (Would I prefer that a charitable foundation support this work on the same schedule as Google, and make everything available to everyone, subject only to copyright restrictions? You bet. I would prefer it even more if that foundation would buy out all of the rights holders for all out-of-print works. Can someone tell me the name of the foundation, please? In the meantime, it seems to me that being in bed with Google is way better than sleeping alone.)</p>
<p>It’s true that the digitized files from Google’s scans are often far from perfect. Historian <a href="http://www.historians.org/Perspectives/issues/2007/0709/0709vie1.cfm">Robert Townsend</a>, <a href="http://www.firstmonday.org/issues/issue12_8/duguid/index.html">Paul Duguid</a>, and others have raised technical questions about the quality of Google’s scans, and their appropriateness for preservation. Those are important questions, and there is a great deal of work to be done, both by Google and by the libraries, before we consistently achieve the level of quality and bibliographic reliability that are essential to successful scholarly practice. I will discuss some of the specific steps we are taking to address quality in a future post, but for now I will just say that the solution of these problems will require the serious engagement of academic libraries, and that the visibility of the problems is essential to their solution. Mass digitization on the scale of the Google library project was unimaginable five years ago, and it comes as no surprise to me that we are learning a lot as we go along. We are learning in the tradition of serious academic work, by putting our ideas and our resources in the public eye, where they can be seen, and criticized, and improved.</p>
]]></content:encoded>
			<wfw:commentRss>http://paulcourant.net/2007/11/04/on-being-in-bed-with-google/feed/</wfw:commentRss>
		<slash:comments>51</slash:comments>
		</item>
	</channel>
</rss>
