E-Books and P-Books

December 29th, 2007

Like everyone else who follows the blogs and listservs that everyone else follows, over the past month or so I have had the opportunity to skim thousands of comments on the new Amazon Kindle. I haven’t actually played with a Kindle, yet, but if ever a subject were well covered by the secondary literature, this is it, so I feel fully qualified to comment on the matter. (This is in the spirit of Pierre Bayard’s recent How to Talk About Books that You Haven’t Read, which I have played with.)

The Kindle is plainly many wonderful things, and does many wonderful things, and, for most purposes, is a pretty poor substitute for a book. (At the same time, for some purposes, such as carrying a substantial library on a long trip, or augmenting that library at 4AM from a hotel room in a strange land, or getting the best price on some content from Amazon, it’s much better than a book.)

I acquired a Sony Reader about a year ago, and I like it just fine, although if I have time, space, and carrying capacity, I invariably prefer a book. When I first played with the Sony I thought that pretty soon now, there would be readers that would make e-books very good substitutes for p-books. A year or two and lots of development costs later, I’m not so sure. Put simply, what is most striking about the buzz around the Kindle is that (almost) no one is saying that it is a revolutionary, next generation improvement over its predecessor. It’s better at some things, has a much better interface for actually acquiring content, and so on. It’s wow, but not “WOW, I’m going to throw away my library and convert the space into a billiard room.”

Here’s an instructive contrast – JSTOR. When JSTOR made the back issues of the leading economics journals available digitally, I did throw away part of my library and repurpose the space. JSTOR made it possible for me to skim and read any article in the relevant journals. Of course, even then, if I was really going to read the article, I would print it out, the better for carrying around and making marginal notes.

As we all know, electronic versions of academic journals have been very successful, and most academic libraries are now choosing the electronic form in preference to print for a large fraction of their serials. How do faculty and students use these resources? They search them on the screen, and often skim them on the screen, and if they want to read them carefully they print them out and carry them around. Thus, I claim that the great success of e-journals can be attributed in no small part to the fact that their content comes in easy, print-sized chunks.

I’m betting that something similar will be true of e-books. They will really take off when their publishers admit that on-screen (in either computer or reader) is not the best medium for serious and sustained reading, and develop and use technical and rights environments that allow cheap and convenient print on demand. It’s wonderful to be able to search and to skim on screen, but when you want to read, there is nothing like a book or a printed article. The Kindle and the Reader are great; I wouldn’t leave home without one. But, like almost everyone, I do most of my reading at or near home.

Why I hate the phrase “Scholarly Communication”

November 23rd, 2007

I hate the phrase, “scholarly communication.”

It’s not that I hate the practice, which I view as a pinnacle of human achievement, without which the life and work of many (including me) would be meaningless. It’s that the phrase itself connotes a mechanical process, rather than the transcendent purpose that underlies the activity itself.

Decoding the term “scholarly communication” requires us to consider other adjectives as applied to communication. Not all that many are in common use, and most that are refer to technology or to style, e.g. “terse communication,” “telephonic communication” (an archaic usage that is not all that old), “written communication,” “verbal communication.” None of these is properly parallel with “scholarly communication,” where the modifier of “communication” signifies a type of work. If we look at other lines of work that involve communicating we find no good linguistic parallels. I, at least, have never heard of “journalistic communication,” “artistic communication,” “filmic (?) communication,” “photographic communication,” “dramatic communication,” or the like. Rather one speaks of journalism, art, film, photography, etc.

The obvious parallel to all of these noble lines of work is simply “scholarship,” and indeed, as I have argued elsewhere, scholarship without communication isn’t well defined, because an essential part of scholarship is making one’s work public by contributing one’s thoughts and knowledge to the scholarly literature. (Note, by the way, that we have many modifiers of the word “literature” that are fully parallel to this usage.) Moreover, an extremely important part of the practice of scholarship, namely private communication between and among scholars, is NOT included in the conventional meaning of the term “scholarly communication.” That’s because what we really mean to be talking about when we say “scholarly communication” is scholarly publication, by which I mean the set of mechanisms (and associated rules and practices) by which scholarship is made public. The mechanisms include traditional and less traditional methods of publication (inter alia, monographs, blogs, simple postings on websites) plus traditional and nontraditional methods of presentation, including lectures, YouTube clips and podcasts, plus the zillions of ways that these and other technologies of communication can be combined.

I think that what we are usually talking about when we use the term “scholarly communication” is the business of making scholarly things public, including, of course, the economic viability of academic journals and academic presses, as well as the copyright and other legal and regulatory regimes that affect the business of making scholarship public. (For continuing valuable treatment of these subjects, see Scholarly Communications@Duke, a fine blog in everything but name.)

So, we are looking for a phrase that means something like, “the physical and economic mechanisms used to make scholarship public.” As I have implicitly suggested above, there is a perfectly good phrase to describe this, right out of the dictionary. The OED defines “publishing” as “the act of making something publicly known,” which is exactly the notion that we are looking for. If we want to distinguish between scholarly publishing generally and the particular activities and vicissitudes of university presses, we could speak specifically of “academic press publishing,” and also of “academic publishing,” which would connote a part of the publishing business aimed at the academy, parallel to trade publishing, or mass-market publishing. As in the commercial cases, many media would be included in addition to print books and print journals.

In the end, my point is simple: We seek to understand and improve the mechanisms used to make scholarly work public, and we would like a word or phrase to describe the object of our study. The terms “publishing” and “publication” connote our interest precisely, whereas “communication” does not. Indeed, much communication has nothing to do with making things public. (Consider “confidential communication,” which is commonplace, in juxtaposition to “confidential publication,” which is simply bizarre.)

Of course, I have no realistic hope of changing the words that we use to discuss these important matters. Once a term of art gets entrenched in the academy, it is rarely dislodged, and “scholarly publishing” has come to mean “what academic presses do,” while “scholarly communication” has come to mean what I said at the top of the previous paragraph. So I expect that this amiable rant will have no effect, but the tradition of amiable ranting is well established in both scholarship and blogging, and one can always hope for miracles.

Teaching School

November 12th, 2007

Paul Duguid’s comment on an earlier post of mine gets to important issues that I expect to discuss repeatedly (although not repetitiously) in this space. Among the big questions that he raises are these two: (1) How good a job will Google Book Search do? (2) What are the consequences that flow from the answer to (1)?

I can’t answer the first question. Thus far GBS has not done well with multivolume works, sort of like iTunes with classical music. In both cases, metadata is thrown away, and the results are often more amusing than useful. Library partners, including Michigan, have been on Google’s case about this for some time. Duguid asks whether Google will learn from Michigan. My experience is that in general Google is very good at learning.

I have more to say about the second question. Here I am optimistic. Suppose Google never gets good at multivolume works, and in this and possibly other domains falls well short of good performance in delivering to users what they are looking for. I find it very unlikely that such a circumstance would be sustained, because Google has a strong interest in being responsive to its users. So the outcome will turn on how discerning the users will be, and on that subject colleges and universities and their libraries should have a great deal to say. What matters is whether academic libraries and their associated colleges and universities are able to teach their students well enough so that students can tell the difference between good search outcomes and misguided ones. (We also need to teach our students how to recognize sources with reliable provenance, and how to use such sources in order to make sense of their own and others’ arguments, but that is a longer discussion for another time.)

If we (academic institutions) do our job well, users will not tolerate unreliable search outcomes, and in that case I would expect Google to be responsive, not because libraries have told them how to catalog books, but because users will find books that are ill-cataloged to be less useful than books that are well-cataloged. By using the Google-scanned works well in our teaching and research, we can develop practices of scholarly literacy that use authenticated and reliable digital sources. GBS may be the direct source of the works, or we may rely on the library copies. Either way, the important job for academic institutions is to teach well (or, more precisely, to assure that their students learn well) and that is exactly as it should be.

Quick response to Siva Vaidhyanathan

November 6th, 2007

[This is a reposting of a comment I made in response to Siva Vaidhyanathan’s questions about my previous post. I am traveling, and can only produce brief answers to his questions now. Later this week I’ll get to most of the issues in more detail here.]

Let me start by reminding everyone that I do not speak for Google, nor am I engaged in generalized cheerleading on Google’s behalf. Rather, I am arguing that the University of Michigan Library is doing a Good Thing in its digitization project with Google.

Below are Siva’s questions, and my responses:

He dismisses serious search problems as temporary, yet fails to confront the problem that Google cannot and will not explain the factors and standards that put one book above another in search results.

Actually, I don’t mention search at all in my post. Nor (see above) do I speak for Google.

As users discover poorly-scanned files on the Google index, how can they alert Google to the problem? Why does nothing in the contract between Michigan and Google include quality-control standards or methods?

Please see Michigan’s agreement with Google, clause 2.4, the relevant part of which reads: “U of M will engage in ongoing review (through sampling) of the resulting digital files, and shall inform Google of files that do not meet benchmarking guidelines or do not comply with the agreed-upon format. Should U of M encounter a persistent failure by Google to meet these guidelines or supply the agreed-upon format, U of M may stop new work until this failure can be rectified.” The agreement is online at: http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html

How do we know this index will last for decades? What image file system is Google using and what ensures its preservation?

I believe that in my post I said that the UM library (like other partner libraries) is also storing and preserving the files that Google scans. Maybe Google won’t last for decades, but the libraries will, and the libraries are pretty serious about preservation.

How is the “library copy,” that electronic file that Michigan and others receive as payment for allowing Google to exploit their treasures, NOT an audacious infringement of copyright? It violates both the copyright holder’s right to copy and right to distribute. Doesn’t a university library have an obligation to explain this?

It’s hard to get past the first premise of this set of questions. One literal answer would be to say that there is no such electronic file, because Google is not obtaining anything by means of exploitation.

I must say that I am troubled that the author of a very sensible book about copyright is so enthusiastic about trashing Google that he is willing to give up on the uses, notably scholarly uses, that are permitted in the higher-numbered sections of the Copyright Act. As my institution’s copyright lawyer says: “FAIR USE, it’s the law.” And my institution believes that when we have Google digitize our holdings we do so under the law and in order to make uses that are not only lawful, but that are completely consistent with the undergirding purpose of copyright law.

Siva is much younger than I am, so he may be willing to wait decades before finding out how scholarship and society can benefit from digitized and searchable collections from some of the world’s great libraries. For myself, I’d like to unleash my colleagues and our students on this remarkable resource while I’m still around to see what happens.

Finally, re Ryan Shaw’s post, yes, we receive the OCR.

On being in bed with Google

November 4th, 2007

One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason that a university library would be involved with Google is because, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the contrary, although I may be proved wrong, I believe that the University of Michigan (and the other partner libraries) and Google are changing the world for the better. Four years from now, all seven million volumes in the University of Michigan Libraries will have been digitized – the largest such library digitization project in history. Google Book Search and our own MBooks collection already provide full-text access to well over a hundred thousand public domain works, and make it possible to search for keywords and phrases within hundreds of thousands more in-copyright materials. This access is altering the way that we do research. At least as important, the project is itself an experiment in the provision and use of digitized print collections in large research libraries. I do not see how we can discover the best ways to use such collections without experiments at this scale. In sum, I believe that our library is doing exactly what it should do in the best interests of scholarship and our users, now and in the future.

So I’m puzzled when people ask, “How could serious libraries be doing this? How could they abdicate their responsibilities as custodians of the world’s knowledge by offering their collections up as a sacrifice on the altar of corporate power? Why don’t they join the virtuous ranks of the Open Content Alliance partners, who pay thousands of dollars to digitize books at a rate of tens of thousands of volumes a year?” It seems like those who ask such questions have little appreciation of what Michigan and the other Google partners are actually up to.

Google is on pace to scan over 7 million volumes from U-M libraries in six years at no cost to the University. As part of our arrangement with Google, they give us copies of all the digital files, and we can keep them forever. Our only financial outlay is for storage and the cost of providing library services to our users. Anyone who searches U-M’s library catalog, Mirlyn, can access the scanned files via our MBooks interface. That’s right, anyone. (Copyright law constrains what we can display in full text, and what we can offer only for searching, but we share as much as we can consistent with prudent interpretations of the law.) For an example of an MBook, take a look at The Acquisitive Society by R. H. Tawney.

In a recent New York Times article about mass digitization projects, Brewster Kahle was quoted as saying: “Scanning the great libraries is a wonderful idea, but if only one corporation controls access to this digital collection, we’ll have handed too much control to a private entity.”

I agree with him. I’m an economist with a particular interest in public goods, which is how I came to be involved with libraries in the first place. Libraries have a long and honorable history of preserving information and making it accessible. Moreover, even at their best, for-profit institutions cannot be expected to serve general public interests when those interests run counter to those of their shareholders. So I would be distressed if a single corporation controlled access to the collections of the great academic libraries, just as I find it troubling, on a smaller scale, that a handful of publishers control access to much of the current scientific literature.

But Google has no such control. After Google scans a book, they return the book to the library (like any other user), and they give us a copy of the digital file. Google is not the only entity controlling access to the collection – the University of Michigan and other partner libraries control access as well. Except we don’t think of it as controlling access so much as providing it.

Since 2005, Siva Vaidhyanathan has been making and refining the argument that libraries should be digitizing their collections independently, without corporate financing or participation, and that those who don’t are failing to uphold their responsibility to the public. “Libraries should not be relinquishing their core duties to private corporations for the sake of expediency.”

“Expediency” is a bit of a dirty word. Vaidhyanathan’s phrase suggests that good people don’t do things simply because they are “expedient.” But I view large-scale digitization as expeditious. We have a generation of students who will not find valuable scholarly works unless they can find them electronically. At the rate that OCA is digitizing things (and I say the more the merrier and the faster the better) that generation will be dandling great-grandchildren on its knees before these great collections can be found electronically. At Michigan, the entire collection of bound print will be searchable, by anyone in the world, about when children born today start kindergarten.

Google brings to us extraordinary technical and computing power and tremendous financial resources. The libraries bring an understanding of our collections and our users, and a profound commitment to public access. We are not relinquishing our duties in the name of expediency; we are working with a capable partner to create a far more useful resource than we could create on our own. (Would I prefer that a charitable foundation would support this work on the same schedule as Google, and make everything available to everyone, subject only to copyright restrictions? You bet. I would prefer it even more if that foundation would buy out all of the rights holders for all out of print works. Can someone tell me the name of the foundation, please? In the meantime, it seems to me that being in bed with Google is way better than sleeping alone.)

It’s true that the digitized files from Google’s scans are often far from perfect. Historian Robert Townsend, Paul Duguid, and others have raised technical questions about the quality of Google’s scans, and their appropriateness for preservation. Those are important questions, and there is a great deal of work to be done, both by Google and by the libraries, before we consistently achieve the level of quality and bibliographic reliability that are essential to successful scholarly practice. I will discuss some of the specific steps we are taking to address quality in a future post, but for now I will just say that the solution of these problems will require the serious engagement of academic libraries, and that the visibility of the problems is essential to their solution. Mass digitization on the scale of the Google library project was unimaginable five years ago, and it comes as no surprise to me that we are learning a lot as we go along. We are learning in the tradition of serious academic work, by putting our ideas and our resources in the public eye, where they can be seen, and criticized, and improved.


November 4th, 2007


My name is Paul Courant, and after over 30 years as a college professor and academic administrator, writing and teaching on economics and public policy and serving in a variety of roles including department chair and provost, I recently became University Librarian at the University of Michigan. I find that the pace of change and the volume and frequency of commentary in the world of academic libraries and scholarship call for quick (and sometimes loud) response. So I am starting a web log. While libraries and related matters will be the subjects of much of what I am likely to say here, they will not be the only topics. Those of you who know me know that I have opinions about many things, and I have never been especially reticent about sharing them. So, I expect that my posts (and I hope yours) will from time to time cover a variety of matters whose immediate relationship to libraries and publishing is not obvious. (Although, of course, pretty much everything of importance is related to libraries, in that if you can’t find a trace of something in a library you probably can’t find it at all.)

My immediate motivation for starting a blog is to add a generally positive (although never uncritical) voice to the cacophony around the subject of large-scale digitization projects in academic libraries, and my first substantive post is on that subject. In starting this blog I am also responding to Peter Brantley’s recent comments to the effect that the voice of big libraries has been noticeably absent on the listservs and blogs that are the loci of much of the public debate on digitization and other innovations in libraries and publishing.

Of course, any comments that I make here are my own, and are not those of the University of Michigan or its libraries. What I have to say is surely affected by the various roles that I play in my day jobs, but here I speak only for myself, and not for my employers, groups with which I am affiliated, family members, or anyone else.