The libraries of the Big Ten Academic Alliance (BTAA) are looking forward to an “interdependent networked future” and to managing their separate collections “as if they were a single, shared one.” Here are some ideas about how this might work in Early Modern Studies, a field to whose documentary infrastructure those libraries have made a particularly significant contribution. Two decades ago, when the BTAA was known as the CIC, Mark Sandler at the University of Michigan took the lead in negotiations with ProQuest that led to the formation of the Text Creation Partnership (TCP). Those negotiations ensured that some 60,000 XML transcriptions of Early Modern printed books would pass into the public domain within five years of the completion of each of the project's two phases. By January 2021 anybody anywhere can get at what is, for many practical purposes, a deduplicated library of English books from Caxton’s Recuyell of the historyes of Troye (1473), the first English imprint, to the early 1700s. The TCP project was led by Michigan and Oxford (with the former doing rather more of the work), but CIC libraries were the core subscribers, especially in the early years. Without the CIC the TCP might well not have happened. As it stands, the TCP is an extraordinary achievement. Despite its well-known shortcomings, no other text archive offers such complete and consistent coverage for a period of comparable duration and significance.

A few years ago, Andrew Keener, then a doctoral student at Northwestern and now an assistant professor at Santa Clara University, led an initiative that repurposed the acronym RBML into “Renaissance Books in Midwestern Libraries.” Over the course of that initiative much useful work was done discovering holdings here and there and reporting them to the English Short Title Catalogue (ESTC). What about a more ambitious version of that project? It would aim at creating “digital combos” by matching the TCP transcriptions with high-quality, public domain digital facsimiles. The riches of rare book collections between the coasts are not always appreciated, but UIUC, the Newberry, UChicago, and Northwestern hold original copies of about a quarter of all TCP items and more than 40% of all pages. Other libraries excel in particular niches: Ohio State has an unmatched collection of the many editions of Foxe’s Book of Martyrs.

The TCP texts were transcribed from EEBO images, which are digital scans of microfilms made over the course of the 20th century. The TCP site lets you look at both transcription and image. There are, however, two problems. First, many of the images are quite bad, and defective images are the major source of incompletely or incorrectly transcribed words or passages. Second, and more importantly, the images are behind a paywall that many institutions find too high to climb or jump over. The #FrEEBO Twitter storm of a few years ago illustrated the problem. “Together, we can FrEEBO” by Harvard’s John Overholt was an eloquent plea.

At a conference on “research data life cycle management” (an informative mouthful), Brian Athey, chair of computational medicine at Michigan, observed that “agile data integration is an engine that drives discovery.” A digital combo is a good example. Without a transcription you cannot easily search across a corpus, but without an image you cannot check the reliability of the transcription. Moreover, a facsimile gives you a sense of the “look and feel” of a text in its original form. From an analytical, evidentiary, or aesthetic perspective, such a combo adds much value.

In EarlyPrint, a collaborative project involving students, IT staff, librarians, and faculty at Northwestern, Notre Dame, and Washington University in St. Louis, we created 385 digital combos. Most of them came via the Internet Archive from the Boston Public Library, the Princeton Theological Seminary, and several other places. New images of plays by Shirley and of the 1647 Beaumont and Fletcher folio were made at Northwestern. Notre Dame contributed two dozen texts from its rich 17th-century holdings in Anglo-Irish literature, including several books by William Petty, a founding father of statistics and political economy and clearly the target of Swift’s Modest Proposal. In the course of creating these combos, we designed several semi-automatic routines for reducing the non-trivial time cost of stitching text and image pages together.

Our 385 combos are a very small percentage of 60,000 (well under one percent). But 269 of them are playbooks. They add up to well over half of the plays written by Shakespeare’s contemporaries and by the playwrights of the previous and the next generations. The example of this important textual neighbourhood shows that other neighbourhoods can be created in a relatively short time. For example, more than half of the ~550 facsimiles from the Princeton Theological Seminary are likely to map to TCP transcriptions. Stitching them together would add up to a very adequate theological library for some scholarly and many pedagogical purposes.

Somewhere in the Internet Archive, in HathiTrust, and in various libraries, there are between 3,000 and 5,000 digital facsimiles for which you only need to stitch together text and image to get a useful digital combo. If all this were done, it would fill more spaces on an emerging map, but that map would still be dominated by blank space. If, say, UIUC were to digitize its magnificent Early Modern holdings, it would make a big difference. But even for a large institution like UIUC that would be a very significant challenge, and it would not be easy to justify in terms of the benefits that would accrue to the students and faculty there. The cost/benefit calculus changes if you think of Early Modern Studies across the 15 institutions that make up the Big Ten Academic Alliance (don’t ask me about that name). It changes again once you recognize that the benefits for BTAA members would in fact accrue to anybody anywhere with an Internet connection.

I suspect that the digitization of old (and therefore rare) books is still driven to some extent by libraries wanting to display “their” treasures. There is nothing wrong with that motive, but it does not add up or scale well. Sometimes a particular copy has a history that justifies its digitization, whatever other facsimiles may already exist. For instance, Penn’s facsimile of Mary Wroth’s Urania was made from a copy that was owned by the author and has her own marginalia. A Shakespeare Folio with Milton’s marginalia is another example. But in most cases creating the first facsimile of one text should take precedence over creating a second facsimile of another. From the end user’s perspective, it also usually does not matter whether a digital facsimile comes from the Newberry, UIUC, the Huntington, the Folger, or wherever, as long as the facsimile is good enough and the Internet connection is both stable and reasonably fast.

Suppose the British government decided after a successful Brexit that the revival of an ever greater Britain should be celebrated by digitizing the complete Early Modern holdings of the British Library and the Bodleian. Many documentary needs of American Early Modernists would then be adequately met. This is an unlikely scenario. A somewhat less unlikely scenario would involve the major American rare book libraries working together on a plan for distributed digitization. I am not holding my breath, however. It is not easy to go from “I” to “we,” and I doubt whether they have the shared social infrastructure that would make it easier. Such an infrastructure does, however, exist among the BTAA libraries. It has a solid history, of which the creation of the TCP archive is a particularly distinguished chapter. Why not add another and perhaps even more distinguished chapter that pairs the pages of TCP texts with high-quality images? From the perspective of the combined resources of the BTAA libraries, that is a substantial but not gigantic undertaking. If the BTAA libraries invested enough of their own resources in such a project by revising internal priorities, the odds of attracting additional funding from elsewhere would be very high. The benefits for Early Modernists all over the world would be substantial.

A reader of this blog post might argue that I am contradicting myself when I argue (a) that the BTAA libraries should collectively digitize their Early Modern holdings and (b) that in most cases it does not matter whether a digital surrogate comes from here or there, as long as it is good enough. That is true, but everybody’s business has a way of ending up as nobody’s business. One has to start somewhere. The BTAA libraries clearly have the resources and the social capital to make substantial and quite rapid progress. If they do it, others might well want to join.