The following was first published on an earlier version of this blog in the spring of 2010. It is republished here with light revisions.

I spent some time with the papers of a recent conference at Virginia: Online Humanities Scholarship: The Shape of Things to Come. Here are some comments on them, with quotations; the papers can be downloaded from http://shapeofthings.org/papers/.

I am not trying to cover the whole conference but let my attention be guided by my current interest in the quality and interoperability of the digital surrogates of Early Modern English texts — the approximately half million bibliographical items catalogued in the English Short Title Catalog, of which about 30,000 exist now in full-text XML transcriptions and some 70,000 will exist in that format in 2015, when they will all pass into the public domain. What forms of collaboration could make these digital surrogates as good as they ought to be if they are to serve as the basis for future scholarly inquiry? How did The Shape of Things To Come help me think about that question?

By ‘quality’ I mean the readerly properties of a digital surrogate. If in the context of scholarly work you look at a digital version of Holinshed’s Chronicles, Hobbes’ Leviathan, or Milton’s Paradise Lost, you want that version to be as accurate and readable as a standard print edition. By ‘interoperability’ I mean the algorithmic amenability of the digital surrogate: its capacity for being variously divided or manipulated, combined with other texts for the purposes of cross-corpus analyses, having data derivatives extracted from it, or having layers of metadata added to it. Greg Crane envisages a future in which “digital surrogates for human cultural heritage … flow freely and instantaneously back and forth between humans and machines.” In a Utopian moment at the end of his opening talk Jerry McGann envisages an “online World Library” and lists types of resources that would not fit because

  1. “they meet traditional scholarly standards but are designed in digital formats — typically HTML — that are … unsustainable [and] cannot exploit the integrating functions that make web technology such a powerful social network”
  2. they are “internally well-designed but … by choice or circumstance … do not participate in … second-order integration”
  3. they “lack any online presence at all: university press backlists … or the current and/or back issues of many scholarly journals”
  4. they are “materials being hurled on the Internet in corrupt forms by Google and other commercial agents: materials that are badly scanned, carelessly or merely randomly chosen, poorly if at all structured.”

This is as good a list of hurdles as the proofs that Lysander and Hermia advance for the claim that “the course of true love never did run smooth.” But like true love, full interoperability remains a worthy goal: by always keeping it in mind you may sometimes fall a little less short of it.
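The ‘algorithmic amenability’ described above can be made concrete with a minimal Python sketch, using only the standard library. The TEI-like fragment below is illustrative only (it is not drawn from any actual transcription): the point is simply that a well-formed XML surrogate lets a machine extract a data derivative — here, a word-frequency table — with a few lines of code, which no page image or badly scanned text allows.

```python
# A sketch of 'algorithmic amenability': parse a TEI-style XML fragment
# (hypothetical, not from any real transcription project) and compute a
# simple data derivative from it — a word-frequency table.
import xml.etree.ElementTree as ET
from collections import Counter

tei_fragment = """<text>
  <body>
    <p>Of Mans First Disobedience, and the Fruit
       Of that Forbidden Tree, whose mortal taste
       Brought Death into the World, and all our woe</p>
  </body>
</text>"""

root = ET.fromstring(tei_fragment)
# itertext() walks all element text in document order, regardless of markup depth
words = " ".join(root.itertext()).lower().split()
freq = Counter(words)
print(freq.most_common(3))
```

The same few lines would work unchanged on a full transcription, and that is the argument in miniature: a surrogate that can be parsed can be divided, combined, and enriched; one that cannot remains inert.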

Should we despair or hope? Almost a decade ago, McGann observed that “In the next fifty years the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination. This system, which is already under development, is transnational and transcultural.” (Cited from his article “Our Textual History,” TLS, November 20, 2009.) In his opening remarks to the conference (“Sustainability: The Elephant in the Room”) McGann does not dwell with satisfaction on progress made. Instead he analyzes the institutional and political obstacles that block progress towards a realization in the digital realm of a goal shared by all scholars: “We all want our cultural record to be comprehensive, stable, and accessible. And we all want to be able to augment that record with our own contributions.”

McGann sees digital technology as a decisive factor in disrupting a value chain of scholarly work in which scholars, publishers, librarians, and patrons had long established and clearly understood roles. The Great Digital Migration that has been underway for two decades has been a “hotch-potch” and “the community of scholars has played only a minor role in shaping these events. We have been like marginal, third-world presences in these momentous changes – agents who have actually chosen an adjunct and subaltern position.” Elsewhere in this talk McGann speaks even more bluntly:

It’s a fact that most colleges and universities have not formulated comprehensive or policy-based approaches to online humanities scholarship. Resources for the use of media in the classroom, including electronic and web media, are fairly common. But a commitment of institutional resources to encourage digital scholarship is very rare. … But it’s clear that the universities are responding to facts on the ground: i.e., to the scholars themselves and their professional agents. Most scholars and virtually all scholarly organizations have stood aside to let others develop an online presence for our cultural heritage: libraries, museums, profit and non-profit commercial vendors. Funding agents like NEH, SSHRC, and Mellon have thrown support to individual scholars and small groups of scholars, and they have encouraged new institutional agents like Ithaka, Hastac, SCI, and Bamboo. But while these developments have increased during the past seventeen years – i.e., since the public emergence of the Internet – the scholarly community at large remains shockingly passive.

McGann thinks of this gloomy vision as a call to action under the sign of “not what but who” and ends his talk with a question addressed both to the participants at this conference and to humanities scholars at large: “What are you prepared to do?”

The most promising answers point to scholarly communities taking charge of the texts about which they care. Paolo D’Iorio, the general editor of Nietzsche Source, looks back to the “tradition of the academic societies of the XVII century” as the model for “open scholarly communities on the Web” but argues that these “do not yet exist and … will be difficult to create.” He offers a subtle “Scholarly Ontology” about the ways in which scholarly communities and text corpora mutually constitute each other:

If a scholarly community intends to conduct research on a certain topic, it first needs to define which documents or objects to consider as its primary sources. When a research line is about to be developed and consolidated, a catalogue of primary sources is compiled, usually by archivists or librarians. The catalogue of primary sources lists the relevant classes of objects and often includes the complete list of their instances. … Catalogues of secondary sources come later, and are written by scholars or librarians… The distinction between primary and secondary has a fundamental epistemic value. According to Karl Popper, what distinguishes science from other human conversation is the capacity to indicate the conditions of its own falsification. In scholarship, the conditions of falsification normally include the verification of hypotheses on the basis of a collection of documents recognized by a scholarly community as relevant primary sources. Thus we can affirm that the distinction between primary and secondary sources exhibits the conditions for falsifying a theory in the humanities.

With textual documents, however, the distinction between ‘primary’ and ‘secondary’ varies with the scholar’s inquiry: “an article written by Nietzsche on Plato is a primary source to Nietzsche scholars, but it is a secondary source to Plato scholars.” Primary texts — or more accurately, texts treated as primary — are objects of special care. We “cherish and preserve” them, as Penelope Kaiserlian, the director of the Rotunda Press, says of the 60,000 distinct documents in the “cross-searchable collection of American Founding Era documentary editions.” If you are not given to a rhetoric of pietas you would still care to get those texts right, because they provide your only ground for verification.

Does it matter how good or bad the digital texts are if somewhere in the library there is a printed copy of a critical edition with all the variants in a state-of-the-art apparatus criticus? The answer is ‘yes’. McGann may well be right that the ultimate backup of the “permanent core” in the “scholarly materials” of the Rossetti Archive will be a print-out that will “fill two dozen or more large volumes.” But the sheer convenience of the Web means that texts will be increasingly cited and quoted from digital sources, and the quality of those sources will determine the quality of texts in use. Whose job is it to get them right?

Roger Bagnall’s description of Integrating Digital Papyrology (IDP) is a very instructive account of a digital scholarly community rallying around its data:

Of course, the world of the Web has changed dramatically since 1992, and the possibilities today are much richer than they were then. I would say that these changes have affected the vision and goals of IDP in two principal ways. One is toward openness; the other is toward dynamism. These are linked. We no longer see IDP as representing at any given moment a synthesis of fixed data sources directed by a central management; rather, we see it as a constantly changing set of fully open data sources governed by the scholarly community and maintained by all active scholars who care to participate. One might go so far as to say that we see this nexus of papyrological resources as ceasing to be “projects” and turning instead into a community.

Can one generalize from the experiences of so ‘nichy’ a sub-specialty as Greek papyrology? By virtue of the fragile and fragmentary nature of the sources that constitute their discipline, papyrologists are rarely more than two steps away from the material base of their data. In this regard they are quite untypical of humanities scholars, especially of scholars in English departments. If we are worried about the apathy of humanists when it comes to the transcription of a cultural heritage, what can we learn from the papyrologists? Would we not be better off looking at the crowdsourcing of historical Australian newspapers — a topic about which Rose Holley has written a splendid report with the attractive title Many Hands Make Light Work?

No doubt, collaborative curation of digital surrogates of printed texts will more often be like working on newspapers than working on papyri. On the other hand, Bagnall’s discussion highlights with exemplary clarity the problems of ownership, recognition, and quality control that are central to scholarly digital projects. If I think of colleagues with a philological or editorial conscience, they pretty much operate on the principle that a text cannot be trusted if it is not printed. In practice, there is much to be said for this view. In theory it is wrong, and Bagnall tells you why. You learn from him about

the Berichtigungsliste, a remarkable research tool in papyrology that collects periodically—there have been twelve volumes since its inception in 1915—all corrections proposed to the texts of papyrus documents (the universe of DDbDP and HGV), new datings or provenances suggested, and a fair amount of bibliography about the documents. It has for two generations been a joint project of Leuven and Marburg, now Leuven and Heidelberg. Before corrections are registered now, the editors of the BL do their best to check them to see if they think they are correct; if not, they are reported but with disapproval attached. How, my friend asked, will we prevent people from just putting in fanciful or idiotic proposals, thus lowering the quality of this work?

Bagnall argues persuasively that you can do as well or better in a digital and collaborative environment:

These systems are not weaker on quality control, but stronger, inasmuch as they leverage both traditional peer review and newer community-based ‘crowd-sourcing’ models. The worries, though, are the same ones that we have heard about many other Internet resources (and, if you think about it, print resources too). There’s a lot of garbage out there. There is indeed, and I am very much in favor of having quality-control measures built into web resources of the kind I am describing.

But for Bagnall, the concern with quality control often masks a concern for control:

“Ist mein, ist mein!” People who have created or curated projects are possessive. This possessiveness has its good side; it leads to personal investment. But in the end we possess nothing, because we are mortal; and our institutions, even if undying, do not tend to steer straight courses with unvarying purposes and priorities. They abandon our beloved projects when something new comes along. We could all cite examples. Control is the enemy of sustainability; it reduces other people’s incentive to invest in something. The same thing could be said of our books; it’s just easier to rework and reuse digital content.

In his response to Roger Bagnall, the ever practical Greg Crane develops a line of argument that picks up McGann’s demonstration of the mismatch between the priorities of most humanities scholars and the attention they ought to pay to the Great Digital Migration. Looking at some 700 reviews in the 2009 Bryn Mawr Classical Review and at about 100 CVs, he found 28 reviews of commentaries or editions and three candidates with an interest in those genres of scholarship:

In effect, classicists as a group have made a cost/benefit decision to allocate less than c. 5% of their labor to the production of editions and commentaries. Improving the print infrastructure for the 50 million words of Greek and Latin that survive in manuscript transmission through c. 500CE was not a high priority – the benefits were no longer great enough to justify much scholarly labor. We invested our energy rather in interpretive articles and monographs.

This is probably an accurate statement about English departments as well. It is possible for individuals within fields of activity to make choices that make professional and economic sense within the field but lead the field as a whole astray. The steel and automobile industries come to mind. Would it be a good thing if the balance of what used to be called Lower and Higher Criticism shifted from 5:95 to 15:85 or even 20:80?

Crane glances at the sciences while describing useful scholarly contributions that undergraduates can make even early in their careers. Here too the example of Classics may transfer readily to other humanities disciplines. While “the intellectual culture of Classical Studies assumes a long apprenticeship model”:

In a culture of digital editing, our students can begin contributing in tangible ways as soon as they can read Greek – first year Greek students are already able to distinguish text from commentary in the digitized Venetus A manuscript of Homer. Intermediate students of Greek and Latin offer their own analyses of individual sentences for the Greek and Latin Treebanks – contributions that are then compared against each other and then added to a public database, with the names of each contributing student attached to each sentence. These contributions can develop seamlessly into undergraduate and MA theses of real value and immediate use. When our students publish unpublished material or contribute to knowledge bases, we find ourselves in a participatory culture of active learning. Pale clichés about citizenship and democratization suddenly become tangible.

There is nothing innovative in having undergraduates contribute to and then conduct research within a field – promising students in the sciences, for example, regularly begin working in laboratories, taking measurements or conducting technical procedures, and then develop experiments of their own. Classics is – or should be – a demanding field but no more so than the sciences.

Scholarly communities taking care of their data in the Great Digital Migration — how does this scenario play itself out in the institutional settings that add up to Scholarly Communications? McGann’s emphasis on the priority of institutional and political decisions has led him to devote much energy to NINES as a framework for scholarly neighborhoods in the nineteenth century. Laura Mandell and others are leading 18thConnect as an extension into the previous century. But if these enterprises fully take off and become the 21st century children of the learned societies of the seventeenth century, one would want them to be much more closely allied and perhaps ultimately merge with their older and non-digital siblings. “Digital Humanities” is to some extent the failure of a ‘not yet’: there is no digital Economics, Chemistry, or Biology, although there is the subdiscipline of Bioinformatics. With regard to information technology, these disciplines are much more mature and have simply absorbed the challenges and affordances of new technologies into the everyday lives of their practitioners.

Finally, there is the Library. McGann observes accurately: “When digital scholarship in the humanities thrives at a university these days, the library is almost always a key player, and often the center and driving force.” I cannot think of a counterexample. The conference papers include exceptionally interesting reflections on “perpetual stewardship,” a natural term in a conference that had “sustainability” as one of its major themes. Penelope Kaiserlian talked about confronting the “implications of perpetual stewardship as we look to Rotunda’s future.”

Paul Courant, Michigan’s University Librarian, observed that digital technology has reversed the traditional roles of library and publisher. In the old days, librarians were the keepers of the books they bought. In the new world of digital publications, libraries are subscribers to data kept by publishers, and both libraries and publishers find themselves in unfamiliar roles.

Courant points to a paradox with regard to the ‘keeping’ of digital and printed materials. A book on the shelves of a library takes up 0.15 cubic feet forever and year after year incurs the costs of taking up that space and the services associated with it. Libraries are familiar with these costs, and at some level meeting them requires no special activity. Digital materials require more active and less familiar forms of upkeep. These activities may cost less, but they are not yet well integrated into library budgets.

Courant argues that libraries make for better perpetual stewards:

The separation of stewardship from direct provision of access adds an unnecessary and complicating layer to the ecosystem of scholarly publishing. As an alternative, the Press could be a quasi-independent element of an academic library, responsible for its own editorial functions, but relying on the library for provision of the perpetual stewardship that electronic publication requires, and using the library “brand” to advertise its ability to provide such stewardship. Such an arrangement would be efficient in the simplest sense: neither the library nor the publisher would be required to learn how to do things that are not already part of its natural compass of activity.

From this reflection on perpetual stewardship you might look again at Bagnall’s argument about collaborative data curation:

We no longer see IDP as representing at any given moment a synthesis of fixed data sources directed by a central management; rather, we see it as a constantly changing set of fully open data sources governed by the scholarly community and maintained by all active scholars who care to participate.

IDP here can be a placeholder for any project that involves collaborative data curation by a scholarly community. Where do you house or provide the technical infrastructure for such collaboration? Should it be seen as part of the library’s perpetual stewardship? Not only is that an attractive idea; it is difficult to see what other institution in the humanities could play that role. But it will take much talking, thinking, and planning to get there.