The following reprints an entry that I first published on January 7, 2011, on my now-defunct “Literary Informatics” site.

 

The Chicago section of today’s New York Times has an article with the title “Volunteers at Planetarium Excel where machines lag.” The gist of the article is in these paragraphs:

The Adler has become a leader in “citizen science,” a growing trend in astronomy research. As the lead institution of the Citizen Science Alliance, which includes Oxford and Johns Hopkins Universities, it has registered more than 350,000 non-experts to help classify the many thousands of pictures of galaxies taken by powerful telescopes.

The images can help researchers better identify the shapes of galaxies, observe the formation of stars and follow the movement of asteroids. Astronomers often use computers to help analyze photos of outer space, but computers can miss anomalies and patterns that the human eye is particularly equipped to catch, said Joshua Frieman, a researcher at FermiLab and a professor at the University of Chicago.

“You don’t need a lot of detailed astronomical training to be able to look at these images and answer certain basic questions about them,” Professor Frieman said.

For example, astronomers had assumed red galaxies were elliptical, but volunteers recently identified many of them as spirals. The amateurs’ observations have also helped scientists predict when solar magnetic storms, which often interfere with telecommunication satellites, will hit Earth.

Volunteers anywhere in the world can register online to view photos on their Web browsers, and send their observations to researchers at the planetarium.

This story has very powerful implications for thinking about similar forms of “crowdsourcing” in the humanities. The following is a riff on ideas advanced by Jerome McGann and Gregory Crane, two scholars who for the past two decades have been in the vanguard of Digital Humanities.

In 2001 McGann argued:

In the next fifty years the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination. This system, which is already under development, is transnational and transcultural.

I quote from his repetition of the statement in the TLS essay “Our textual history” (November 20, 2009). At the recent “Shape of Things to Come” conference at Virginia, Crane made very similar arguments, adding the point that in the field of Classics undergraduates with quite modest backgrounds can in the aggregate make very substantial contributions to scholarship. There is an enormous amount of work to be done. Much of it cannot be done by machines because it involves tasks that people are good at and machines are bad at. All or most of it can be done by humans more efficiently and accurately if it is supported by digitally based workflow and editorial frameworks.

McGann and Crane have drawn attention to the striking indifference of humanities disciplines to the challenges in the “digital migration” or latest stage in the allographic journey that is so characteristic a feature of the life of texts. Drawing on the distinction — familiar to philology — between “Lower” and “Higher” criticism, McGann has observed that since the middle of the twentieth century the literary academy has moved away from “lower” critical activities. Crane did some back-of-the-envelope calculations based on job letters and books reviewed in the Bryn Mawr Classical Review. He concluded that the Classics profession invests less than 5% of its human capital in editorial or curatorial work. From what I know about English departments, at least the departments on the high end of the food chain, this holds true of them as well.

Such a low rate of investment is perfectly reasonable if you assume that the century of intensive editorial labor that followed the 19th-century professionalization of philology has created a print-based documentary infrastructure that will be good enough for the indefinite future. It also makes sense if you assume that the task of digitizing this documentary infrastructure is fundamentally a clerical task that can be safely delegated to technical staff in libraries or computing centres.

Both of these assumptions are widely shared in English or Classics departments, and at least two generations of literary critics of all stripes have lived off the capital of a century of editorial labor while paying little attention to the progressive migration of textual data from books on shelves to files on servers or in ‘clouds’. Both assumptions are almost certainly false, but challenging them is about as promising a task as persuading Iowa farmers that subsidies for ethanol, or for that matter any use of corn, are not in the national or even their own long-term interest.

McGann has been very eloquent about the consequences of the complacent neglect of ‘lower’ criticism during the formative years of an increasingly digital world. Aussies and New Zealanders like to challenge the rest of us by showing upside down maps with Europe and America “down under.” Or one might think of the computational world where ‘higher’ procedures — scripting languages like Python or graphical user interfaces — are made possible only by the radical advances in the power and speed of very ‘low’ procedures at the level of managing the flow of bits. The ‘low’ and the ‘high’ are paradoxically interwoven, a good lesson from the New Testament even for non-believers.

It would almost certainly be a good thing if departments of Classics, English, and similar disciplines doubled or tripled their investment in the curatorial tasks on which all scholarship is based in the long run, moving it from 5% to 10% or 15%, which by the standards of an earlier age would still be a relatively small disciplinary commitment to ‘lower criticism’. I use ‘curatorial’ here in a very broad sense that includes not only traditional editorial tasks but also various forms of data enrichment through which digital objects release their full query potential.

But as Crane has argued forcefully, more is at stake than moving the degree of curatorial involvement from the minimal to the still quite small. It is even more important to explore the affordances of digital technology when it comes to assigning curatorial tasks, dividing their labor, and achieving higher levels of efficiency and accuracy. As he put it in a topic sentence: “Digital editing lowers barriers to entry and requires a more democratized and participatory intellectual culture.” In the context of the much more specialized community of Greek papyrologists, Joshua Sosin has successfully called for “increased vesting of data-control in the user community.”

For Crane, the systematic involvement of undergraduates in data curation is an especially promising arena for developing a “more democratized and participatory intellectual culture.” He points to the useful contributions students in Intermediate Greek have made to a “tree bank” or syntactically annotated corpus of Ancient Greek. He also points out that work of this kind is largely an application to the humanities of work and acculturation practices that have been common in the sciences for generations.

That makes a lot of sense to me because my daughter is an evolutionary biologist. Her choice of a career was non-trivially affected by a summer she spent as a research assistant in a Berkeley lab, where she measured the energy consumption of lizards on a treadmill and was responsible for the care and feeding of the animals, including cleaning the cages.

Last summer, I worked with five Northwestern students who had just completed their freshman year. They spent eight weeks checking incompletely transcribed words in 280 Early Modern plays from the EEBO-TCP collection. The technical framework was sub-optimal. In particular they had to rely on very primitive alignments of the facsimile pages with the digital transcriptions, which made for slow going. But they picked up the requisite skills very quickly and corrected upward of 25,000 errors. The results of their work have been incorporated into the Wordhoard edition of Early Modern Drama.

The students in this project had summer research stipends. I can imagine colleagues who wonder whether this kind of activity should count as research. Cleaning out lizard cages or fixing transcription errors may be useful work. But is it research?

There are several answers to this question, and much depends on how students take to or are encouraged to take to such work. If they just do it (however well) and don’t think about it, the answer is ‘no’. The answer is ‘yes’ if you think of it as a way of entering a research environment and discovering that

  1.  In any discipline there are always a lot of humble tasks which must be done meticulously and consistently if the project is to succeed.
  2.  Some very fundamental aspects of a discipline are learned best in the trenches of those humble tasks.

For intelligent apprentices, the most important lessons may well be learned observing what is going on around them while being asked to sweep the floor. The students in my project did very useful work and learned a great deal about the orthographic and typographic variance of early modern print, the micro-fabric of Renaissance drama at the level of word or phrase, not to speak of the frailty and vagaries of textual transmission. They became “fluent in Marlowe,” to use a charming phrase coined by two students who did similar work as part of a course assignment. It was helpful and enjoyable for them to work in a lab-like environment. If any of them pursue a more independent research project in a subsequent year, it is very likely to be shaped directly or indirectly by their earlier experience.

My colleague Harlan Wallach reports on different but quite analogous experiences with undergraduates who helped with the creation of complex digital surrogates of murals in the Dunhuang caves or bas-relief statuettes of the Shui Lu’an temple near Xi’an. They were involved in the stitching together (and other forms of post-processing) of the meticulous digital photographs that served as the basis for the virtual reconstructions. Within a few years of graduation several of these students had considerable success in the media world, and at least some of it is due to the very useful, and not always exciting, curatorial work they did as apprentices.

I would bet at least half a paycheck that many colleagues at other institutions can tell similar stories in support of the proposition that projects of this kind are a superior form of initiating undergraduates into the world of research and giving them experiences that pay off handsomely in their later work, whether in professional or research environments.

Several practical conclusions follow from these reflections. If McGann is right in his argument that the “entirety of our inherited archive of cultural works” will need to be re-edited in the light of a digital environment and its affordances, and if it is the case that this work cannot be done well without much more direct scholarly involvement, there are great opportunities for leveraging the required scholarly capital by drawing on the enormous human capital that is to be found among undergraduates in just about any good college or university. Using that capital in ways that will maximize both the quality of the work and the scholarly/professional benefit to the undergraduates will involve the construction of much more robust and sophisticated curatorial platforms than currently exist. Building them is neither easy nor cheap. On the other hand, once built, such platforms can be shared within and across institutional projects. Their construction can itself become a collaborative project in the spirit of the NIH Shared Instrumentation Grant, which aims at making “available to institutions expensive research instruments that can only be justified on a shared-use basis.”

Building such platforms is not in principle very different from other large co-operative enterprises in the humanities, such as the Bamboo Project with its current plans for different types of “workbenches.” In fact, it may well be the best approach to build these scholarly platforms with a view to creating opportunities for undergraduate participation from the ground up. The marginal cost of making collaborative scholarly platforms “undergraduate-friendly” is probably quite low. As often, the major obstacles are likely to be social rather than technical or financial. But if these obstacles can be overcome, there is the vision of lab-like environments that offer talented undergraduates an opportunity to participate in the digital rebuilding of a cultural heritage and to learn a great deal while doing so.

Some undergraduates with scholarly interests end up in the scholarly world. Many more pursue professional careers but retain scholarly interests. Collaborative curatorial frameworks that can be used by undergraduates can be modified to reach out into the world of amateur scholars. Imagine a case in which over the course of the next three years half a dozen universities cooperate in building curatorial platforms with a goal of engaging humanities undergraduates systematically in the digital curation of some aspect of their cultural heritage. Now imagine a second stage of that project in which the platforms are modified for “outreach” and the partners seek to engage interested alumni and alumnae in the task of cultural “remediation.” There are a lot of highly educated baby-boomers out there. At least some of them will be able to afford retirement and the pursuit of intellectual interests that they had long put aside.