In an earlier blog entry I reported on the ways in which undergraduates at Northwestern and Washington University in St. Louis have contributed to the collaborative curation of TCP transcriptions of Early Modern plays. Their work was released on GitHub as the SHC corpus, short for Shakespeare His Contemporaries.

Hannah Bredar just graduated from Northwestern with honors in English. Kate Needham and Lydia Zoells are rising seniors at Washington University and look forward to their honors theses. Between April and July of this year Hannah, Kate, and Lydia, separately or together, visited the Bodleian, Folger, Houghton, and Newberry Libraries as well as the special collections of Northwestern and the University of Chicago. They fixed about 12,000 incompletely or incorrectly transcribed words. Add them to the ~46,000 emendations made by Northwestern undergraduates in 2013, and you can say with some confidence that a rough clean-up of the TCP transcriptions of most plays by Shakespeare’s contemporaries has now been completed. Much work remains to be done, but we are getting closer to the goal of representing the work of Shakespeare and his contemporaries in a corpus that is both human readable and algorithmically amenable. Most of Hannah’s, Kate’s, and Lydia’s corrections have been entered into the GitHub repository. I look forward to adding the remaining corrections and several other improvements to the corpus by mid-August.

The Early Modern Drama community owes Hannah, Kate, and Lydia a vote of thanks for their excellent work. Some years ago I read an interview with a German executive who was asked what he did when he joined a new company. He said he always looked first for the “Durchzieher” or “through-pullers.” If he met the three of them, he would see right away that each is a “through-puller” of a very high order. I hope they will set an example for others.

The Library Finder that Kate and Lydia designed has one particularly useful feature. You can pick a library that lies on the daily footpaths of undergraduates and look up its holdings of Early Modern plays together with the textual defects that remain in them. These defects are lacunae marked as such in the TCP texts: “known unknowns” in Donald Rumsfeld’s parlance, waiting for enterprising undergraduates as citizen scholars. Students at these institutions would hardly need to go out of their way to do an hour’s useful work. Here is a little table of libraries and the number of lacunae that could be filled from their holdings, in descending order:

Texas-Austin 5792
Harvard 5720
Yale 3203
Illinois, Urbana 1862
U. Penn 1024
Columbia 796
U. Chicago 506
Smith 381
Northwestern 281
Williams 143

The Library Finder also tells you how many gaps there are in each play. Undergraduates at Smith could quickly see that Massinger’s Unnatural Combat contains 58 lacunae that have not yet been filled. If they were to use the AnnoLex curation tool, it would show them where the errors are located, and it offers a simple template for entering corrections.