Scalable Reading

dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

undergraduate work

New release of Shakespeare His Contemporaries

I have put a new version of Shakespeare His Contemporaries on Google Drive, where you may view or download the plays. In this version I have grouped the plays by decades and put them in directories with names like 155, 156 …165. The plays have been encoded in TEI Simple. The texts are in...

Hannah, Kate, and Lydia at work

While reviewing the work of Hannah, Kate, and Lydia, I enjoyed the precision and concision of their annotations. A sample of them appears below. While a full documentation would require snippets of the image and the transcription as well as the annotation, the annotations themselves clearly show their minds at work, combining clear description with...

Thou com’st in such a questionable shape: Data Janitoring the SHC corpus from the perspectives of Hannah, Kate, and Lydia

Below are the reflections of Hannah Bredar, Kate Needham, and Lydia Zoells about their adventures in the mundane world of Lower Criticism, about which I wrote in an earlier blog and of which the digital surrogates of our cultural heritage will need a lot in the decades to come. Racine observes in his preface...

Engineering English: Machine-assisted curation of TCP texts

There are somewhere in the neighbourhood of five million incompletely transcribed words in the roughly two billion words of English books before 1700 transcribed by the Text Creation Partnership. Depending on how you look at it, that is either a lot or not very much at all. Less than half a percent of words are...

Hannah, Kate, Lydia, and Shakespeare His Contemporaries (SHC)

In an earlier blog entry I reported on the ways in which undergraduates at Northwestern and Washington University in St. Louis have contributed to the collaborative curation of TCP transcriptions of Early Modern plays. Their work was released on GitHub as the SHC corpus, short for Shakespeare His Contemporaries. Hannah Bredar just graduated from Northwestern...

Shakespeare His Contemporaries: a half-time report

Hannah Bredar, Madeline Burg, Melina Yeh, and Nayoon Ahn have been at work for four weeks in their clean-up operation of the Early Modern plays in the TCP archive. Nicole Sheriko helped them in the first week and has since then focused on preparing a Young Scholar Edition of Fair Em. The clean-up operation proceeds...

Collaborative curation of Early Modern plays by undergraduates

The following is an abridged and lightly edited version of a blog entry that I first posted in March 2010 on my now defunct Literary Informatics blog. Here is a small but potentially promising experiment with a group of undergraduates in a Shakespeare class that I taught in the winter of 2010. Its s...

“Fluent in Marlowe”: Emily’s and Sasha’s successful adventures in data curation

In 2009 Emily Anderson and Sasha Puchalla, two undergraduates in a course on Early Modern drama I taught then, collaborated on a course assignment to check and correct the TCP EEBO transcription of Marlowe’s Tamburlaine. They worked from a spreadsheet with a ‘verticalized’ representation of the text in which every word was a data row...

Getting undergraduates and amateurs into the business of re-editing our cultural heritage for a digital world

The Chicago section of today’s New York Times has an article with the title “Volunteers at Planetarium Excel where machines lag.” The gist of the article is in these paragraphs: The Adler has become a leader in “citizen science,” a growing trend in astronomy research. As the lead institution of the Citizen Science Alliance, which...