Scalable Reading

dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Latest entries

Best Buy and Curation en passant

I went to Best Buy to reduce the clutter of remote controls in my living room and simplify my life. Logitech’s Harmony may be the answer. Cheap it isn’t, but then ‘cheap’ and ‘simple’ are hardly synonyms–witness the very simple and very expensive white KPM china of the Königliche Porzellan-Manufaktur Berlin. I paused at the...

From Shakespeare His Contemporaries to the Book of English

Introduction and Summary. This is a report about “Shakespeare His Contemporaries” or SHC, my project for creating an interoperable digital corpus of plays that, in addition to Shakespeare’s, include most of the plays written within a generation before and after his active career as a playwright. Its keywords are “query potential”, “digital surrogate”, “algorithmic amenability”, and...

Repeated n-grams in Shakespeare His Contemporaries (SHC)

This is a blog post about the distribution of a special kind of “dislegomena,” tetragrams and longer n-grams whose “collection frequency” is 2 and whose “document frequency” is also 2. My purpose is to figure out how many swallows make a summer. If you are interested in the intertextual relationship between one play and another,...
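To make the two frequencies concrete, here is a minimal sketch of how such n-grams could be picked out. It is not the method used for SHC: the function names, the whitespace tokenizer, and the toy "plays" are all assumptions made for illustration, and it handles one n-gram length at a time rather than "tetragrams and longer."

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_dislegomena(docs, n=4):
    """Find n-grams that occur exactly twice in the whole collection
    (collection frequency 2) and in exactly two documents
    (document frequency 2), i.e. once in each of two texts.

    `docs` maps a document name to its text. Lowercasing and splitting
    on whitespace is a stand-in for real tokenization.
    """
    collection_freq = Counter()
    containing_docs = defaultdict(set)
    for name, text in docs.items():
        for gram in ngrams(text.lower().split(), n):
            collection_freq[gram] += 1
            containing_docs[gram].add(name)
    return {
        gram: sorted(containing_docs[gram])
        for gram, count in collection_freq.items()
        if count == 2 and len(containing_docs[gram]) == 2
    }


# Hypothetical toy corpus, not drawn from the SHC texts.
toy_docs = {
    "play_a": "to be or not to be",
    "play_b": "whether to be or not",
    "play_c": "all is well that ends well",
}
shared = shared_dislegomena(toy_docs, n=4)
```

In the toy corpus the only tetragram that qualifies is "to be or not", which links play_a and play_b. An n-gram that occurred twice within a single play would have a collection frequency of 2 but a document frequency of 1, and would be left out.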

BlackLab: searching a TCP corpus by linguistic and structural criteria

Not quite two years ago I wrote an open letter about the TEI in which I wondered about its successes and failures. I wrote about “a thought experiment where you ask the chairs of history, literature, linguistics, philosophy, and religion departments of the world’s 100 top universities to write a sentence or short paragraph about the...

Shakespeare His Contemporaries: a half-time report

Hannah Bredar, Madeline Burg, Melina Yeh, and Nayoon Ahn have been at work for four weeks in their clean-up operation of the Early Modern plays in the TCP archive. Nicole Sheriko helped them in the first week and has since then focused on preparing a Young Scholar Edition of Fair Em. The clean-up operation proceeds...

Shakespeare His Contemporaries: The corpus

Here is a link to a spreadsheet of the SHC corpus. The plays are ordered by known error rate, from low to high. For each play, the spreadsheet shows its filename, derived from the filename in the TCP collection; the author; the title; the date, which is the best estimate of creation rather than...

What is a Young Scholar edition

A Young Scholar edition is a project that fits into the scale of an undergraduate honors project. It will normally take as its point of departure a TEI-P5 version of a TCP text that has been linguistically annotated with MorphAdorner. Like ice skating, it consists of compulsory figures and a free form routine. The two...

How to fix 60,000 errors

In the ~600 plays that are the target of Shakespeare His Contemporaries (SHC) there are at least 60,000 errors that should be fixed. Natural Language Processing (NLP) folks may observe at this point that 60,000 errors amount to 0.4% of the total corpus and that they are not worth fixing because any statistical routine...
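The figures quoted above imply a rough corpus size, and the arithmetic is easy to check. The sketch below assumes that the 0.4% is measured against a count of word tokens (the excerpt says only "total corpus") and treats "~600" as exactly 600; the variable names are mine.

```python
# Figures taken from the excerpt above.
errors = 60_000   # "at least 60,000 errors"
plays = 600       # "~600 plays", treated here as exactly 600

# 0.4% is 4 per 1,000, so the implied corpus size is errors * 1000 / 4.
# Integer arithmetic avoids floating-point rounding noise.
implied_corpus_size = errors * 1000 // 4

# Per-play averages implied by the same figures.
errors_per_play = errors // plays
tokens_per_play = implied_corpus_size // plays
```

On those assumptions the corpus comes to about 15 million tokens, or an average of some 25,000 tokens and 100 known errors per play.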

Shakespeare His Contemporaries

The project “Shakespeare His Contemporaries” will make a systematic effort to harness the energy and imagination of undergraduates as editors and explorers of old plays in new forms. It will begin on Monday, June 24, 2013, when Hannah Bredar, Madeline Burg, Melina Yeh, Nayoon Ahn, and Nicole Sheriko start an eight-week curation marathon during which...

Getting undergraduates and amateurs into the business of re-editing our cultural heritage for a digital world

The following reprints an entry that I first published on January 7, 2011 on my now defunct “Literary Informatics” site. The Chicago section of today’s New York Times has an article with the title “Volunteers at Planetarium Excel where machines lag.” The gist of the article is in these paragraphs:...

Morgenstern’s Spectacles or the Importance of Not-Reading

[I recently stumbled across the draft of a talk I gave at the University of London in 2008. It strikes me as a still relevant reflection on what I then called “Not-Reading” and now prefer to call “Scalable Reading.” I reprint it below with very minor corrections and additions.] Coming from Homer: the allographic journey...

“Fluent in Marlowe”: Emily’s and Sasha’s successful adventures in data curation

The following is a reposting of excerpts from a 2009 report by two undergraduate students of mine, Emily Anderson and Sasha Puchalla. As part of a course assignment, they checked the TCP EEBO transcription of Marlowe’s Tamburlaine. They worked from a spreadsheet with a ‘verticalized’ representation of the text in which every word was a...