Scalable Reading

dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Early Modern Drama

Who wrote what: is it time for a blind experiment about Early Modern drama?

Natural Language Processing (NLP) has come a long way since 1982, when Anthony Kenny published The Computation of Style: An Introduction to Statistics for Students of Literature and Humanities. The methods are more sophisticated, the machines are both cheaper and more powerful, and the learning curve for carrying out experiments has dropped sharply. But this...

Shakespearean n-grams

The following is about 86 pairwise combinations of EMD plays that include Shakespeare on one side and sit 2.5 standard deviations above the average for all pairwise combinations. This is a crude and arbitrary cut-off and includes less than 1% of some 11,000 pairwise combinations that meet this definition. There is a “proof in the...
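A cut-off of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions: the excerpt does not say which score is being compared, so the function name `pairs_above_cutoff`, the input format, the toy scores, and the choice of the population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev

def pairs_above_cutoff(scores, threshold=2.5):
    """Return pairs whose score lies more than `threshold` standard
    deviations above the mean of all pairwise scores.

    `scores` maps a (play, play) tuple to a numeric score. Both the
    input format and the use of the population standard deviation
    are assumptions made for this sketch.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # every pair has the same score, so nothing stands out
    z_scores = {pair: (value - mu) / sigma for pair, value in scores.items()}
    return {pair: z for pair, z in z_scores.items() if z > threshold}

# Toy data, invented for illustration: ten ordinary pairs and one outlier.
toy_scores = {("Play A", f"Play B{i}"): 10 for i in range(10)}
toy_scores[("Play A", "Play C")] = 100

outliers = pairs_above_cutoff(toy_scores)
```

With the toy data, only the pair ("Play A", "Play C") clears the 2.5 threshold. On a real set of some 11,000 pairs the result depends heavily on how skewed the score distribution is, which is one reason to treat the cut-off as crude.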

The Top Fifty n-gram heavy play links

This blog entry continues the entry on “Authors are trumps” and looks at the top fifty play links, which score at the 99.9th percentile of shared n-grams. What can we learn from this list without actually looking at the plays? Or, if we think about it as an exercise in scalable reading, what can we...

Authors are trumps

What do repeated phrases or n-grams tell us about how close to or distant from each other pairs of early modern plays are? Do n-grams provide dependable measures of distance, and can we learn from them about the weight of various factors that differentiate between one play and another, whether by date, genre, or author?...
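For readers who want to see what "shared n-grams" means operationally, here is a minimal sketch. It assumes each play is already available as a list of tokens and counts distinct n-grams that appear in both plays; the function names, the default of n=3, and the toy token lists are hypothetical and do not describe the procedure used in the entry itself.

```python
def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(tokens_a, tokens_b, n=3):
    """Return the distinct n-grams that occur in both token lists."""
    return ngrams(tokens_a, n) & ngrams(tokens_b, n)

# Toy token lists, invented for illustration.
play_a = "to be or not to be".split()
play_b = "not to be or".split()

shared = shared_ngrams(play_a, play_b, n=3)
```

Here `shared` holds two trigrams, ("to", "be", "or") and ("not", "to", "be"). A raw count like this favours long plays, so any real comparison would need some normalisation for length before it could serve as a measure of distance.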

Shakespeare’s dislegomena

Shakespeare’s dislegomena are lemmata that occur in only two of his plays. I use ‘dislegomenon’ in a specialized sense to refer to document rather than collection frequency. For instance, the lemma ‘Laertes’ occurs once in Titus Andronicus and 33 times in Hamlet. In Titus Andronicus, the name occurs in the context of disputed burial, and...
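The distinction between document and collection frequency is easy to show in code. The sketch below assumes each play has already been reduced to a list of lemmata; the function name `dislegomena`, the input format, and the toy data are hypothetical.

```python
from collections import Counter

def dislegomena(plays):
    """Return the lemmata whose document frequency is exactly two.

    `plays` maps a play title to its list of lemmata (an assumed
    input format). How often a lemma occurs inside a play is ignored:
    only the number of plays containing it counts.
    """
    doc_freq = Counter()
    for lemmata in plays.values():
        doc_freq.update(set(lemmata))  # count each lemma once per play
    return {lemma for lemma, df in doc_freq.items() if df == 2}

# Toy data, invented for illustration only.
toy_plays = {
    "Play A": ["laertes", "burial", "king"],
    "Play B": ["laertes", "laertes", "king", "sword"],
    "Play C": ["king", "sword", "crown"],
}

found = dislegomena(toy_plays)
```

In the toy data, 'laertes' and 'sword' qualify: each appears in exactly two plays, however many times it occurs within them. 'king' appears in all three and so does not, which mirrors the ‘Laertes’ case, where 34 occurrences still amount to a document frequency of two.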

The length of plays in the EMD corpus

Polonius thought that the Player’s speech was too long. Dr. Johnson said that nobody would wish Paradise Lost any longer. One might say the same of Meistersinger or Götterdämmerung. On the other hand, Robert Schumann defended Schubert’s ‘heavenly lengths’ against detractors. Size is a pretty primitive criterion, but it enters powerfully into our tacit or...