Looking up stuff in an Early Modern corpus

The following is a discussion of a set of “search and sort” operations that could be useful in exploring the EEBO-TCP corpus of English books before 1700. It also includes some paragraphs about making texts more computationally tractable so that search...

Engineering English: Machine-assisted curation of TCP texts

There are somewhere in the neighbourhood of five million incompletely transcribed words in the roughly two billion words of English books before 1700 transcribed by the Text Creation Partnership. Depending on how you look at it, that is either a lot or not very much at...

Morgenstern’s Spectacles or the Importance of Not-Reading

[I recently stumbled across the draft of a talk I gave at the University of London in 2008. It strikes me as a still-relevant reflection on what I then called “Not-Reading” and now prefer to call “Scalable Reading.” I reprint it below with...

The mdash

Have you ever thought about the mdash, the long dash, \u2014 in Unicode parlance or paraphrased as &mdash; in the parlance of character entities? The odds are that you have not. I certainly have not thought much about it, but it tripped me up this morning in the...