Scalable Reading

dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Latest entries

Hobbes and Maggie Haberman about Twitter

Some years ago I read Joel Spolsky’s very funny description of Twitter in which he said: Although I appreciate that many people find Twitter to be valuable, I find it a truly awful way to exchange thoughts and ideas. It creates a mentally stunted world in which the most complicated thought you can think is one sentence long....

TCP2ESTC

Introduction and Summary. This is a report about an experiment with ~4,000 texts from the Text Creation Partnership (TCP). It is more in the spirit of concept cars than production models. There may also be an aspect of changing from 5.25 to 3.5 floppy disks. The TCP texts are a critical component of...

Fixing the Blackdot Words in the TCP corpus: a “mixed initiative” in Engineering English

This is a report on a “mixed initiative” (a term of art in computer science) that combines old-fashioned philological elbow grease with new-fangled long short-term memory (LSTM) neural network processing. The goal is to fix as many as possible of the approximately five million incompletely transcribed words in the 1.7 billion word TCP corpus of English printed...

Machine Learning in the Enterprise and in English Departments

Fifty years ago resistance to theory was a common thing in English Departments. Today there is a lot of resistance to things digital if they go beyond using a word processor, dressing up text with pretty pictures, or doing “Media”, as if text by itself were not a medium, and a very challenging one at...

Engineering English: Machine-corrected TCP texts

Engineering and English are alphabetical neighbours in a university list of disciplines, but the members of each discipline tend to think of the other as sitting at the opposite end of the disciplinary spectrum. Yet work in English departments has for centuries depended on the engineering work that created and refined printing. Future work will depend...

What is a digital combo?

How should an old book live in the digital environment of the 21st century? My answer is “as a digital combo” that brings together three data streams, each a surrogate that represents and contextualizes aspects of the original object. Call them the bibliographical, material, and textual streams. This scrawny diagram illustrates their interaction in the...

Whither TEI? The Next Thirty Years

In the next fifty years the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination (Jerome McGann, 2001)
You have to put the corn where the hogs can get at it (Bill Clinton)
Only the paranoid survive (Andrew Grove)
Introduction. At...

Freebo, Free Lunch, and Crowdfunding New EEBO Images

Here is a prefixed postscript (April 18, 2016) to my December 2015 blog post about creating new EEBO images: in a recent conversation with Thomas Stäcker, the deputy director of the Herzog-August-Bibliothek in Wolfenbüttel (HAB), I learned that their average cost for creating a digital image good enough for most scholarly purposes is about a dollar...

New release of Shakespeare His Contemporaries

I have put a new version of Shakespeare His Contemporaries on Google Drive, where you may view or download the plays. In this version I have grouped the plays by decades and put them in directories with names like 155, 156 … 165. The plays have been encoded in TEI Simple. The texts are in...

Hannah, Kate, and Lydia at work

While reviewing the work of Hannah, Kate, and Lydia, I enjoyed the precision and concision of their annotations. A sample of them appears below. While a full documentation would require snippets of the image and the transcription as well as the annotation, the annotations themselves clearly show their minds at work, combining clear description with...

Thou com’st in such a questionable shape: Data Janitoring the SHC corpus from the perspectives of Hannah, Kate, and Lydia

Below are the reflections of Hannah Bredar, Kate Needham, and Lydia Zoells on their adventures in the mundane world of Lower Criticism, about which I wrote in an earlier blog and of which the digital surrogates of our cultural heritage will need a lot in the decades to come. Racine observes in his preface...

Shakespeare His Contemporaries (SHC): The next release

This is a progress report on the basic clean-up of the 504 plays in my current Shakespeare His Contemporaries corpus (SHC). I hope to release an updated corpus by the end of November. It will replace the current corpus at https://github.com/martinmueller39/shc. The SHC texts are partially curated versions of the TCP texts, which have “known...