Scalable Reading

dedicated to DATA: digitally assisted text analysis

...the broad circumference
Hung on his shoulders like the Moon, whose Orb
Through Optic Glass the Tuscan Artist views
At Ev’ning from the top of Fesole,
Or in Valdarno, to descry new Lands,
Rivers or Mountains in her spotty Globe.
(Paradise Lost, 1. 286-91)

Latest entries

From transcription to scholarship

Today’s New York Times carried a touching obituary of Claude Anne Lopez, author of Mon Cher Papa: Franklin and the Ladies of Paris and other biographical studies of Franklin. A Jewish refugee from Nazi-occupied Belgium, she arrived in America in 1941. She married an historian who moved to Yale, where the only employment available to...

The mdash

Have you ever thought about the mdash, the long dash, \u2014 in Unicode parlance or paraphrased as &mdash; in the parlance of character entities? The odds are that you have not. I certainly have not thought much about it, but it tripped me up this morning in the EEBO-MorphAdorner project that Phil Burns and I...
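As a small illustration of the two notations mentioned in this entry, the sketch below checks that the Unicode escape and the character entity name the same character. It uses only the Python standard library; the function name `describe_dash` is a hypothetical helper introduced for this example and is not part of the EEBO-MorphAdorner project.

```python
import html
import unicodedata


def describe_dash(entity: str = "&mdash;") -> tuple[str, str, bytes]:
    """Resolve a character entity and report what it stands for.

    Hypothetical helper for illustration only.
    Returns (code point label, Unicode name, UTF-8 bytes).
    """
    char = html.unescape(entity)          # "&mdash;" becomes one character
    code_point = f"U+{ord(char):04X}"     # numeric identity of the character
    name = unicodedata.name(char)         # official Unicode character name
    return code_point, name, char.encode("utf-8")


if __name__ == "__main__":
    # The escape \u2014 and the entity &mdash; denote the same character.
    assert html.unescape("&mdash;") == "\u2014"
    print(describe_dash())
```

A text that mixes the literal character, the escape, and the entity will look identical on screen but differ byte for byte, which is one way a character like this can trip up a processing pipeline.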

Back to the Future or Wanted: A Decade of High-tech Lower Criticism

The title of this blog entry is the title of a keynote address I gave at the Chicago Digital Humanities and Computer Science Colloquium, held November 18-19, 2012 at the University of Chicago. There is a pdf of the talk at http://panini.northwestern.edu/mmueller/backtothefuture.pdf . The talk was about the challenges and opportunities posed by the TCP...

Google maps and crowdsourcing

David Pogue has an article on his New York Times blog about what makes Google maps so good. It’s a story of incremental and iterative improvement over years, combining sophisticated algorithms with a lot of manual work. There are definite lessons here for the incremental and iterative improvement over time of the TCP texts and similar corpora.

EEBO-TCP 2012: The future of the TCP as a public domain and collaboratively curated corpus of Early Modern English

“Revolutionizing Early Modern Studies?” was the question that governed the recent EEBO-TCP 2012 conference sponsored by the Bodleian Library. I gave a talk there about “Towards a Book of English: A linguistically annotated corpus of the EEBO-TCP texts.” In another blog I will write about the ways in which this project will keep Phil Burns and...

Are the Text Creation Partnership texts good enough for research purposes?

The following is a republished blog post originally published on October 1, 2009 on my now defunct Literary Informatics blog. Some of the points made then are not quite true anymore (especially my lament about the lack of concern for quality in the Hathi Trust project), but many of them remain true enough. Are the...

The Great Digital Migration

The following was first published on an earlier version of this blog in the spring of 2010. It is republished here with light revisions. I spent some time with the papers of a recent conference at Virginia: Online Humanities Scholarship: The Shape of Things to Come. Here are some comments on them, with quotations from...

Cities, Population Growth, and Literary Attention

As part of an ongoing project, I’ve been examining the distribution of named places in nineteenth-century American fiction. (See the link for details; briefly, my corpus contains 1000+ American novels published between 1851 and 1875.) One of the things I’m trying to understand is what drives literary-geographic attention. In short, why are some places written...

Collaborative curation of Early Modern plays by undergraduates

The following is an abridged and lightly edited version of a blog entry that I first posted in March 2010 on my now defunct Literary Informatics blog. Here is a small but potentially promising experiment with a group of undergraduates in a Shakespeare class that I taught in the winter of 2010. Its s...

“Fluent in Marlowe”: Emily’s and Sasha’s successful adventures in data curation

In 2009 Emily Anderson and Sasha Puchalla, two undergraduates in a course on Early Modern drama I taught then, collaborated on a course assignment to check and correct the TCP EEBO transcription of Marlowe’s Tamburlaine. They worked from a spreadsheet with a ‘verticalized’ representation of the text in which every word was a data row...

Very briefly: scalable reading.

I’m mostly writing simply to thank Martin for organizing this blog. I hope it becomes a venue for informal discussion of opportunities and challenges in the field. I think we could use a forum a little less volatile than Twitter and more public than e-mail, but less formal than publication or even a research blog....

Against the terms DH or Digital Humanities

As the date for this year’s DH conference approaches, I would like once more to express my dislike of the terms Digital Humanities and DH. I write this not from the perspective of somebody inside this particular tent, but as a faculty member in a standard humanities department who has tried (with very varying success)...