This site continues a site on Literary Informatics that I kept at Northwestern University. I will import some posts from that earlier site with appropriate revisions. The following paragraphs are a slightly edited version of the splash page for the earlier site.

DATA is an acronym for ‘digitally assisted text analysis’. The operative word here is ‘assisted’. There is no claim that doing things digitally is a finer or cooler thing than plain old reading. There is merely a claim that in some situations the digital manipulation of texts comes in handy and lets you do things that otherwise would take much longer or might be impracticable altogether. Whenever it does, texts for a while become ‘data’, a word that grates on the humanist’s ear, even though it is a perfectly good and simple Latin word for the ‘given’. I have been told that quite recently a French medievalist said to his PhD student that “L’ordinateur est un instrument de déshumanisation de la recherche et de la désincarnation du vivant” (“The computer is an instrument of the dehumanization of research and of the disincarnation of the living”). But if we speak of the ‘disincarnation of the living’, reading and writing are much greater sins than digital textuality, a fact known not only to Plato in the Phaedrus but to Shakespeare’s peasant rebel Jack Cade in his indictment of Lord Say:

Thou hast most traitorously corrupted the youth of the realm in erecting a grammar school … It will be proved to thy face that thou hast men about thee that usually talk of a noun and a verb and such abominable words as no Christian ear can endure to hear. (2 Henry VI 4.7.30-37)

Whatever doubts one may harbour over the digital manipulation of texts, it will not do to think of it as a technologizing of something that should not be technologized: the written word has always already been technologized, and the distance between the written and the spoken is much more consequential than the distance between the printed and the digital word. It is more a matter of a familiar vs. an unfamiliar technology, with the attendant calculus (tacit or explicit) of what is quite literally ‘worthwhile’.

To think of text as DATA in terms of this acronym is to move into the realm of Literary Informatics. This is not yet a common term of art. A Google search retrieves just 163 hits, and many of them refer to my use of it. I did not, however, coin the term. More common is the term Cultural Informatics (32,6000 hits), but this pales before Bioinformatics and its variant spellings, which add up to over ten million hits. Note that this paragraph about Literary Informatics is itself an instance of it, although a very simple-minded one.

If you think of Literary Informatics as an intriguing topic, it may be more helpful to explore its relations with Bioinformatics than to think of it as a subset of Cultural Informatics. The reason is that much of Bioinformatics is a very peculiar form of text analysis, where the Book of Life is imagined as a very large text written in an alphabet of the four letters A, G, C, T, which stand for the building blocks of DNA. A human genome is a text of ~six billion such letters or ‘base pairs’, as the biologists call them.
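
To make the kinship concrete, here is a minimal sketch in Python. It assumes nothing beyond the standard library, and both the DNA fragment and the function name `ngram_counts` are made up for illustration. The point is only that one and the same counting routine can read a genome letter by letter and a line of verse word by word.

```python
from collections import Counter

def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols in a sequence.

    `symbols` may be a string (read letter by letter) or a list of
    words (read word by word); the routine does not care which.
    """
    return Counter(
        tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)
    )

# An invented DNA fragment, not taken from any real genome.
dna = "GATTACAGATTACA"
dna_bigrams = ngram_counts(dna, 2)

# A familiar phrase, split into words.
words = "to be or not to be".split()
word_bigrams = ngram_counts(words, 2)

print(dna_bigrams.most_common(3))
print(word_bigrams.most_common(1))
```

Real bioinformatics tools are of course far more sophisticated, but the underlying move of treating a long string as something to be counted, compared, and searched is much the same in both realms.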

Can one usefully think of a ‘cultural genome’? Some years ago, Peter Robinson published an article in Nature (Aug. 27, 1998) that describes the use of phylogenetic software to trace the relationships of the 58 different manuscripts of The Wife of Bath’s Prologue. The approach is rooted in the family tree as a fundamental model of thinking in the biological and philological realm. This particular example comes from the highly specialized and technical subdiscipline of textual criticism, but there may be broader ways in which literary scholars in their different ways with texts and genres can learn from the biologist’s ways with genomes and species.
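
A toy version of the family-tree idea can be sketched in a few lines of Python. This is not the method or the software used in the Nature article; it is a much simpler stand-in that merely counts disagreements between witnesses. The four sigla and all the readings are invented, and the sketch assumes that the witnesses have already been aligned at the same points of variation.

```python
from itertools import combinations

# Invented readings at five points of variation. Each letter stands for
# one variant reading; none of this is real manuscript evidence.
witnesses = {
    "A": ["x", "x", "y", "x", "z"],
    "B": ["x", "x", "y", "y", "z"],
    "C": ["y", "x", "x", "y", "x"],
    "D": ["y", "x", "x", "y", "z"],
}

def disagreements(a, b):
    """Number of aligned points at which two witnesses differ."""
    return sum(1 for reading_a, reading_b in zip(a, b) if reading_a != reading_b)

# Pairwise distances between all witnesses.
distances = {
    (p, q): disagreements(witnesses[p], witnesses[q])
    for p, q in combinations(sorted(witnesses), 2)
}

# List the pairs from closest to most distant.
for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, distance)
```

In this made-up example A sits close to B and C close to D, which is the kind of grouping from which a family tree might be built. Phylogenetic programs go well beyond such simple counting, but a table of differences of this sort is a plausible place to start.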

This welcome message was first written two years ago. Its revision follows narrowly on the publication on December 16, 2010, of Google’s Ngram Viewer and the essay in Science about ‘culturomics’, a term which since then has gathered 131,000 Google hits. The Germans say that “alle Vergleiche hinken” or “all comparisons are lame.” That is a good way of pointing to the paths that any comparison will not open. On the other hand, comparisons do help us walk in some directions. Only time will tell whether metaphors like ‘meme’, ‘cultural genome’, or ‘culturomics’ are lame or ‘have legs.’ My hunch is that they will help us walk at least a little ways towards some worthwhile goals.

Martin Mueller

Professor of English and Classics, Northwestern University