The other day I was rereading Jerry McGann’s 2009 TLS essay on “Our textual history”, and I was struck by a phrase that I had missed on the first reading and had never encountered before: August Böckh’s definition of philology as “die Erkenntnis des Erkannten” or “the (further) knowing of the (already) known.” A little Web research took me to an essay by Jürgen Paul Schwindt in the Süddeutsche Zeitung (18 December 2009) about the origin of this phrase. It was coined in 1809 by Böckh, then 23 years old and soon to become very famous as one of the giants of 19th-century philology, in the first version of his signature project, the “Encyclopedia and Methodology of the Philological Sciences.”

What is it with age 23? Milton in Sonnet VII exclaimed

How soon hath Time, the subtle thief of youth
Stol’n on his wing my three-and-twentieth year!

and Schiller’s Don Carlos says something like “23 and done nothing yet for immortality”.

But I digress. What was it about Böckh’s phrase that arrested my attention? I saw in it a sudden illumination of the question whether and how one can use forms of “text mining” for literary analysis. I worry less about the “whether” than the “how,” and I don’t agree with the French medievalist whom I have quoted before as saying “L’ordinateur est un instrument de déshumanisation de la recherche et de la désincarnation du vivant” (“The computer is an instrument for the dehumanization of research and the disembodiment of the living”). It may often work that way, but it doesn’t have to.

Still, in talking or working with people who are professionally engaged in text mining and know a great deal more about it than I, I have often been uneasy about a fundamental mismatch between what “we” do and “they” promise — leaving aside a more precise definition of just who “we” or “they” are. Once I shared my unease with a quick-witted colleague, who like me is quite interested in text mining, and he said: “Oh yes, those guys. They just go for the shortest vector.” Whatever it is we do in the humanities, we do not go for the shortest vector.

Almost two years ago, an essay in the New York Times about Exploring the Deep Web quoted the cofounder of a new search engine as saying “Most search engines try to help you find a needle in a haystack, but what we’re trying to do is help you explore the haystack.” That spoke to some aspects of my unease. I was reminded of it today by a story in the New York Times about twenty-somethings who get up as early as Abraham did to stay ahead of the news cycle and brief their bosses by 8 am. One of them is quoted as saying:

“It’s reading the 1,000 stories in the papers and Hill rags, and finding that one needle in the haystack that’s going to matter.”

I’m not sure whether that is an exhilarating or miserable way of spending your time (probably both), but it is also not what humanists do most of the time, even if detective work of one kind or another may be an important part of our jobs. Nor are we typically in the situation of the computer scientist who gave a talk at the University of Chicago several years ago and described her experience at AT&T, where seven gigabytes of data came across the transom every day and you had to do something about them today because tomorrow there would be another seven gigabytes. (That was about a decade ago, when seven gigabytes was still a lot of data.)

In search scenarios of this kind, you find yourself in the midst of tons of data about which you know nothing and need to find something that plausibly completes the sentence “the bottom line is that…” But this is not the typical scenario for humanist scholars in their engagement with the primary data around which their projects revolve. They already know a lot about the object of their inquiry, but they want to know more. In most cases, they are unlikely to find much use for the techniques, subtle and crude at the same time, by means of which text mining extracts something from the unknown.

I have quite a few times been in a situation where a text miner told me about the result of some text analysis operation where I knew something about the data, and I was usually too polite to quote Horatio:

There needs no ghost, my lord, come from the grave
To tell us this.

Böckh’s marvelous phrase draws our attention to the most characteristic aspect of the interpretative labour that is at the heart of Literary Studies and similar disciplines. It is a matter of knowing more about things that we already know something about. There is no bottom line to this knowledge. Or, as Böckh put it in another famous sentence: “Die Philologie ist, wie jede Wissenschaft, eine unendliche Aufgabe für Approximation” (“Philology is, like every other science, an unending task of approximation”).

It might not be a bad idea to write those two phrases by Böckh in large letters on the door of any developer or designer or project director whose goal it is to make some tool that will help readers with the task of making sense of what they read — the knowing of the known as an unending task of approximation. And with the right graphics they might add up to an inspiring screen saver. But developers, designers, and project directors who take those two phrases to heart in creating software for humanists will do a much better job.