Over at New Savanna I’ve been blogging my way through Matthew Jockers’ Macroanalysis: Digital Methods & Literary History (University of Illinois Press, 2013). I figured this particular post would be of interest here. If you’re not familiar with topic analysis, there are some links below that’ll help you out.
Chapter 8 of Macroanalysis is about “Theme.” Jockers uses topic analysis to investigate the occurrence of 500 ‘themes’ in a corpus of 3,346 19th-century British, American, and Irish books. He opens with a bit of intellectual history, from the Russian Formalists to Google’s Ngrams. Then he launches into topic analysis, which emerged at the turn of the millennium, gives some simple examples, and gets serious. But I’m going to skip over all of that for now.
For one thing, I’ve been through the topic analysis drill several times in the past year or so and don’t want to go through it again. If you need an introduction or a review, check out Topic Models: Strange Objects, New Worlds, or, in this series, Reading Macroanalysis 5: An Interlude on Scale: Micro, Meso, and Macro. For another, Jockers has put a topic tool online, 500 Themes from a corpus of 19th-Century Fiction. Those are the topics he discusses in this chapter.
Once I was done reading the chapter I started playing with the tool. I’d pick a topic and then look at the graphics:
- a word cloud to display the most frequent words in the topic,
- a bar chart indicating usage of the topic by author gender (male, female, and undetermined),
- a line graph showing gender usage over time,
- a bar chart indicating usage of the topic by author nationality (American, British, Irish), and
- a line graph showing national usage over time.
At first I was just browsing, moving from one theme to the next. But then I hit one that grabbed my attention.
So I spent the next couple of hours looking at themes and thinking about them. I’m going to devote the rest of this post and the next one to showing what I found. Then I’ll do a third post where I review what Jockers found and recast the enterprise in terms of cultural evolution.
Note that in all of this I’m just playing around, but in a serious way. It is all preliminary and provisional. I haven’t reached any firm conclusions on the particular themes I look at. The only thing I’m sure about is that this and similar techniques are going to revolutionize the way we do literary history.
Before proceeding, however, two caveats are necessary. While Jockers’ corpus is substantial, it doesn’t contain every British, American, and Irish novel written in the 19th century. Perhaps more important, it is natural to read these theme charts as reflecting the interests of the 19th-century reading public. And in some sense that is so. But we have to be careful.
Some of these books were more widely read than others, and a few of them, the canonical ones, are still being read. But the extent of a book’s readership is not reflected in the data. The fact that a book was published at all implies, of course, that someone thought there was an audience for it. But a publisher’s interest isn’t quite the same as a reader’s interest. We simply don’t know how accurately publisher interest tracks reader interest. With those reservations in mind, let’s take a look.
Of Dogs and Gold
In the course of browsing through Jockers’ themes menu I saw “DOGS.” Let’s look at that, I thought. Why dogs? you may ask. No deep reason, but some years ago, way back in graduate school in fact, I’d noticed that dogs figured as a significant motif in Wuthering Heights. Major transitions among humans were marked by violence between dogs and humans (e.g. Lockwood arrives and is greeted by a barking dog, Catherine gets bitten by Skulker; see this post). More recently, I’d read a handful of articles about the domestication of dogs during human evolutionary history. I was just curious.
Here’s the word cloud for the DOGS topic:
The following graph stunned me. It depicts the occurrence of the dog topic by author’s gender over the course of the century. The medium gray line depicts male authors, the black line female authors, and the light gray line authors whose gender was undetermined.
What’s that spike at the right edge? As soon as I realized that it was for male authors I thought, “Jack London, The Call of the Wild.” I also doubted that the book was in the corpus, since I didn’t think it was a 19th-century book, though I wasn’t sure. But that doubt didn’t stop me from nosing around. By the time I’d confirmed for myself that it wasn’t 19th century (it was published in 1903) and Jockers had gotten back to me that, no, it wasn’t in the corpus, I’d already had too much fun browsing through the charts and had moved on to other topics (which I’ll get to in the next post).