Data analysis, text mining drive literary research
English major Matthew Jockers wasn’t always a computer whiz. The new dean of the WSU College of Arts and Sciences recalls a class in high school in which he struggled to program a mainframe to print out his name. “It was that tricky,” he says.
A love of reading, writing, and literature led him to become a very good coder indeed. Jockers is an expert in R, a programming language he uses to write the pattern-detecting algorithms at the heart of his research. Jockers uses it to analyze texts—lots and lots of text. One wag wrote that Jockers may be the only literature professor to assign 1,200 novels in a single class.
A powerful tool, text mining is used by pharmaceutical companies to analyze patents and journal articles to accelerate drug discovery. Public health researchers, including some at WSU, mine social media text to detect disease outbreaks. Businesses use it to improve competitiveness by analyzing customer and consumer data. Scientists, daunted by a global scholarly output of two million papers a year, use text mining to make their work more efficient. And text mining helps us understand that most human of acts, storytelling.
So important is storytelling that Jockers, not one to go too far out on a limb, is willing to make a prediction. “If you look at the historical data, English majors have declined. My prediction is that the dip is going to reverse. There is constant new evidence coming in of the value of storytelling to every industry out there.”
When Jockers was a computer engineer at Apple, he says, the company needed people who could translate technical innovations into something it could use and sell. “The mantra has been ‘STEM, that’s where the jobs are.’ But the jobs are changing” as tech industries mature, “and now we need leadership, communications, and those are developed in the arts, humanities, and social sciences.”
The Bestseller Code, a book Jockers and a colleague published in 2016, is a perfect example. The book provides empirical evidence of the features that authors, literary agents, and publishers have long sought after intuitively. By digitizing and analyzing thousands of works of fiction—some bestsellers, many not—The Bestseller Code reveals “patterns at a scale and level of granularity that no human could ever manage.”
A lot of that analysis is done with surprisingly mundane words. The most common words in English—forms of the verb “to be,” prepositions, articles—reveal all sorts of things, Jockers says, including the gender of an author, and whether he or she is, say, British or American. An analysis of nineteenth-century novels revealed that “character agency is very gendered. The results confirm certain stereotypes about gender roles.”
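As a rough illustration of the kind of counting this involves, here is a minimal sketch in Python that measures how often a handful of common function words appear in a passage. It is not the method used in The Bestseller Code or in Jockers's own R code; the word list, the function name, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical, abbreviated list of common English function words.
# Real stylometric studies typically rely on a much longer list.
FUNCTION_WORDS = ["the", "of", "and", "a", "in", "to", "is", "was", "were", "be"]


def function_word_profile(text):
    """Return each function word's share of all word tokens in the text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {word: 0.0 for word in FUNCTION_WORDS}
    return {word: counts[word] / total for word in FUNCTION_WORDS}


# Made-up sample sentence, used only to show the function in action.
sample = "The cat was in the garden, and the dog was by the gate."
profile = function_word_profile(sample)

for word, share in profile.items():
    print(f"{word:>5}: {share:.3f}")
```

Profiles like this, computed across many books, are the sort of numeric features a statistical model could compare between groups of authors. Any conclusion about gender or nationality would depend on far more text and far more careful modeling than this sketch shows.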
Despite dire warnings of machines storming the castle of narrative and robbing us of our creativity, Jockers demurs.
“All the tech is doing is letting us see things that we wouldn’t see otherwise.” It’s up to storytellers, not machines, to decide how to act upon the patterns that emerge from the data.
Top image: Matthew Jockers, dean of the WSU College of Arts and Sciences. Photo by WSU Photo Services.
By Brian Charles Clark for Washington State Magazine