What's a good story? Well, there's "Anna Karenina." That one tops some lists. What about the one where Jesus is born, dies and rises again? That one's very popular! And there's that time you were walking along, minding your business, and you accidentally tumbled into a manhole. People seem to get a real kick out of that one.
As Kurt Vonnegut would tell you, there are a lot of elegant stories out there that seem to follow the same emotional patterns. In 1985, Vonnegut himself suggested feeding these "simple shapes of stories" into computers to see what we might discover. In 2016, researchers finally took him up on it and found that six distinct emotional story arcs dominate storytelling in the Western tradition.
So how can a computer map an "emotional arc"? Andy Reagan is a researcher and Ph.D. candidate in applied mathematics at the University of Vermont, and an author of the study. He says it starts with happiness.
"To measure happiness, we counted the 10,222 most used words in language and rated them using a survey on a 1-9 sad to happy scale, collecting 50 ratings of each word," Reagan says in an email. "The happiest words end up being 'happiness,' 'laughter,' and 'love' as you might expect."
After that, they measured how happy the words were in chunks of text. "To measure the emotional arc of a single book, we take chunks of the text that are 10,000 words long, count all of the words in that chunk, and take the average happiness of all the words in that bag of 10,000 words," Reagan says. "As we move that 10,000 word window through the text, we generate the time series of emotion in the book." And voilà: you have the emotional arc of a story.
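The sliding-window averaging Reagan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the tiny HAPPINESS lexicon and its scores are invented for the example, the function name emotional_arc is hypothetical, the window is three words rather than 10,000, and skipping words that have no rating is a simplifying choice.

```python
# Hypothetical mini-lexicon on the article's 1-9 sad-to-happy scale.
# These scores are invented for illustration; they are not the study's ratings.
HAPPINESS = {
    "love": 8.4, "laughter": 8.5, "happiness": 8.4,
    "rain": 5.0, "alone": 3.0, "death": 1.5, "war": 1.8,
}

def emotional_arc(words, window, step=1, lexicon=HAPPINESS):
    """Return the average happiness of each sliding window of `window` words."""
    arc = []
    for start in range(0, len(words) - window + 1, step):
        chunk = words[start:start + window]
        # Assumption: words without a rating are simply skipped.
        scores = [lexicon[w] for w in chunk if w in lexicon]
        # A window with no rated words has no defined average.
        arc.append(sum(scores) / len(scores) if scores else None)
    return arc

# A toy "story" that starts happy and ends sad.
text = "love laughter happiness rain alone death war".split()
arc = emotional_arc(text, window=3)
print([round(score, 2) for score in arc])
```

Each number in the printed list is the mood of one window, so reading the list left to right traces the story's emotional arc. For this toy text the values fall steadily.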
Drawing on texts from Project Gutenberg, they narrowed the pool down to 1,327 stories to analyze. They used language-processing filters to find the underlying shapes in the arcs and then grouped the stories into clusters of similar arcs. An artificial neural network helped figure out which arcs were part of an actual storyline.
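To give a feel for what "grouping similar arcs" means, here is a deliberately simple stand-in in Python. It is not one of the tools the researchers used: it just measures how closely two arcs move together (Pearson correlation) and puts an arc into the first group whose founding member it resembles. The names pearson and group_arcs, the 0.8 threshold and the five toy arcs are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length arcs (neither may be constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def group_arcs(arcs, threshold=0.8):
    """Greedy grouping: join the first group whose first member correlates
    with this arc at or above `threshold`, else start a new group."""
    groups = []
    for name, arc in arcs.items():
        for group in groups:
            if pearson(arc, arcs[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Toy arcs, invented for illustration (five windows per story).
arcs = {
    "story_a": [1, 2, 3, 4, 5],  # steady rise
    "story_b": [2, 3, 5, 6, 8],  # steady rise
    "story_c": [5, 3, 1, 3, 5],  # fall, then rise
    "story_d": [6, 4, 2, 3, 6],  # fall, then rise
    "story_e": [5, 4, 3, 2, 1],  # steady fall
}
groups = group_arcs(arcs)
print(groups)
```

With these toy arcs, the two rising stories end up together, the two fall-then-rise stories end up together, and the falling story stands alone.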
"They all tell us slightly different things," Reagan says of the three tools used. "And we looked across each of them for the overarching result."
The data showed that six emotional story lines can be identified: 'rags to riches' (sentiment rises), 'riches to rags' (fall), 'man in a hole' (fall-rise), 'Icarus' (rise-fall), 'Cinderella' (rise-fall-rise) and 'Oedipus' (fall-rise-fall).
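The rise and fall labels above lend themselves to a small lookup. The Python sketch below reduces an arc to its sequence of rises and falls and matches it against the six named shapes. It is a toy built on assumptions, not how the study assigned stories to arcs: the names ARC_NAMES, arc_shape and name_arc are hypothetical, and a real arc is noisy enough that it would need smoothing before a rule this literal could work.

```python
# The six shapes as listed in the article, keyed by their rise/fall pattern.
ARC_NAMES = {
    ("rise",): "rags to riches",
    ("fall",): "riches to rags",
    ("fall", "rise"): "man in a hole",
    ("rise", "fall"): "Icarus",
    ("rise", "fall", "rise"): "Cinderella",
    ("fall", "rise", "fall"): "Oedipus",
}

def arc_shape(arc):
    """Collapse an arc into its sequence of rises and falls."""
    shape = []
    for prev, cur in zip(arc, arc[1:]):
        if cur == prev:
            continue  # flat stretches carry no direction
        direction = "rise" if cur > prev else "fall"
        # Record a direction only when it changes.
        if not shape or shape[-1] != direction:
            shape.append(direction)
    return tuple(shape)

def name_arc(arc):
    """Return the article's name for this shape, or 'unclassified'."""
    return ARC_NAMES.get(arc_shape(arc), "unclassified")

print(name_arc([5, 3, 1, 3, 5]))  # falls, then rises
print(name_arc([2, 5, 3, 6]))     # rises, falls, rises
```

Any arc with more than two turning points, or no movement at all, comes back as "unclassified" in this sketch.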
They also listed the stories that represent each core emotional arc, sorted by number of Gutenberg downloads: "The Importance of Being Earnest" topped the rise category, "The Picture of Dorian Gray" the fall, "Dubliners" the fall-rise, Dickens' "A Christmas Carol" the rise-fall, "Treasure Island" the rise-fall-rise and "Frankenstein" the fall-rise-fall.
Reagan points out, however, that they looked only at English-language stories, which might limit our understanding of storytelling in other traditions.
"Our analysis used just post-copyright (older) English fiction, and this is a very specific corpus," he says. "We've recently obtained access to the full digitized books by the Google Books project for our research, and are eager to study how the emotional arcs vary through time and across cultures."
So what's the point of knowing how stories shape up? For one, it's enticing for artificial intelligence study. "We use stories, narrative, and ultimately metaphor to make sense of the world around us, and this could be important for common-sense reasoning in AI," Reagan says of the emotional arc research. So watch out, Dave — HAL might just know how your story ends before you do.