Monday, 13 January 2014

Can a Computer Algorithm Predict the Success of Books?

The Guardian reports that an assistant professor at Stony Brook University in New York claims to have created an algorithm that uses a quantitative approach to predict literary success with an accuracy rate of 84 per cent. The researchers used Project Gutenberg to identify works (poetry was also included); analyzed the literary style of the first one thousand lines of each work; and correlated the results with the number of downloads the title had received. They then identified the stylistic elements in the successful writings. They also applied their analysis to titles outside the Gutenberg database, such as works by Ernest Hemingway, Truman Capote, Philip Roth, and Dan Brown, and were able to predict successful writings at a rate of 70 per cent. (Their system was apparently confused by Hemingway's minimalist style because the algorithm depends on a "high-level syntactic structure".)

According to the findings, the less successful books rely on verbs that are "explicitly descriptive of actions and emotions", whereas more successful books contain straightforward verbs such as "say". The less successful books also contain a higher percentage of verbs, adverbs, and foreign words; topical words that are almost clichés; and extreme and negative words.
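To make the general idea concrete, here is a minimal Python sketch of the kind of pipeline described above: take the opening of each text, compute one stylistic measure, and check how it correlates with download counts. It is not the researchers' method. The word list `PLAIN_VERBS`, the helper names, and the single-feature setup are all assumptions made for illustration, and the study's actual features and model are not reproduced here.

```python
import math
import re

# Hypothetical word list, for illustration only (not taken from the study).
PLAIN_VERBS = {"say", "says", "said"}


def head(text, max_lines=1000):
    """Return the first max_lines lines of a text."""
    return "\n".join(text.splitlines()[:max_lines])


def plain_verb_share(text):
    """Fraction of word tokens that appear in PLAIN_VERBS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in PLAIN_VERBS for token in tokens) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def feature_vs_downloads(texts, downloads):
    """Correlate the plain-verb share of each text's opening with its downloads."""
    shares = [plain_verb_share(head(text)) for text in texts]
    return pearson(shares, downloads)
```

Calling `feature_vs_downloads(texts, downloads)` with a list of book texts and a matching list of download counts returns a single number between -1 and 1. A real analysis would combine many such features rather than rely on one.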

The study also found that there is an inverse relation between "success" as defined by the attainment of literary awards and the "readability" of a work.
 
For the full text of the article from The Guardian, please click here.

If you're interested in reading the academic paper itself, please click here.
