Opinion: by combining computer science with critical analysis, we can gain a more complete picture of literary history
You're probably familiar with the concept of close reading, the sustained and detailed analysis of a short passage of text such as a poem or a short piece of prose or drama. A forensic activity, close reading emphasises close attention to words, grammar, and syntax as a means of exploring and articulating the meaning the reader gleans from the text.
On the other hand, distant reading is a method of literary criticism that uses computational and data-analysis techniques to identify meaningful patterns within large collections of texts. Unlike close reading, the object of analysis is often a collection of hundreds or thousands of texts that no individual could read within the span of a lifetime.
But using a computer for this task opens up new vistas of research. For example, we can analyse literature with the evidence of a fuller record of literary history, instead of using a small collection of established texts (the canon) to stand in for the whole of literature.
The term "distant reading" is usually attributed to the Italian literary critic Franco Moretti, though other scholars have identified similar methods being practised decades before. Moretti argued that distant reading would allow scholars to gain a more complete picture of literary history by reading the masses of published literature previously ignored by readers and by academic study, which he called "the great unread" and "the slaughterhouse of literature."
Part of the logic of reading beyond the canon is to gain perspectives on works that have since been lost to history but were of greater significance in the period of their publication. Moreover, a history of the European novel told solely from the perspective of its apparent highlights, such as Ulysses, Madame Bovary and War and Peace, must be partial at best.
However, if a computer is used to identify patterns within a collection of several thousand literary texts, can this really be considered reading? The reading in distant reading arises from the close collaboration between the scholar and the computer. The former approaches the collection with a hypothesis in mind, selecting or designing appropriate algorithms to enable pattern recognition, and then analysing, often with conventional literary methods such as close reading, the resulting patterns. The process is such that neither the researcher nor the computer could complete it alone.
Because distant reading is conducted at a large scale, its results are often wide-ranging too, telling stories about the progress of literature over the course of a century or longer. Topics that have been addressed include how the language of novels becomes less abstract as the 19th century progresses; why the titles of novels are shorter in the 19th century than in the 18th; and who (and what) is behind the Elena Ferrante pseudonym. Nor is the method restricted to literary texts: Moretti and Dominique Pestre have examined why the language of World Bank reports becomes more abstract towards the turn of the millennium.
The objective of the Distant Reading for European Literary History project is to develop the resources necessary to change the way European literary history is written by combining perspectives from computing and literary studies. It involves a multilingual collection of European novels from 1850 to 1920 (around 2,500 novels across at least 10 different European languages) and the development of new computational tools to analyse and compare literary texts in different languages.
This truly European project brings together scholars from 29 countries to study the European novel, creating a broader, more inclusive, and better-grounded account of European literary history and cultural identity. With much distant reading scholarship focused on English-language literature, the multilingual dimension represents a significant advance for the field. The project's innovative contribution to literary studies is the ability to compare different features, styles, and patterns of development of the novel across the European continent in this period.
Science Week may not be an occasion when you expect to hear about advances in the study of literature, but this project uniting computer science with more traditional methods of literary analysis promises to do just that. The ultimate literary questions we address are similar to those that scholars have long pondered: how and why does the novel change and develop in different places and in different times? The methods we use—combining the sciences and the humanities—are new, and will yield fresh perspectives on these important questions.
The views expressed here are those of the author and do not represent or reflect the views of RTÉ