This blog post is the adapted conclusion from the paper ‘A Digital Humanities Approach to the History of Science. Eugenics revisited in hidden debates by means of semantic text mining’, which I wrote in collaboration with Fons Laan, Maarten de Rijke and Toine Pieters. The article was based on the research I did within the historical text mining project BILAND, as well as its predecessor WAHSP. The article is in press as part of the Proceedings of the 1st International Workshop on Histoinformatics.
In a recent blog post called ‘The Deceptions of Data’, Andrew Prescott criticized the jubilation surrounding the ‘digital revolution’. He states that “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old fashioned colleagues about what can be done. But our role as advocates of digitized data shouldn’t mean that we lose our critical sense as scholars. [...] [T]here is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.”
The essence of Prescott’s warning is that we should expect neither computer-generated conclusions from digital history nor a machine-learned substitute for historical craftsmanship. However, this does not mean digital techniques have little value for historians. Digital tools enable historians to analyze massive volumes of texts and other big data sets and to integrate (socio-)linguistics, statistics and geo-informatics into historical research.
New techniques of large-scale data analysis allow historians to manage data sets that were previously only accessible by means of manual sampling. Exploratory search methods that provide a quick overview, combined with tools to zoom in on details, are especially empowering. Our proposed combination of interactive exploratory search and text mining supports historians in setting up systematic search trails. The tooling helps them interpret and contrast the returned result sets: by exploring word associations for a result set, inspecting the temporal distribution of documents and comparing selections, historians can make a more informed and principled document selection.
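To make the two inspection steps mentioned above more concrete, here is a minimal sketch in Python. It is not the WAHSP or BILAND implementation; the function names, the tiny stopword list and the four sample articles are all hypothetical and only serve to illustrate what "temporal distribution" and "word associations" of a result set could mean in practice.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; real newspaper archives hold millions of articles.
ARTICLES = [
    {"year": 1895, "text": "heredity of disease in families"},
    {"year": 1895, "text": "heredity and disease among patients"},
    {"year": 1910, "text": "heredity of property under the law"},
    {"year": 1910, "text": "harvest report for the province"},
]

# Assumed, deliberately tiny stopword list for this illustration.
STOPWORDS = {"of", "in", "and", "the", "under", "among", "for"}


def tokens(text):
    """Return the set of lowercase words occurring in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(articles, term):
    """Result set: all articles whose text contains the search term."""
    return [a for a in articles if term in tokens(a["text"])]


def temporal_distribution(results):
    """Count how many articles of the result set appeared in each year."""
    return Counter(a["year"] for a in results)


def word_associations(results, term, top_n=5):
    """Count in how many result articles each other word co-occurs with the term."""
    counts = Counter()
    for a in results:
        counts.update(tokens(a["text"]) - STOPWORDS - {term})
    return counts.most_common(top_n)


hits = search(ARTICLES, "heredity")
print(temporal_distribution(hits))
print(word_associations(hits, "heredity"))
```

Comparing selections then amounts to running the same two functions on different result sets (for example, different search terms or periods) and reading the outputs side by side.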
Obviously, this is no substitute for historical workmanship. WAHSP and BILAND are meant as heuristic tools that ideally inspire new ideas and insights that would not have been generated through reading a small number of articles, but instead only emerge from the analysis of hundreds of articles. These insights may help to frame new research questions, thus catalyzing historical research. They also stimulate serendipity. After all, digitally produced results often lead to unexpected associations that turn out to be promising for further research.
However, a number of prerequisites must be met before the use of digital tools can become standard procedure in historical research. First, it is essential that historians who work with digital tools and build their arguments on digital results are highly aware of what they are doing. This may sound obvious, but it is not always the case. Historians should have a clear understanding of, for example, what word clouds represent, or of how to translate complex queries into normal, everyday language.
They should be able to interpret and explain text mining research results in formulations such as, ‘within the given source material, in all articles containing word x and word y, word z also appears with a significant frequency’. This makes their arguments transparent. As long as digital tools are treated as black boxes, with queries going in and several sorts of visualizations mysteriously coming out, the assessment of the results remains problematic.
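As an illustration of what such a formulation rests on, the following Python sketch computes the share of articles containing word z among those containing both word x and word y, next to the share of z in the whole collection. It is a hedged toy example: the function name `conditional_share` and the four sample sentences are hypothetical, and a real analysis would add a proper significance test rather than a bare comparison of two proportions.

```python
import re


def tokenize(text):
    """Return the set of lowercase words occurring in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def conditional_share(docs, x, y, z):
    """Compare how often z occurs in articles with x and y versus in all articles.

    Returns (number of articles with x and y,
             share of those that also contain z,
             share of all articles that contain z).
    """
    token_sets = [tokenize(d) for d in docs]
    subset = [t for t in token_sets if x in t and y in t]
    baseline = sum(z in t for t in token_sets) / len(token_sets) if token_sets else 0.0
    share = sum(z in t for t in subset) / len(subset) if subset else 0.0
    return len(subset), share, baseline


# Hypothetical sample "articles", invented for this illustration only.
DOCS = [
    "heredity and disease in the family",
    "heredity of disease and degeneration",
    "the inheritance of land and property",
    "disease and heredity: a medical view of degeneration",
]

n, share, baseline = conditional_share(DOCS, "heredity", "disease", "degeneration")
print(f"{n} articles contain both words; z appears in {share:.0%} of them "
      f"versus {baseline:.0%} of all articles")
```

Stating the result in these terms (how many articles, which share, against which baseline) is what keeps the argument transparent to readers who have not seen the tool.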
This is why Gibbs and Owens argue that “[t]he processes for working with the vast amounts of easily accessible and diverse large sets of data suggest a need for historians to formulate, articulate, and propagate ideas about how data should be approached in historical research”. In parallel, a thorough understanding should be developed of the search behavior of historians (in the same vein as is done in this article).
It is essential that the status of the results from digital tools is clearly communicated. Evidently, tools like WAHSP and BILAND offer evidence for certain arguments, but do not provide explanations for them. The aforementioned applications can show that in the Dutch public debate at the end of the 19th century, the predominant meaning of the concept of inheritance was medical, but they do not explain why.
In sum, text mining tools like WAHSP and BILAND are not built to make writing histories superfluous. They are meant to trigger historians, to draw their attention to potentially interesting cases to explore. In this sense, it is evident that text mining can form a relevant addition to the historian’s toolbox outside the eugenics cases as well. It can be used to analyze trends and patterns on a much broader scale.
Pim Huijnen, Fons Laan, Maarten de Rijke and Toine Pieters, ‘A Digital Humanities Approach to the History of Science. Eugenics revisited in hidden debates by means of semantic text mining’, in: A. Nadamoto et al. (eds.), Social Informatics. SocInfo 2013 International Workshops, QMC and HISTOINFORMATICS, Kyoto, Japan, November 25, 2013 (Springer: Berlin and Heidelberg, 2014).