In their excellent chapter on the use of digital data in historical research, Frederick W. Gibbs and Trevor J. Owens distinguish between two DH approaches to data. ‘Data’, they argue, ‘does not always have to be used as evidence. It can also help with discovering and framing research questions’. On the one hand, you have ‘complex statistical methods’ and ‘rigorous mathematics’ (or ‘mathematical rigor’) to ‘support epistemological claims’. Gibbs and Owens liken this type of DH research to the wave of quantitative history in the 1960s and 1970s, which used data ‘for quantifying, computing and creating knowledge’.
On the other hand, there is a ‘fundamentally different’ form of using data – a form that is exploratory instead of analytic and deliberately without the mathematical complexity that is needed to derive evidence from quantitative analyses. Above all, it’s a form of data manipulation that can be playful (although the authors removed the adjective at one of the places it appeared in their text). Gibbs and Owens state that ‘playing with data – in all its formats and forms – is more important than ever’.
The divide between these two types of data research is drawn by most scholars reflecting on the Digital Humanities. Here you have new digital methods to answer ‘traditional’ research questions – albeit with the use of much more data and in a more rigorous and, therefore, perhaps more convincing way. There you have digital techniques to approach data in a fundamentally new way – not to use it as evidence, but to see whether the sheer quantity of the data enables you to see things you otherwise would not be able to see. It’s the same dichotomy Peter Haber described in his contribution to the same volume:
“In the last few years everything that historians did on the net had an auxiliary nature: not a single tool of the last fifteen years challenged the epistemological core area of historiography. They all touched auxiliary activities such as searching, preparing or publishing historical information on- or offline. Data Driven History – or, in a broader sense, Digital History – has the potential to change that.”
The ‘data driven’ approach (Viktor Mayer-Schönberger and Kenneth Cukier call it “letting the data speak”) paves the way to handling data in a playful manner. After all, the goal of this playing or experimenting with data – of “screwing around”, in the words of Stephen Ramsay – is not to present evidence that is able to pass peer review criticism. It is to be surprised as much as it is to gain insight – as far as it enables you to get a better grasp of the data at hand. I’m very sympathetic to this approach, and considering the spread of the play-motif (see, for example, the 2010 ‘Playing with technology in history’ conference) I’m not the only one. I’ve discussed this with my students in the DH-course I teach. To illustrate what ‘playing with data’ can mean, I have shown them this picture.
In both DH research and in adventures like the unmatched Secret of Monkey Island it’s all about interacting with a digitally created environment. You look at stuff, try to push or pull it, you pick something up here, you put it down there and see what happens. And although you initially fail to see the logic behind it, you combine your rubber chicken with the pulley you found and – surprisingly enough – it seems to do something. That’s not very unlike exploratory searching with digital tools. You have your data and you use your tools to cut it up and put it back together in another way, to combine things or to extract elements from them.
The picture above, for example, is a screenshot from the ngram viewer for the Dutch National Library digital newspaper repository. It shows the relation between the use of the Dutch equivalents of ‘typically American’ and ‘typically German’ in Dutch newspapers in the 20th century. What does it mean? That’s another question. At the least, it’s clear that the Dutch used both phrases equally often before WWII, whereas they found a lot more things ‘typically American’ in the postwar era than they found things ‘typically German’. However, the point of playing is that it does not necessarily and directly have to lead to meaningful results (and it certainly does not produce things that are in themselves ‘meaningful’).
I have taken this example from my own research on the US as a model for Dutch business and economy in the 20th century. Here’s another one. Below are two histograms showing the use of the Dutch term ‘productiviteit’ (productivity) in nationwide newspapers compared to the use of the same term in regional and local Dutch newspapers throughout the 20th century. Both types of newspapers start reflecting on the term as a hallmark of Marshall Plan politics in the postwar years. Before WWII, however, the nationwide (metropolitan) newspapers were significantly earlier in adopting the term than the regional and local press. Within my research focus on the spread, context and meaning of concepts of economic Americanization in the Netherlands, this could be something worth diving into a little deeper.
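The kind of comparison behind these histograms can be sketched in a few lines of code. The snippet below is a minimal, hypothetical illustration – the corpora, years and texts are invented for the example and bear no relation to the actual newspaper repository – but it shows the basic operation: counting, per year, the share of articles in a corpus that mention a given term, so that two corpora can be compared over time.

```python
from collections import Counter

# Hypothetical corpora: (year, article text) pairs for two newspaper types.
national = [(1925, "de productiviteit van de industrie"),
            (1950, "productiviteit en het Marshallplan")]
regional = [(1938, "de lokale handel bloeit"),
            (1952, "de productiviteit stijgt")]

def yearly_relative_frequency(corpus, term):
    """Share of articles per year that mention the term at least once."""
    mentions, totals = Counter(), Counter()
    for year, text in corpus:
        totals[year] += 1
        if term in text.lower():
            mentions[year] += 1
    return {year: mentions[year] / totals[year] for year in sorted(totals)}

print(yearly_relative_frequency(national, "productiviteit"))
print(yearly_relative_frequency(regional, "productiviteit"))
```

Plotting these per-year shares side by side for the nationwide and the regional corpus would give histograms of the kind described above.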
What I like in particular about the comparison with games like Monkey Island, though, is that it underscores the central importance of design in DH research. The design of the tools and visualisations fundamentally determines the way you can play with your data. Do you visualise your outcomes in a graph or in a cloud? Generally speaking, generating word clouds from datasets is a relatively unsophisticated form of visualisation. Still, the agility and flexibility of the design is vital for the way you can work with word clouds: are you easily able to remove stopwords, extract Named Entities, show co-occurrences, et cetera?
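To make the design question concrete: the sizes in a word cloud are driven by token frequencies, and whether stopwords are filtered out before counting changes the cloud entirely. The sketch below is a deliberately simplified illustration – the six-word stopword list is a hypothetical stand-in for a real Dutch one, which would be far longer – of the counting step that any cloud tool performs before drawing anything.

```python
import re
from collections import Counter

# Hypothetical, tiny Dutch stopword list; a real one would be far longer.
STOPWORDS = {"de", "het", "een", "en", "van", "in"}

def cloud_counts(text, stopwords=STOPWORDS, top_n=5):
    """Token frequencies that would drive word sizes in a cloud,
    with stopwords filtered out before counting."""
    tokens = re.findall(r"[a-zà-ÿ]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)

sample = "De productiviteit van de industrie en de productiviteit van de handel"
print(cloud_counts(sample))
```

Without the stopword filter, ‘de’ and ‘van’ would dominate the cloud and drown out the substantive terms – which is exactly the kind of design decision that shapes what the researcher ends up seeing.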
And there’s another aspect to the design of digital tools. Just as the design of the world of Monkey Island determines which doors or alleyways Guybrush Threepwood can or cannot enter, the design of visualisations necessarily directs the focus of the researcher. It may be subtle, but it is unavoidable. Take word clouds again. They show a distinct number of words, in a certain arrangement, with particular sizes and – perhaps – specific colours. All of these elements are more or less contingent. Obviously, the size differences between words indicate their relative difference in frequency in the dataset, but the dimension of difference is not an absolute given. At the same time, these elements combined steer the interpretative gaze of the researcher. He necessarily fails to notice the words that did not make it into the cloud – be it for no more than arbitrary reasons. That visualisations of data contain rhetorical power is something we had better be aware of.
This is where the comparison between adventure games and DH research falls short. The Secret of Monkey Island has an underlying narrative structure and a very clear goal. It is this context that determines the meaning and usefulness of things like rubber chickens with a pulley in the middle (well, you have to cross that gorge to Hook Isle somehow, don’t you?). Data visualisations in humanities research lack such a given structure. This may sound like stating the obvious, but the promise of excavating ‘the’ hidden patterns of data is part of both the excitement and the skepticism surrounding text mining in humanities research. As Barry C. Smith recently stated in his excellent post ‘Big Data in the Humanities: The Need for Big Questions’: ‘there’s a danger of hoping that we can go from large unstructured data sets to meaningful insights by relying on visualization, but visualizing techniques are not, alone, the answer.’
In my view, only the domain knowledge of the researcher is able to provide this answer. It is his or her expertise that decides upon the usefulness of the quantitative analyses that DH techniques generate. The researcher decides not what the ‘right’ or ‘wrong’ combinations (correlations) are, but what is logical and what illogical, what is supported by facts and what speculative, what is enlightening and what mystifying. It is professional playing – but still playing. The data may provoke him to make his own rubber chickens with pulleys in the middle to dive deeper into his object of study – as long as he is able to argue why this is meaningful.
This is why I don’t agree with Stanley Fish, in spite of the eloquence with which he has written his critique of the ‘data driven’ approach. According to Fish, you have to start from a hypothesis to be able to analyse your data visualisations by scholarly standards. He fears that everything becomes contingent if you don’t know beforehand what you’re looking for. He makes it very clear that this couldn’t result in anything that he considers sound scholarship: ‘I practice […] a criticism that insists on the distinction between the true and the false, between what is relevant and what is noise, between what is serious and what is mere play. Nothing ludic in what I do or try to do. I have a lot to answer for.’
For me, this view is too negative and a bit myopic. I am trained to be both critical and receptive to unexpected associations, surprising connections and serendipity. I have learned a lot from playing The Secret of Monkey Island (and its successors) – as I am now learning more about my research subject every day by playing with data.