Applying distributional semantics to trace conceptual change

Here is the abstract of a talk I gave at the AIUCD conference in Rome in January 2017.

What we talk about when we talk about concepts – Applying distributional semantics on Dutch historical newspapers to trace conceptual change

Word embeddings – vector representations of words that embed words in a so-called semantic space where the vectors of semantically similar words lie close together – are increasingly used for semantic searches in large text corpora. Word vector distances can be used to build semantic networks of words. This closely resembles the notion of semantic fields that humanities scholars are familiar with.
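To make "vectors lying close together" concrete, here is a minimal Python sketch. It assumes cosine similarity as the closeness measure, which is a common choice for word2vec vectors but is not specified in the abstract, and a hypothetical similarity threshold for drawing edges in the network. The four toy vectors are invented for illustration only. Real word2vec vectors have hundreds of dimensions and are learned from a corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_network(vectors, threshold):
    """Return the set of word pairs whose cosine similarity is at least `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the talk.
    """
    words = sorted(vectors)
    edges = set()
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                edges.add((w1, w2))
    return edges


# Hypothetical 3-dimensional toy vectors, made up for this example.
toy_vectors = {
    "radio":     [0.9, 0.1, 0.0],
    "televisie": [0.8, 0.2, 0.1],
    "brood":     [0.0, 0.1, 0.9],
    "boter":     [0.1, 0.0, 0.8],
}

print(sorted(semantic_network(toy_vectors, threshold=0.9)))
# prints [('boter', 'brood'), ('radio', 'televisie')]
```

A fixed threshold is only one way to turn distances into a network; linking each word to its k nearest neighbours is another. Each such choice changes which "semantic field" appears, which is exactly the kind of technical decision that calls for justification.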

We have previously shown how word embeddings, as produced by the popular word2vec implementation, can be used to trace concepts through time without depending on particular keywords (Kenter et al. 2015). However, using word embeddings to represent concepts and conceptual change for the study of history raises two main challenges. First: commensurability. Computational techniques like word2vec demand choices of a practical or technical nature. How do we legitimize these choices in terms of conceptual theory? Second: dependency on data. Do the results of word embedding techniques provide insights into real conceptual change, or do they merely reflect arbitrary biases in the underlying data?
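The idea of following a concept through time without fixing its keywords can be illustrated with a toy sketch. It assumes one embedding space per time period and a deliberately simplified procedure: in each period, take the words closest on average to the current query terms, then use those words as the query for the next period. The function names, periods and vectors are hypothetical. This is not the algorithm of Kenter et al. (2015) or of ShiCo and should not be read as a description of either.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def concept_vocabulary(vectors, query, topn):
    """Hypothetical helper: the `topn` words with the highest mean similarity to the query terms present in this period."""
    present = [q for q in query if q in vectors]
    if not present:
        return []

    def score(word):
        return sum(cosine(vectors[word], vectors[q]) for q in present) / len(present)

    return sorted(vectors, key=score, reverse=True)[:topn]


def track_concept(models_by_period, seeds, topn=2):
    """Hypothetical helper: carry a query forward in time, letting its vocabulary change per period."""
    trail = {}
    query = list(seeds)
    for period in sorted(models_by_period):
        vocab = concept_vocabulary(models_by_period[period], query, topn)
        trail[period] = vocab
        if vocab:  # if nothing was found, keep the previous query
            query = vocab
    return trail


# Invented 2-dimensional toy spaces, one per period.
models = {
    "1930": {"radio": [1.0, 0.0], "draadloos": [0.9, 0.1], "krant": [0.0, 1.0]},
    "1960": {"radio": [1.0, 0.0], "televisie": [0.9, 0.2], "krant": [0.1, 1.0]},
}

print(track_concept(models, seeds=["radio"]))
```

Because similarities are only ever computed within a single period's space, the sketch does not need to align the vector spaces of different periods. The toy output shows "draadloos" dropping out and "televisie" entering the concept's vocabulary, but with real data such a shift could equally reflect the biases of the underlying corpus, which is the second challenge raised above.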

Both challenges illustrate the need for critical reflection now that advanced computational tools are being adopted in historical scholarship. Using concrete examples, we will show how we dealt with these challenges in our research.

  • Tom Kenter, Melvin Wevers, Pim Huijnen, Maarten de Rijke (2015), “Ad Hoc Monitoring of Vocabulary Shifts over Time”, In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management (ACM, New York) pp. 1191-1200.
  • Melvin Wevers, Tom Kenter, Pim Huijnen (2015), “Concepts Through Time: Tracing Concepts in Dutch Newspaper Discourse (1890-1990) Using Word Embeddings”, In: Digital Humanities 2015 (Sydney).
  • Carlos Martinez-Ortiz, Tom Kenter, Melvin Wevers, Pim Huijnen, Jaap Verheul, Joris van Eijnatten (2016), “Design and implementation of ShiCo: Visualising shifting concepts over time”, In: Proceedings of the 3rd International Workshop on Computational History (HistoInformatics) (CEUR Workshop Proceedings 1632) pp. 11-19.

You can find the slides of this talk on SlideShare.
