Topic Modeling: huh?

Topic modeling is a probabilistic, statistical method that can uncover themes and categories in amounts of text so large that they cannot be read by any individual human being. […] Topic modeling allows us to step back even further from analyzing representative articles in these topics to interpreting all of them, to supplement close readings of individual items with distant readings of tens of thousands of them.

(From: Robert K. Nelson, ‘Of Monsters, Men – and Topic Modeling’, The New York Times Opinionator Blog)

Topic modeling uses statistical techniques to categorize individual texts and, perhaps more importantly, to discover categories, topics, and patterns that we might not be aware of in those texts. A topic modeling program—here the impressive MALLET application developed by Andrew McCallum and others at the University of Massachusetts, Amherst—generates a specified number of topics from a group of documents. The specific topics are not predetermined by the researcher but instead emerge from the patterns uncovered by the statistical algorithm. All that is provided by the researcher is the number of topics.

(From: Robert K. Nelson, ‘Mining the Dispatch’)
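
To make the idea in the passage above a little more concrete, here is a minimal sketch in Python of one common way to fit such a model: a toy collapsed Gibbs sampler for latent Dirichlet allocation (LDA). It is not MALLET's code and not Nelson's pipeline. The function names (fit_topics, top_words), the parameter values, and the four tiny sample documents are all assumptions made up for illustration. The point to notice is that the only thing the caller decides about the topics is how many there should be; the word groupings come out of the sampling.

```python
import random
from collections import Counter


def fit_topics(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not MALLET).

    Returns (topic_word, doc_topic): word counts per topic and
    topic counts per document after the final sweep.
    """
    rng = random.Random(seed)
    tokens = [doc.lower().split() for doc in docs]
    vocab_size = len({word for doc in tokens for word in doc})

    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    doc_topic = [[0] * num_topics for _ in tokens]       # topic counts per doc
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(tokens):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            topic_word[k][word] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(doc_assign)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(tokens):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                # Take the token out of the counts before resampling it.
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1

                # Weight of topic t: how much this document uses t, times
                # how much t uses this word (both smoothed).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                topic_word[k][word] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Hypothetical mini-corpus; a real corpus would hold thousands of documents.
sample_docs = [
    "army general battle troops",
    "cotton price market sale",
    "battle troops army retreat",
    "market price cotton sale",
]
sample_topic_word, sample_doc_topic = fit_topics(sample_docs, num_topics=2)
for number, words in enumerate(top_words(sample_topic_word)):
    print("topic", number, ":", " ".join(words))
```

The topics come back as unlabeled lists of words. Deciding that one list is about, say, the military and another about trade is still the researcher's interpretive work, and with a corpus this small the groupings can change with the random seed.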
