Text as Data | Mirko Wegemann


Content

People constantly produce textual data on the Internet. Every day, political actors justify their decisions through various communication channels, institutions publish policy reports, and citizens express their opinions on social media and in the comment sections of newspapers. How can we, as political scientists, make use of such data?

This methods seminar introduces quantitative text analysis, a form of content analysis that represents texts numerically and examines them with statistical methods. Over the course of the seminar, students will learn (1) how to collect text data from publicly accessible websites, (2) how to prepare raw material for different types of analysis, and (3) how to apply various techniques of quantitative text analysis. Students will also gain a basic understanding of how the field has evolved, from simple bag-of-words approaches to recent developments such as transformers and large language models. The sessions are practice-oriented and give students the opportunity to carry out their own project as part of the seminar: they will develop a research question, formulate theoretical expectations, gather research data, and apply an appropriate method of quantitative text analysis.

You can download the syllabus here.
The seminar takes place every Wednesday from 4 to 6 pm in seminar room 100.125. Students need to bring their own laptops or tablets to participate.

Materials

To run the sample code, first save the files locally and create an .Rproj file (an RStudio project) in the same directory. Double-clicking the .Rproj file opens RStudio, from which you can then open the .Rmd files. A short guide can be found, for example, here.

Week 1: Introduction

This week, we will discuss the structure of the seminar, what is expected of you, and the intended learning outcomes. We may also install R and RStudio. Slides

Week 2 and 3: Basics in R

Slides: Slides (Week 2), Slides (Week 3)
Code: Tutorial (Part 1), Tutorial (Part 1, Solutions), Tutorial (Part 2), Tutorial (Part 2, Solutions), Introduction, Introduction (Solutions)
Data: Stopwords, Test file (.csv), Test file (.xlsx), Data MPs German Bundestag, Data ParlaMint (UK)

The "introduction.Rmd" tutorial requires the "Allbus 2018" data, which you can download free of charge from GESIS after registering.