Workshop on Computational Text Analysis
Content
Contemporary social science operates in an era of big data. Political actors regularly justify their decisions across various communication channels, institutions publish policy reports, and individuals state their opinions on social media and in the comment sections of news outlets. But how can we make use of these data?
This workshop helps researchers with (1) gathering textual data from publicly accessible webpages, (2) preparing the raw material for analysis, (3) acquiring techniques to analyse the data, and (4) understanding recent trends in text- and images-as-data. It is structured around four input sessions and 2-3 practical lab sessions.
You can download the syllabus here.
People
Instructors | Mirko Wegemann (he/him), Dr. Eva Krejcova (she/her) |
Teaching Assistant | Sara Dybesland (she/her) |
Schedule
Input session | Lab session |
---|---|
30/05/2024, 09:00-11:00 (SR 2) | 30/05/2024, 13:00-15:00 (SR 2) |
31/05/2024, 10:00-12:00 (SR 2) | 31/05/2024, 13:00-15:00 (SR 2) |
03/06/2024, 10:00-12:00 (SR 2) | 03/06/2024, 13:00-15:00 (SR 2) |
04/06/2024, 10:00-13:00 (SR 2) | No lab session (but longer input!) |
Materials
Please download the files, put them in one directory, and create a .Rproj in that directory.
To download the MARPOR data on your own, you can use this script. You first need to register for API access at the Manifesto Project. The API key must be stored in a .txt file in your directory.
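
As a minimal sketch of the key-file convention described above, the snippet below reads an API key from a plain-text file in the project directory. It is written in Python purely for illustration and is not the workshop's download script; the file name `manifesto_apikey.txt` and the helper `read_api_key` are hypothetical placeholders.

```python
from pathlib import Path


def read_api_key(path="manifesto_apikey.txt"):
    """Return the API key stored in a one-line text file.

    The default file name is a hypothetical placeholder; use whatever
    name your own script expects.
    """
    # Strip surrounding whitespace, including a trailing newline
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"No API key found in {path}")
    return key
```

Keeping the key in its own file, rather than hard-coding it in a script, makes it easy to exclude from version control and from anything you share.
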
Session 1: Scraping
Slides | Input session | Lab session |
---|---|---|
Slides | Replication code | Exercises, Solution |
Session 2: Bags-of-words
Slides | Input session | Lab session |
---|---|---|
Slides | Replication code, Data, Basic STM, STM with covariates, Results (searchK) | Exercises, EUI Theses (Data) |
Session 3: Embeddings and machine learning
For session 3, you need a local installation of Python and GloVe embeddings, which you can download here.
Slides | Input session | Lab session |
---|---|---|
Slides | For R: Replication code, Data, Embeddings Matrix; Addition: How to use GPT in R. For Python: Transformers (Colab; download the raw file here and open it in Colab), Training data, Test data | Script (keyATM), UK Speech Corpus |
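
Pre-trained GloVe embeddings are commonly distributed as plain-text files in which each line holds a token followed by its vector components, separated by spaces. The sketch below assumes that format and shows how such a file could be parsed and how two word vectors could be compared in Python. It is an illustration only, not the workshop's replication code; the names `load_glove` and `cosine_similarity` and the toy vectors are hypothetical.

```python
import io
import math


def load_glove(lines):
    """Parse GloVe-style text lines into a {token: vector} dictionary."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy three-dimensional vectors standing in for a real GloVe file
# (made-up values, chosen only to make the example self-contained).
toy_file = io.StringIO(
    "party 0.9 0.1 0.0\n"
    "election 0.8 0.2 0.1\n"
    "banana 0.0 0.1 0.9\n"
)
vectors = load_glove(toy_file)
sim_related = cosine_similarity(vectors["party"], vectors["election"])
sim_unrelated = cosine_similarity(vectors["party"], vectors["banana"])
```

With a real embeddings file, the same function accepts an open file handle instead of the toy `StringIO` object. For large files, a NumPy array is usually more memory-efficient than Python lists.
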