—
title: “Problem Set 4 – Part A”
output: html_document
—
“`{r, echo = FALSE}
library(“quanteda”, quietly = TRUE, warn.conflicts = FALSE, verbose = FALSE)
“`
### Unsupervised methods
In this part of the assignment, you will use R to understand and apply unsupervised document scaling.
#### 1. **Unsupervised scaling of the Irish budget corpus scaling**. Use the `data_corpus_irishbudget2010` in **quanteda** for this.
1.a) Fit a wordfish model of all the documents in this corpus. Apply any required preprocessing steps first. Use the `textplot_scale1d` function to visualize the result. (You may want to use the advanced options of this function to get a better plot than just the default one.)
What do you learn about what the dimension is capturing? You can use wikipedia to learn about the Irish parties involved in this debate to help you answer this question.
“`{r}
“`
**YOUR ANSWER HERE.**
1.b) Plot the wordfish “Eiffel Tower” plot (as in Figure 2 of Slapin and Proksch 2008), from the wordfish object. You can do this using the `textplot_scale1d` function or (even better) using the more advanced code we used in the lecture.
“`{r, fig.width = 5, fig.height = 5}
“`
1.c) Plot the log of the length in tokens of each text against the alpha-hat from `wfFit`. What does the relationship indicate?
“`{r, fig.width = 5, fig.height = 5}
“`
**YOUR ANSWER HERE.**
1.d) Plot the log of the frequency of the top most frequent 1000 words against the same psi-hat values from `wfit`, and describe the relationship.
“`{r, fig.width = 5, fig.height = 5}
“`
**YOUR ANSWER HERE.**
#### 2. **Fit the correspondence analysis model to the Irish budget speeches.**
Compare the results for the word scaled values (call it `caFit`) from the first dimension to the beta-hats from `wfFit`.
“`{r, fig.width = 5, fig.height = 5}
“`
**YOUR ANSWER HERE.**