Follow the instructions below in an RMarkdown document. Use the RMarkdown template provided to you, rmarkdown-template.Rmd. Don’t forget to put your name and student number where indicated, and to update the title. Make sure that your knitted document includes the R code chunks (so don’t use the “echo=FALSE” option). We want to see your code and the results! Suppress unnecessary messages and warnings using the appropriate code chunk options. When you’ve answered the questions, save your .Rmd file and knit the RMarkdown document to produce a nicely formatted HTML document. To complete the assignment, submit BOTH your RMarkdown (.Rmd) file AND the knitted .html file via Canvas.
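One way to meet the chunk-option requirements above is to set defaults once in a setup chunk near the top of the .Rmd file. This is only a sketch; the provided template may already contain a similar chunk, in which case you would edit that one instead.

```r
# Set default options for every chunk in the document:
#   echo = TRUE     -> code is shown in the knitted output
#   message = FALSE -> package start-up messages are hidden
#   warning = FALSE -> warnings are hidden
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```

The same options can also be set on an individual chunk header if only some chunks are noisy.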
Your instructions are:
Download the most recent (September 2022) public use micro data from the Labour Force Survey (LFS) in .csv format from https://www150.statcan.gc.ca/n1/pub/71m0001x/71m0001x2021001-eng.htm. Note that the file you download will be a compressed .zip file. When you uncompress the .zip file, you should find a folder containing three files. The file called pub0922.csv contains the data in .csv format. The file called LFS_PUMF_EPA_FGMD_variables.csv contains a list of all variables in the file, along with a bit of information about each of them. The easiest way to view that list of variables is to open LFS_PUMF_EPA_FGMD_variables.csv in a spreadsheet like Excel.
Load the tidyverse library.
Read the LFS data into R.
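A minimal sketch of the loading and reading steps follows. It assumes pub0922.csv has been extracted into the working directory. The fallback that writes a tiny made-up file is purely hypothetical and is there only so the sketch runs without the download; it is not part of the assignment.

```r
library(tidyverse)

# Assumed location of the extracted data file (adjust to your own folder).
lfs_path <- "pub0922.csv"

# HYPOTHETICAL fallback: create a tiny made-up file if the real one is absent,
# so that this sketch can be run on its own.
if (!file.exists(lfs_path)) {
  lfs_path <- tempfile(fileext = ".csv")
  write_csv(tibble(LFSSTAT = c(1, 2, 3, 4), IMMIG = c(1, 2, 3, 3)), lfs_path)
}

# Read the data; show_col_types = FALSE hides the column-specification message.
lfs <- read_csv(lfs_path, show_col_types = FALSE)

nrow(lfs)     # number of observations
glimpse(lfs)  # quick overview of the variables
```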
Write a brief paragraph describing the data, where you got it from, what an observation represents, how many observations there are, and what type of information is contained in the data (just in general, you do not have to describe each and every variable!). In other words, communicate to your reader what data you are working with. This should be in your own words and not copy-and-pasted from the website.
Use the codes for the LFSSTAT variable described in LFS_PUMF_EPA_FGMD_variables.csv to create a new variable LFStatus that describes the labor force status of each individual using words (e.g., the values of this variable should be “Employed, at work”, “Employed, absent from work”, etc.). I recommend using the if_else() function (see the Week 5 RMarkdown lecture notes for an example).
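The recoding step can be sketched with nested if_else() calls. The example uses a small made-up data frame in place of the real LFS data, and the code-to-label mapping shown is an assumption that should be checked against LFS_PUMF_EPA_FGMD_variables.csv before use.

```r
library(tidyverse)

# Toy stand-in for the LFS data; the real file has many more rows and columns.
lfs <- tibble(LFSSTAT = c(1, 1, 2, 3, 4, 4, 1))

# ASSUMED mapping of codes to labels; confirm it against the variable list.
# Note that the final branch catches every code other than 1, 2 and 3.
lfs <- lfs %>%
  mutate(
    LFStatus = if_else(LFSSTAT == 1, "Employed, at work",
               if_else(LFSSTAT == 2, "Employed, absent from work",
               if_else(LFSSTAT == 3, "Unemployed",
                                     "Not in labour force")))
  )

# Check the result by counting observations per category.
count(lfs, LFStatus)
```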
Create a bar chart that depicts the proportion of observations in each labor force status category; see SOCVIZ Fig 4.8 for an example. The proportions should sum to one, and the categories should be labeled properly (e.g., “Employed, at work”, “Employed, absent from work”, etc.), just like your LFStatus variable. Don’t forget to label your axes and give the plot an informative title. Write a few sentences comparing the number of observations in each category; do the results surprise you?
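A proportion bar chart of this kind can be sketched as below, again on made-up data. It uses after_stat(prop) with group = 1 so that the proportions are computed over all observations and sum to one. The axis labels and title are placeholders.

```r
library(tidyverse)

# Toy stand-in with an already-recoded LFStatus variable.
lfs <- tibble(
  LFStatus = c("Employed, at work", "Employed, at work", "Employed, at work",
               "Employed, absent from work", "Unemployed",
               "Not in labour force", "Not in labour force")
)

# group = 1 makes ggplot2 compute proportions across the whole data set
# rather than within each bar (which would make every bar equal to 1).
p_status <- ggplot(lfs, aes(x = LFStatus)) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  labs(
    x = "Labour force status",
    y = "Proportion of observations",
    title = "Labour force status (toy data)"
  )

p_status
```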
Create a new variable that describes each individual’s immigrant type using words (e.g., “Immigrant, landed 10 or less years earlier”, “Immigrant, landed more than 10 years earlier” or “Non-Immigrant”). Once again, you’ll want to use the codes for the IMMIG variable described in LFS_PUMF_EPA_FGMD_variables.csv to do this. Then, create a stacked bar chart (like Fig 4.11 of SOCVIZ) or a dodged bar plot (like Fig 4.13 of SOCVIZ) that shows the proportion of observations in each labour force status for each immigrant type. That is, for each immigrant type, the proportion of observations in each labor force status category should sum to one. Write a few sentences describing any similarities/differences you see in the distribution of labor force status across the immigrant types. Note: make sure your category labels are easy to read! You might find it helpful to use coord_flip(), or to rotate the axis labels using theme(axis.text.x = element_text()).
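The stacked version can be sketched with position = "fill", which rescales each bar so that the proportions within an immigrant type sum to one. The data are made up, the IMMIG code mapping is an assumption to verify against the variable list, and ImmigType is a hypothetical variable name.

```r
library(tidyverse)

# Toy stand-in for the LFS data.
lfs <- tibble(
  IMMIG = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  LFStatus = c("Employed, at work", "Unemployed",
               "Employed, at work", "Employed, at work", "Not in labour force",
               "Employed, at work", "Employed, absent from work",
               "Unemployed", "Not in labour force")
)

# ASSUMED mapping of IMMIG codes to labels; confirm it against the variable list.
lfs <- lfs %>%
  mutate(
    ImmigType = if_else(IMMIG == 1, "Immigrant, landed 10 or less years earlier",
                if_else(IMMIG == 2, "Immigrant, landed more than 10 years earlier",
                                    "Non-Immigrant"))
  )

# position = "fill" stacks the categories and scales each bar to height 1;
# coord_flip() puts the long category labels on the vertical axis.
p_immig <- ggplot(lfs, aes(x = ImmigType, fill = LFStatus)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(
    x = "Immigrant type",
    y = "Proportion of observations",
    fill = "Labour force status",
    title = "Labour force status by immigrant type (toy data)"
  )

p_immig
```

For a dodged plot, the proportions have to be computed within each immigrant type before plotting, for example with count() followed by a grouped mutate().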
Now filter the data to keep only individuals who are “Employed, at work”, and then filter again to retain only some subgroup of employed individuals that is of interest to you (e.g., could be a particular gender, province, age, occupation, industry, … any combination of things). Justify the subgroup you are interested in. Make an effort to ensure that your choice of subgroup doesn’t overlap with others in the class. For your chosen subgroup, visualize the distribution of some continuous work variable (could be work hours, earnings, unemployment duration, … many other options) through (a) a histogram, (b) a density plot, and (c) a box plot. Discuss what you see and learn from each.
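The two filtering steps and the three plots can be sketched as follows. Everything here is illustrative: the data are simulated, and the column names province and hours are hypothetical stand-ins for whichever LFS variables you choose from the variable list.

```r
library(tidyverse)

set.seed(1)

# Simulated stand-in; `province` and `hours` are HYPOTHETICAL column names.
lfs <- tibble(
  LFStatus = sample(c("Employed, at work", "Unemployed"), 200, replace = TRUE),
  province = sample(c("BC", "ON"), 200, replace = TRUE),
  hours    = round(rnorm(200, mean = 37, sd = 8), 1)
)

# First keep only those employed and at work, then narrow to the subgroup.
sub <- lfs %>%
  filter(LFStatus == "Employed, at work") %>%
  filter(province == "BC")

# (a) Histogram: counts of observations within bins of hours.
p_hist <- ggplot(sub, aes(x = hours)) +
  geom_histogram(bins = 20) +
  labs(x = "Weekly hours", y = "Count", title = "Histogram of hours (toy data)")

# (b) Density plot: a smoothed estimate of the same distribution.
p_dens <- ggplot(sub, aes(x = hours)) +
  geom_density() +
  labs(x = "Weekly hours", y = "Density", title = "Density of hours (toy data)")

# (c) Box plot: median, quartiles and potential outliers.
p_box <- ggplot(sub, aes(y = hours)) +
  geom_boxplot() +
  labs(y = "Weekly hours", title = "Box plot of hours (toy data)")

p_hist
p_dens
p_box
```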