2022/5/20 13:38 ETC1010: Introduction to Data Analysis
ETC1010: Introduction to Data Analysis
Assignment 2
Please write your name
In this assignment, we study energy consumption and greenhouse gas emissions for a number of countries. We are given two data sets, owid-energy-data.csv and population.csv, which contain information about these countries over time.
You can find the data sets and a code book for the assignment in the folder called Data.
In addition to the marks shown for each question, 15 points are allocated for general coding style and overall performance.
Please ensure that the report knits properly into HTML and that all the R code and R outputs are visible in the final knitted report. Save the rendered HTML document as a PDF file (you can use your internet browser to print the HTML file to PDF) and upload that PDF file to Moodle for submission.
This is an individual assignment, and you must use R code to answer all the questions. Make sure that messages and warnings are turned off before you submit the assignment (see lines 15-17 of this Rmd file) and that echo = FALSE is set for the R code chunk where you load your libraries.
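As a minimal sketch of the first requirement, messages and warnings can be switched off for every chunk through knitr's global chunk options. Lines 15-17 of the Rmd file are not reproduced here, so whether they look exactly like this is an assumption.

```r
# Assumed setup code: hide messages and warnings in all chunks of the report.
# echo is left alone here so the R code stays visible, as the brief requires.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# echo = FALSE for the library chunk is set in that chunk's own header,
# e.g. {r libraries, echo = FALSE} (the chunk label is hypothetical).
```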
Question 1: Read the energy data set (owid-energy-data.csv) and store it in a data object called energy (0.5pt). Read the population data set (population.csv) and store it in a data object called population (0.5pt). Show the first 4 rows of each of the data frames (1pt).
Question 2: Combine the energy data set with the population data set into a single data frame called energy_pop (state explicitly which variables you are using for the merge). Make sure the new data frame includes all the observations that are common to the energy and population data sets, and the variables in both data frames (1pt). Show the first 5 rows of the new combined data frame (1pt). Display the dimensions of the combined data set (1pt).
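The kind of join described here can be sketched on toy data as follows. The tables, their values, and the choice of country and year as key columns are assumptions for illustration only; the real key variables have to be checked in the two data sets.

```r
library(dplyr)

# Toy stand-ins for the two data sets (all values are made up).
energy_toy <- tibble(
  country = c("A", "A", "B", "C"),
  year = c(2000, 2001, 2000, 2000),
  fossil_electricity = c(1.2, 1.5, 0.3, 4.0)
)
population_toy <- tibble(
  country = c("A", "A", "B", "D"),
  year = c(2000, 2001, 2000, 2000),
  population = c(100, 110, 50, 70)
)

# inner_join() keeps only rows whose keys appear in both tables and carries
# the columns of both; `by` names the key columns explicitly (assumed keys).
energy_pop_toy <- inner_join(energy_toy, population_toy, by = c("country", "year"))

head(energy_pop_toy, 5)  # first rows of the combined table
dim(energy_pop_toy)      # number of rows and columns
```

Countries C and D each appear in only one toy table, so an inner join drops them.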
Question 3: Calculate the average "share of electricity generation that comes from wind" (look in the code book to find the variable name in the data set) for each country across all the years recorded in the data set (you should have a single value per country), and store the results in a data object called mean_windenergy. (2pts) Create a table that displays the 6 countries with the highest average share of electricity generation coming from wind across all the years. (2pts)
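A per-group average followed by a top-n selection might look like the sketch below. The data are made up, and wind_share is a hypothetical column name rather than the one in the code book.

```r
library(dplyr)

# Toy data; `wind_share` is a placeholder column name.
toy <- tibble(
  country = c("A", "A", "B", "B", "C"),
  year = c(2000, 2001, 2000, 2001, 2000),
  wind_share = c(10, 20, NA, 4, 1)
)

# One mean per country; na.rm = TRUE skips years with no recorded value.
mean_wind_toy <- toy %>%
  group_by(country) %>%
  summarise(mean_wind = mean(wind_share, na.rm = TRUE))

# Keep the countries with the highest averages (n = 2 here, 6 in the question).
top_toy <- slice_max(mean_wind_toy, mean_wind, n = 2)
top_toy
```

Whether missing years should be dropped with na.rm = TRUE is a judgment call; without it, any country with a missing year gets NA as its average.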
Question 4: Which is the country with the highest total per capita electricity consumption from solar energy since 2019 (solar_energy_per_capita) (2pts). What is the total solar_energy_per_capita for that country since 2019? (1pt)
The country with the highest total per capita electricity consumption from solar energy since 2019 is (use inline code)
Paste the code used in the inline code here:
Question 5: Which is the country that has the largest electricity generation from fossil fuels, measured in terawatt-hours (fossil_electricity) in 2020? (1pt) Looking at the World electricity generation from fossil fuels, what is the percentage contribution of that country? (1pt)
The country with the largest electricity generation from fossil fuels, measured in terawatt-hours, in 2020 was
Paste the code used in the inline code here:
--- (inline R code) contributes --- (inline R code) of the world electricity generation from fossil fuels. Paste here the code you used in the inline code in the sentence above:
Question 6: First, create a data frame called energy_pop_filtered that excludes observations with missing values of iso_code and also excludes "OWID_WRL", "OWID_EUR", "OWID_EU27", "OWID_AFR" in iso_code. (2pt) Using this new data frame, display a table of the top 10 countries that had the largest electricity generation from nuclear power, measured in terawatt-hours (nuclear_electricity) in 2000. Rename the columns country and nuclear_electricity to "Top 10 Countries" and "Nuclear Power Electricity Generation (TWh)" respectively. Show only the country and nuclear power electricity generation. (2pts)
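The filtering and renaming steps can be illustrated on a small made-up table. The column names follow the question, but the values, and the use of n = 2 instead of 10, are only for demonstration.

```r
library(dplyr)

# Toy data (made-up values); includes a missing iso_code and an aggregate row.
toy <- tibble(
  iso_code = c("AAA", NA, "OWID_WRL", "BBB", "CCC"),
  country = c("A", "Region", "World", "B", "C"),
  year = 2000,
  nuclear_electricity = c(5, 50, 100, 9, 2)
)

aggregates <- c("OWID_WRL", "OWID_EUR", "OWID_EU27", "OWID_AFR")

# Drop rows with a missing iso_code and rows for the listed aggregates.
toy_filtered <- toy %>%
  filter(!is.na(iso_code), !iso_code %in% aggregates)

# Largest values in 2000; select() renames and keeps only the two columns.
toy_top <- toy_filtered %>%
  filter(year == 2000) %>%
  slice_max(nuclear_electricity, n = 2) %>%
  select(
    `Top 10 Countries` = country,
    `Nuclear Power Electricity Generation (TWh)` = nuclear_electricity
  )
toy_top
```

Backticks are needed because the new column names contain spaces and parentheses.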
Question 7: Display graphically where the missing values are in the population variable of the energy_pop data set. (1pt) Hint: Use only the population variable and make sure you input the right object into the function.
Question 8: Using the data energy_pop, create a figure to compare the trends in electricity demand, measured in terawatt-hours scaled by population (electricity_demand = electricity_demand/population), over time across the following countries (India, China, Denmark, and United States) from 2000. Display each country's trend in a different color and use the black and white theme. (4pt) By looking at the figure, what can you conclude about the behaviour of the trends over time? (1pt)
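A line plot of this shape could be sketched as below. The data are invented and cover only two of the four countries; reading "from 2000" as year >= 2000 is an assumption.

```r
library(dplyr)
library(ggplot2)

# Toy data (made-up values) with the column names used in the question.
toy <- tibble(
  country = rep(c("India", "Denmark"), each = 3),
  year = rep(c(1999, 2000, 2001), times = 2),
  electricity_demand = c(400, 420, 450, 34, 35, 36),
  population = c(1.04e9, 1.06e9, 1.08e9, 5.3e6, 5.3e6, 5.4e6)
)

p <- toy %>%
  filter(year >= 2000) %>%                                        # from 2000 onwards
  mutate(electricity_demand = electricity_demand / population) %>%  # per person
  ggplot(aes(x = year, y = electricity_demand, colour = country)) +
  geom_line() +   # one coloured line per country
  theme_bw()      # black and white theme
p
```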
Question 9: Now display the trends for biofuel_electricity, coal_electricity, gas_electricity, hydro_electricity, solar_electricity, and wind_electricity in a different panel for each of the four countries (India, China, Denmark, and United States), making sure each plot is displayed using the same scales (you cannot create the figures separately). Change the y-label to "Electricity consumption" and add the title "Electricity trends" to the figure. (6pts)
Question 10: Using the data set energy_pop, display scatterplots for India, China, Denmark, United States showing the relationship greenhouse_gas_emissions since 2000. Display each scatterplot in a different panel of one figure and set scales to be free for each panel. (4pts)
Question 11: Using the map function, estimate the linear models for each of the four countries in Question 10. (10pts)
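One common pattern for fitting a model per group with map is to nest the data by group and map lm over the nested data frames. The sketch below uses placeholder variables x and y on made-up data; it does not assume which variables the assignment's models relate.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data; x and y are placeholder variables, not the assignment's.
toy <- tibble(
  country = rep(c("A", "B"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.0, 0.4, 0.1, -0.6)
)

models_toy <- toy %>%
  nest(data = -country) %>%                          # one row per country
  mutate(model = map(data, ~ lm(y ~ x, data = .x)))  # one lm per country

models_toy$model[[1]]
```

The result is a data frame with a list column model that holds one fitted lm object per country.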
Question 12: Create residual plots for each of the models and discuss whether the model assumptions are fulfilled. (4pts)
Question 13: Display the model coefficients for each of the models you fitted in Question 11 in a table where the country name is in the first column, the intercept is in the second column and the slope is in the third column. Interpret each of the slope coefficients. (8pts)
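A coefficient table with one row per country can be built by tidying each model and pivoting the terms into columns. This is a sketch on made-up data with placeholder variables x and y; the use of the broom package is an assumption, since the brief does not name it.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Toy data; x and y are placeholder variables, not the assignment's.
toy <- tibble(
  country = rep(c("A", "B"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.0, 0.4, 0.1, -0.6)
)

coef_toy <- toy %>%
  nest(data = -country) %>%
  mutate(
    model = map(data, ~ lm(y ~ x, data = .x)),
    coefs = map(model, tidy)              # one row per model term
  ) %>%
  unnest(coefs) %>%
  select(country, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(intercept = `(Intercept)`, slope = x)
coef_toy
```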
Question 14: Calculate goodness-of-fit measures (r.squared, BIC and AIC) for each of the models you estimated in Question 11 using map. Display the values for each of the models you fitted in Question 11 in a table, with the country name in the first column. Interpret the values. Make sure you round the values to two decimals. (6pts)
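Goodness-of-fit measures per model can be collected by mapping a summary function over the fitted models. The sketch below uses glance from the broom package, which is an assumption about the intended tool, on made-up data with placeholder variables x and y.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Toy data; x and y are placeholder variables, not the assignment's.
toy <- tibble(
  country = rep(c("A", "B"), each = 4),
  x = rep(1:4, times = 2),
  y = c(2.1, 3.9, 6.2, 7.8, 1.0, 0.4, 0.1, -0.6)
)

fit_toy <- toy %>%
  nest(data = -country) %>%
  mutate(
    model = map(data, ~ lm(y ~ x, data = .x)),
    fit = map(model, glance)              # one-row summary per model
  ) %>%
  unnest(fit) %>%
  select(country, r.squared, AIC, BIC) %>%
  mutate(across(c(r.squared, AIC, BIC), ~ round(.x, 2)))
fit_toy
```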