Problem Set 1 Probability and Statistics
You are allowed to ask each other for help, but you must write up your own answers. With your answers, submit the names of the people you helped and of the people who helped you. Include all results, plots, tables, and the code you ran for Problems 1-3 (R Markdown makes this easy). Include every step of your reasoning and calculations in every task. Submit your answers via Canvas.
You are interested in finding patterns in pollution data. The data is taken from EUROSTAT [1] and contains greenhouse gas emissions, in thousand tons, for different sectors of the economy in different countries for 1997 and 2017.
Start by downloading emissions.csv from canvas and opening it in RStudio.
The data set records greenhouse gas emissions (in 1000 tons) from
• energy consumption (energy)
• fuel combustion in public electricity and heat production (pubelh)
• fuel combustion in cars (cars)
• industrial processes and product use (industry)
• aerosol use (aero)
• agricultural production (agriculture)
• land use and forestry (landuse)
• waste management (waste)
• other sectors (other)
(Of course, fuel combustion is also a part of energy consumption and aerosol use is also a part of product use.)
[1] https://ec.europa.eu/eurostat/web/climate-change/data/database
Problem 1 (20%) Use graphs and tables to show how greenhouse gas emissions were distributed between sectors [2] in the Netherlands, Hungary, and another country of your choice (if you work in a group, choose different countries) in 2017. Interpret them briefly. How could you compare the statistics of the three countries? [max 300 words]
From this point on, use greenhouse gas emission (1000 tons) per 100000 inhabitants instead of the original emission variables. In the following task you will be asked to produce statistics for a specific emission variable. The variable you need to use depends on the last digit (the last numerical entry) of your NEPTUN code:
• 1, 2, 3: Aerosol usage (aero)
• 4, 5, 6, 7: Fuel combustion in cars (cars)
• 8, 9, 0, or no number in NEPTUN code: Fuel combustion in public electricity and heat production (pubelh)
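The variable choice and the per-capita rescaling above can be sketched as follows. This is a minimal illustration in Python rather than R (the same arithmetic carries over directly). The function names, the example code "AB3C7D", and the population figure are hypothetical, and it is an assumption that a population count is available for each country and year.

```python
def choose_variable(neptun_code):
    """Map the last digit of a NEPTUN code to the variable from the list above."""
    digits = [ch for ch in neptun_code if ch in "0123456789"]
    if not digits:
        return "pubelh"      # no number in the code
    last = int(digits[-1])
    if last in (1, 2, 3):
        return "aero"
    if last in (4, 5, 6, 7):
        return "cars"
    return "pubelh"          # 8, 9 or 0


def per_100k(emission_kt, population):
    """Rescale an emission (in 1000 tons) to emission per 100000 inhabitants."""
    return emission_kt / population * 100_000


# Hypothetical inputs, for illustration only.
print(choose_variable("AB3C7D"))     # last digit is 7
print(per_100k(50.0, 10_000_000))    # 50 thousand tons, 10 million inhabitants
```

Dividing by the population before comparing countries keeps large countries from dominating the statistics simply because of their size.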
Problem 2 (20%) Calculate the mean and the median greenhouse gas emission per 100000 inhabitants for both years. What kind of underlying distribution do you suspect based on these values? Plot histograms and box-and-whiskers diagrams to check whether you were right. How did emissions change between the two years? Discuss briefly. [max 300 words]
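As a hint on how mean and median relate to the shape of a distribution, here is a minimal Python sketch (not R, but the idea is identical). The sample values are made up and are not taken from emissions.csv. The helper name skew_hint and its 5% tolerance are arbitrary assumptions, and the comparison is only a rough heuristic that a histogram should confirm.

```python
from statistics import mean, median


def skew_hint(values, tol=0.05):
    """Rough heuristic: compare the mean with the median.

    tol is an arbitrary relative tolerance (an assumption of this sketch).
    """
    m, med = mean(values), median(values)
    scale = abs(med) if med != 0 else 1.0
    if abs(m - med) / scale <= tol:
        return "roughly symmetric"
    return "right-skewed" if m > med else "left-skewed"


# Made-up sample with one large value pulling the mean upwards.
made_up = [1.0, 1.2, 1.5, 1.7, 2.0, 2.4, 3.1, 9.8]
print(mean(made_up), median(made_up), skew_hint(made_up))
```

A mean well above the median usually points to a long right tail, but this is not guaranteed for every distribution, which is why the task asks for plots as a check.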
Problem 3 (15%) Calculate the correlation between the variables energy, industry, and waste for 2017. Interpret the results and give a possible economic explanation.
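The task does not say which correlation coefficient is meant. Assuming the usual Pearson coefficient, a minimal Python sketch of the pairwise calculation looks like this (the numbers are made up and are not taken from emissions.csv; the function name pearson is hypothetical):

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up per-capita values for five hypothetical countries.
data = {
    "energy":   [5.1, 7.3, 6.0, 9.2, 4.4],
    "industry": [0.9, 1.4, 1.0, 1.9, 0.7],
    "waste":    [0.4, 0.3, 0.5, 0.2, 0.6],
}

for a, b in combinations(data, 2):
    print(a, b, round(pearson(data[a], data[b]), 3))
```

With three variables there are three distinct pairs, so the result is naturally presented as a 3 x 3 correlation matrix.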
Let K be the first digit (numerical value) in your NEPTUN code and L be the total count of non-numeric (letter) characters in your code. If there is no digit in your code, let K = 10. For future reference, let N = K + 2 · L + 5.
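The definitions of K, L and N can be checked with a short Python sketch. The function name and the example codes "AB3C7D" and "ABCDEF" are hypothetical, and "letter characters" is read here as alphabetic characters.

```python
def neptun_parameters(code):
    """Return (K, L, N) for a NEPTUN code, following the definitions above."""
    digits = [ch for ch in code if ch in "0123456789"]
    k = int(digits[0]) if digits else 10          # first digit, or 10 if there is none
    n_letters = sum(1 for ch in code if ch.isalpha())
    n = k + 2 * n_letters + 5
    return k, n_letters, n


# "AB3C7D": first digit 3, four letters, so N = 3 + 2 * 4 + 5.
print(neptun_parameters("AB3C7D"))
# "ABCDEF": no digit, so K = 10; six letters.
print(neptun_parameters("ABCDEF"))
```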
Problem 4 (15%) The Animal Kingdom is going to have an election later this year. There are 3 candidates for king: the Lion, the Gorilla, and the Sloth. We know that K/N · 100 percent of voters prefer Lion the most, while 5/N · 100 percent want to vote for Gorilla and (2·L)/N · 100 percent for Sloth. We also know that the voters of Lion and Gorilla are proud and politically active: if we ask them, 80% of Gorilla voters and 65% of Lion voters will tell us who they want to vote for, while Sloth voters are very lazy and only 20% will bother to answer our question.
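Reading the three voter shares as K/N, 5/N and (2·L)/N, they add up to one because N = K + 2 · L + 5. A minimal Python sketch of this sanity check, using hypothetical values K = 3 and L = 4 (the function name vote_shares is also hypothetical):

```python
from fractions import Fraction


def vote_shares(k, n_letters):
    """Exact voter shares for Lion, Gorilla and Sloth, given K and L."""
    n = k + 2 * n_letters + 5
    return {
        "Lion": Fraction(k, n),
        "Gorilla": Fraction(5, n),
        "Sloth": Fraction(2 * n_letters, n),
    }


shares = vote_shares(3, 4)          # hypothetical K and L
print(shares)
print(sum(shares.values()) == 1)    # the three shares cover all voters
```

Exact fractions avoid rounding errors, so the check holds for any valid K and L.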
1. Calculate the probability that a randomly chosen person votes for either Lion or Gorilla.
2. Calculate the probability that a randomly chosen person turns out to be a responding Sloth voter.
3. Calculate the probability that a randomly chosen person responds.
[2] Sectors: energy, industry, agriculture, land use, waste, and other.
4. If we took a survey of the whole population, what outcome would we predict for the upcoming elections? Would it be precise? Why or why not? Discuss very briefly.
Problem 5 (15%) In an oral exam there are N exam topics in total: K + 2L easy ones and 5 difficult ones. Two students, Anne and Dani, come into the examination room at the same time and draw one topic each. (They flip a coin to determine who draws first.) Let A denote the event that Anne draws first, B the event that Anne draws an easy topic, and C the event that Dani draws an easy topic.
1. Calculate Pr(B).
2. Are A and B independent?
3. Are B and C independent?
Problem 6 (15%) A new infectious disease was discovered in Neverland. In the first two weeks the disease is easy to treat, but the first symptoms only occur during the third week of infection. The good news is that the scientists of Neverland have developed a test that detects the presence of the virus with 95% accuracy among those infected, while it gives a false positive result for only 3% of the healthy population. The authorities don’t know that currently L/2 % of the population is infected with the new disease. The total population of Neverland is N million. The test is carried out on the whole population.
1. How many people will be tagged as infected? How does this number compare to the total population?
2. What percentage of these people is actually infected, and what percentage is mistakenly categorized as infected?