Institute for Statistics
Statistical Software (R) – term paper 3 Winter semester 2021/22, February 10, 2022 – March 7, 2022
Registration number:
I hereby confirm that I have read and understood the instructions on this sheet. I confirm that the submitted solution was completely and solely edited and created by me, without the help of others. I confirm that sources other than the lecture materials, such as books or websites, are indicated in the code and linked where applicable.
Signature:
Exam instructions:
1. Check if the downloaded specification is complete. It should contain 3 task blocks. Individual task blocks can consist of several subtasks.
2. A total of 80 points can be achieved (without bonus points). The allocation of the points to the individual task blocks can be taken from the specification.
3. The solutions should be submitted in the form of .Rmd files. Use Markdown to mark the beginning and end of individual tasks and subtasks.
If the assignment of code to one of the (sub)tasks is not clearly declared, you may not get any points for it.
4. Each task should be processed in a separate .Rmd. If you do not hand in 3 separate .Rmd files, you will have to expect a point deduction up to a rating of zero points.
5. The submission takes place either via Moodle or via Github (please not both).
6. It is your responsibility to ensure that your results can be replicated locally by the reviewers. Therefore, add all files necessary for compiling to your solution (Rmd + data + all other files needed to generate the final output (PDF or HTML)), do not use local paths, and load all packages used (this list is not complete). If your results cannot be easily replicated, you must expect a point deduction up to a rating of 0 points. In addition, a submission that produces any other (programming) error during compilation is rated with 0 points.
7. Ensure that all functions are documented as specified in the exercises and that basic input checks are carried out on all functions.
8. Note the other formal evaluation criteria, which are explained at the beginning of the tasks.
9. If you have technical or other difficulties, please contact the course instructors. Email: . (Please send the emails to all listed persons!)
10. The tasks must all be worked on independently. In particular, no working groups are allowed and any other discussion of the tasks and solutions with other people (regardless of whether they study statistics or not) is not permitted.
11. The Internet can be used passively. This means that internet pages or forums may be called up and read, but actively asking questions that are relevant to solving the tasks is not permitted. Likewise, no tasks or proposed solutions and other information may be posted on the Internet or discussed or distributed via chat, email and other communication channels.
12. Should there be any suspicion of plagiarism, fraud or other impermissible behavior, additional (oral) tests can be called to check the independent processing of the tasks.
13. Doubts about the independent processing of your submission will result in the exam not being passed and the examination board being involved.
14. Submission is due by midnight (11:59 p.m.) on March 7, 2022, CET (Central European Time).
15. You must sign the cover sheet and submit it separately on Moodle. With your signature you confirm your independence. Submissions without a signed cover sheet cannot be considered.
When editing, make sure that all top-level functions are well documented and that at least basic checks are performed on all function inputs.
Both aspects are included in the assessment. Make sure your outputs are legible, don’t overflow the margins (when compiling the .Rmd file to a PDF), and that graphics have good legible labels and legends. If this is not the case, there may be (bonus) point deductions.
For task blocks 2 and 3 the following applies:
• Name each of your solutions (objects) to the subtasks, as in the last exercises, according to the associated subtask. This means that your solution to exercise 2e should be called ex2e.
• In addition, export each of your solutions into a folder solutions (this should be created with dir.create within the Rmd and not be handed over).
• The files stored in it should have the same names as the associated R objects. That means ex2e should be saved as ex2e.Rds in the solutions folder.
• Verbal responses do not have to be exported as an object, but should be written directly as continuous text in the Rmd.
• Graphics should be created with ggplot2 and exported with ggsave.
• If you deviate from these specifications, you must expect a point deduction up to a rating of 0 points.
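The naming and export conventions above can be sketched as follows. This is only a minimal illustration in base R: the content of ex2e is made up, and only the folder name solutions and the object/file name ex2e come from the rules above.

```r
# Create the export folder from within the Rmd (no warning if it already exists).
dir.create("solutions", showWarnings = FALSE)

# Hypothetical content; only the name ex2e follows the specification.
ex2e <- list(n_dealers = 3L, n_items = 10L, n_orders = 5L)

# The file carries the same name as the R object.
saveRDS(ex2e, file = file.path("solutions", "ex2e.Rds"))
```

For graphics, a call such as ggplot2::ggsave(file.path("solutions", "ex2j.png"), plot = ex2j) would follow the same pattern; the file extension is an assumption, since the specification does not name one.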
You can collect the following bonus points for this homework:
(a) Submission via GitHub Classroom (BONUS: 2P)
(b) Rmarkdown output well formatted (BONUS: 2P)
(c) R Code follows the Advanced R Style guide (http://adv-r.had.co.nz/Style.html) (BONUS: 1P)
Exercise 2
Since the 1970s, SAP has changed the IT infrastructure of companies worldwide.
Essentially, SAP and its competitors (e.g. Oracle) offer ERP (Enterprise Resource Planning) systems. These accompany processes and procedures on the IT side. SAP systems are used for a large number of relatively standardized processes. One of these standardized processes is the purchasing process. During the purchasing process, clerks and buyers make entries (or change them) in different tables.
Particularly important entries are made in the following tables:
• EKKO (purchasing document header)
• EKPO (purchasing document item)
• LFA1 (supplier master)
Today we want to look at these three tables to better understand our purchasing process. However, we only consider the order and ignore previous steps (e.g. purchase requisitions) or subsequent steps (e.g. receipt of the goods or payment of the invoice). We suggest that you use this resource to become familiar with the different table and column names and what they mean. To solve this task, you can use all the tidyverse packages that we discussed in the lecture. In a subtask we point out that you can also use the patchwork and lubridate package.
(a) Look at the three tables LFA1, EKKO and EKPO from the associated Rds file (SAP.Rds) on the Moodle site and explain how the three tables are related.
(b) Check whether for every element in EKPO there is at least one element in EKKO. Also consider the opposite case, i.e. whether all elements in EKKO are contained in EKPO at least once.
(c) Repeat b) for LFA1 and its related table.
(d) According to your findings from b) and c): What problems can arise with joins between the different tables?
(e) Determine the following key figures: the number of dealers, the number of order items and the number of orders. Your solution should be a named list.
(f) The NETWR column got a bit messed up when merging tables. There are four different currencies in this column, each appearing in different spellings. Create a new column WAERS, which contains the abbreviation (three letters) of the currency (USD, JPY, EUR, GBP). Convert NETWR to a numeric column. Furthermore, we want to have the conversion rate from the currencies to the US dollar in one column (EXCHANGE). You may use the following values 1 USD = 130.0 JPY, 1 EUR = 1.2 USD, 1 GBP = 1.9 USD. Your solution to this problem is the entire cleaned dataset.
Hint: You can check your efforts so far with the intermediate solution Ex2Zmittenloesung.Rds on the Moodle page. In the following, use this data set for EKPO if you come to a different solution.
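The cleaning step in (f) could look roughly like the sketch below. The raw spellings in netwr_raw are invented for illustration (the actual spellings in SAP.Rds are not given here), and the regular expressions assume that each entry consists of one number and one three-letter currency code in arbitrary case. Only the exchange rates are taken from the task.

```r
# Hypothetical raw values; the real NETWR spellings may differ.
netwr_raw <- c("100 USD", "250.5 eur", "12000 JPY", "80 gbp")

# Currency code: keep the letters only and normalise the case.
waers <- toupper(gsub("[^A-Za-z]", "", netwr_raw))

# Numeric value: keep digits and the decimal point only.
netwr <- as.numeric(gsub("[^0-9.]", "", netwr_raw))

# Rates from the task, expressed as "USD per unit of currency".
rates_to_usd <- c(USD = 1, JPY = 1 / 130, EUR = 1.2, GBP = 1.9)
exchange <- unname(rates_to_usd[waers])

cleaned <- data.frame(NETWR = netwr, WAERS = waers, EXCHANGE = exchange)
cleaned$value_usd <- cleaned$NETWR * cleaned$EXCHANGE
```

Storing the rate as USD per currency unit keeps the later conversion to euros a single division by the EUR rate; the column name value_usd is an illustrative choice, not part of the task.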
(g) Calculate the total value of orders in euros. Your solution is a scalar.
(h) Calculate the total value of orders in euros for each company code. Your solution is a data.frame (tibbles are fine too.) with two columns. Note: BUKRS.
(i) Calculate the total value of orders in euros for each seller. Your solution is a data.frame (tibbles are fine too.) with two columns.
(j) Show in a bar chart how many order items were ordered from the different countries of origin.
(k) Calculate the average goods value of the orders broken down by the different sellers. Use the real names of the dealers in the table.
So your solution should be a table with two columns, in which one column contains all salespeople and the other column contains the average order value per order in euros. Your solution is a data.frame (tibbles are fine too.) with two columns.
(l) Calculate for each seller the number of currencies with which he sold goods during the period under study. Your solution is a data.frame (tibbles are fine too.) with two columns.
(m) Visualize the number of orders over time and the development of the net order value in a common plot. Aggregate on a monthly basis. Use the patchwork package for this and display the key figures in separate panels. Note: The lubridate package could help you here and may be used.
(n) Calculate how many orders were placed automatically. Your solution is a scalar.
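For the coverage checks in (b) and (c), one possible approach is to compare the key columns of two tables in both directions. The sketch below uses tiny made-up tables and assumes that EBELN (purchasing document number) links EKKO and EKPO and that LIFNR (supplier number) links EKKO and LFA1; verify this against the actual data in (a).

```r
# Toy tables with hypothetical keys, for illustration only.
ekko <- data.frame(EBELN = c("4500000001", "4500000002", "4500000003"),
                   LIFNR = c("V1", "V2", "V9"), stringsAsFactors = FALSE)
ekpo <- data.frame(EBELN = c("4500000001", "4500000001", "4500000004"),
                   EBELP = c(10, 20, 10), stringsAsFactors = FALSE)
lfa1 <- data.frame(LIFNR = c("V1", "V2", "V3"), stringsAsFactors = FALSE)

# Keys present in one table but missing in the other, in both directions.
items_without_header   <- setdiff(ekpo$EBELN, ekko$EBELN)
headers_without_items  <- setdiff(ekko$EBELN, ekpo$EBELN)
headers_without_vendor <- setdiff(ekko$LIFNR, lfa1$LIFNR)
vendors_without_orders <- setdiff(lfa1$LIFNR, ekko$LIFNR)
```

Any non-empty result means that an inner join silently drops rows, while a left join produces rows with missing values; dplyr::anti_join() returns the same unmatched rows as full data frames.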
As a teacher of class 7b in mathematics, you have the point list from points.xlsx after correcting the last class test. Your task is to find a fair distribution of grades. The vector Reference.Rds provides information about the maximum number of points that can be achieved per task. You may use the readxl package for this task. Save and export your results in the same way as in Task 2.
(a) Read in the data set and the reference RDS.
(b) Create a new column in the dataset that gives the total score. Your solution is the new data.frame (tibbles are fine too.).
(c) Visualize the distribution of points in the different tasks and the total number of points in a plot with several subplots. What do you notice about the tasks?
(d) You are trying to use the standard grading scale here: up to 90 percent 1, up to 80 percent 2, up to 70 percent 3, up to 60 percent 4, up to 40 percent 5. (After that a 6.) What is the distribution of grades in this case? Visualize this using a barplot.
Note that the ”to”’s are always inclusive and that you don’t have to change the record itself in this subtask.
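One way to map a share of the maximum score to a grade is sketched below. It reads "up to 90 percent 1" as "at least 90 percent gives a 1", with inclusive lower bounds; this reading and the helper name grade_from_share are assumptions, not part of the task.

```r
#' Map the share of achieved points to a grade from 1 to 6 (illustrative).
#' @param share Numeric vector of shares between 0 and 1.
#' @return Integer vector of grades.
grade_from_share <- function(share) {
  stopifnot(is.numeric(share), !anyNA(share), all(share >= 0 & share <= 1))
  lower <- c(0.40, 0.60, 0.70, 0.80, 0.90)  # inclusive lower bounds for grades 5 to 1
  # findInterval() counts how many bounds are <= share (left-closed intervals).
  6L - findInterval(share, lower)
}

grades <- grade_from_share(c(1, 0.9, 0.89, 0.8, 0.7, 0.6, 0.5, 0.4, 0.39, 0))
```

A call such as table(grades) then gives the counts for the barplot.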
(e) Some students approached you after the exam and told you that task 2 was partially misunderstood and that they therefore did something completely different from what was required. Since you share the students' assessment after their presentation, you re-evaluate the work of the students concerned so that they receive 2, 4 and 3 points. In addition, you decide to score task 2 with a maximum of 5 points instead of the originally planned 10 points. Give all students who scored more than 5 points on task 2 exactly 5 points and create a new column to store any points earned in excess of 5 points as a bonus. Your solution is the new data.frame (tibbles are fine too.).
(f) Now calculate the grades and save them as a data.frame sorted by surname (A–Z), which you can give to the secretariat. Separate the name into first and last names and remove the scores. Export this as a CSV file in the solutions folder so that it can be read in by calling readr::read_csv. Your solution is the CSV file.
In this task you should deal with the sport of tennis. More precisely, it is about simulating individual points, games, sets and matches. We are looking at women’s tennis here and are assuming a standard match between two players (i.e. no doubles, etc.).
We want to simulate which player will win the match, given their respective probabilities of scoring a point on their own serve. Who wins the match then depends on the concrete counting method. Here we consider the following case:
• A game is won when the score is
– 4 : x or x : 4 with x ≤ 2, or
– after a tie (e.g. 3:3 or 4:4), as soon as one player is 2 points ahead
• Games won are added up until one player wins the set. A set is won at a score of
– 6 : x or x : 6 with x ≤ 4, or 7 : 5 or 5 : 7
– if there is a tie (6 : 6), the set is won/lost in the tie-break with a score of
  · 7 : x or x : 7 with x ≤ 5, or
  · after a tie (6 : 6) with a lead of 2 points
• Sets won are added up. A match is won when a player has won 2 sets. I.e. the match is won at the score of
– 2 : 0 or 0 : 2 or
– if there is a tie (1:1), with 2:1 or 1:2
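The counting rules for a game, a set without tie-break and a tie-break share one pattern: a minimum number of points or games plus a lead of at least 2. A small helper can express this; the name is_won and the argument target are hypothetical and not part of the required signatures.

```r
#' Check whether a score a : b is decided (illustrative helper).
#' @param a,b Non-negative counts of points (or games) for the two players.
#' @param target Minimum count needed to win: 4 for a game, 6 for a set,
#'   7 for a tie-break.
#' @return TRUE if one side has reached target with a lead of at least 2.
is_won <- function(a, b, target) {
  stopifnot(is.numeric(a), is.numeric(b), is.numeric(target),
            length(a) == 1, length(b) == 1, length(target) == 1,
            a >= 0, b >= 0, target > 0)
  max(a, b) >= target && abs(a - b) >= 2
}
```

For a set, this check only covers the scores up to 7 : 5; at 6 : 6 the set is decided by the tie-break instead.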
In the following, functions are to be written that make it possible to simulate matches multiple times. What is important here is the probability of the two players scoring a point on their own serve.
Page 4 of 7
(a) Write a function that simulates a rally. This takes an argument p, the probability that the serving player scores a point.
The point goes to the opponent with probability q = 1 − p. If the server makes a point, a 1 should be returned, otherwise a 0. The signature of the function is given below:
sim_point <- function(p) {
  # TODO
}
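As an illustration of the documentation and input checks required for all functions, a single rally could be drawn as in the sketch below. The name draw_point is hypothetical, and using rbinom for the draw is one option among several.

```r
#' Draw the outcome of a single rally (illustrative sketch).
#' @param p Probability that the server wins the point, a single number in [0, 1].
#' @return 1 if the server wins the point, 0 otherwise.
draw_point <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, !is.na(p), p >= 0, p <= 1)
  rbinom(1, size = 1, prob = p)  # one Bernoulli draw with success probability p
}
```

Invalid input such as draw_point(2) or draw_point("a") stops with an error instead of returning a value.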
(b) Based on (a), write a function that simulates whether the serving player wins a game. Here p is the probability of scoring a point on your own serve. The function should return 1 if the server wins the game, 0 otherwise.
sim_game <- function(p) {
  # TODO
}
(c) Based on (b), write a function that simulates a set. The signature is given below. Here p_1 is the probability that the player who serves first in the set will score a point on her own serve, and p_2 is the probability that the player who serves second in the set will score a point on her own serve. Note that a player always serves until a game is won or lost, after which the right to serve switches to the other player for the next game. The function returns 1 if player 1 wins the set and 0 if player 2 wins the set.
sim_set <- function(p_1, p_2) {
  # TODO
}
(d) Based on (c), write a function that simulates a match. The signature is given below. The arguments of the function are the probabilities of player A or B to score a point on their own serve. Note that the right to serve for the first game in the match is drawn between players A and B (50% probability each). After that, the serve alternates between the players until the set is won by one of the players.
The right to serve also alternates between sets. For example, if player A starts serving in the first set, player B starts serving in the second set, and so on (and vice versa). The output of the function should be "A" if player A wins the match and "B" if player B wins the match.
sim_match <- function(p_a, p_b) {
  # TODO
}
(e) Use the function from (d) to simulate 100 matches in the hypothetical duel between (pa = .652; in 2016) and (pb = .706; in 2020). Calculate the probability of winning a match against .
The probabilities of winning a game, set and match depending on pa and pb can also be derived and calculated theoretically. For example, the probability of player A winning the match is given as
$$p_A^M = (p_A^S)^2 + 2 (p_A^S)^2 p_B^S,$$
where $p_A^S$ is the probability that player A will win a set and $p_B^S$ the corresponding probability for player B. I.e. the match can be won by player A 2:0 or 2:1, for which there are two possibilities (1:0, 1:1, 2:1 or 0:1, 1:1, 2:1). $p_A^S$ can in turn be calculated analytically from the winning probabilities for a game, etc.
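As a quick plausibility check of the match formula, write $p_A^S$ for the probability that player A wins a set and assume that player B wins a set with probability $p_B^S = 1 - p_A^S$ (an assumption made here for the worked numbers). Two example values then give:

```latex
\begin{align*}
p_A^M &= (p_A^S)^2 + 2\,(p_A^S)^2\,(1 - p_A^S) \\
p_A^S = 0.5:\quad p_A^M &= 0.25 + 2 \cdot 0.25 \cdot 0.5 = 0.5 \\
p_A^S = 0.6:\quad p_A^M &= 0.36 + 2 \cdot 0.36 \cdot 0.4 = 0.648
\end{align*}
```

Equal set probabilities give an even match, and a set advantage of 0.1 already grows to a match advantage of about 0.15.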
The R script probabilities-tennis-analytical.R (available on Moodle) contains functions for calculating these probabilities. The details are not important here and you don't have to follow the codes. However, you can use them to check your simulation functions from (b) - (d).
(f) Use source to read the functions from probabilities-tennis-analytical.R into R.
If the functions from (f) have been read in correctly, the function p_match_df is available to you. This creates a data set with two columns. The first column gives the probability (n values between 0 and 1) that player A will win a point on her own serve. The second column gives the probability of player A winning the match, given a fixed probability pb that player B will win a point on her own serve. An example and visualization is given below:
match_prob_a <- p_match_df(n = 101, pb = .6)
plot(match_prob_a[, 1], match_prob_a[, 2], type = "l",
  xlab = expression(p[a]), main = expression(p[b] == 0.6),
  ylab = "Match win probability, player A", las = 1)
abline(h = c(.5), v = .6, lty = 3)
abline(h = match_prob_a[round(match_prob_a[, 1], 2) == 0.70, 2], v = 0.7, lty = 3)
[Figure: probability of player A winning the match as a function of pa, for pb = 0.6]
It can be seen that if both players have a 0.6 probability of winning a point on their own serve, the probability of winning the match is exactly 0.5. You can also see that in tennis, the better player has a clear advantage due to the counting method and the high number of points played. For example, a 0.1 increase in the probability of winning a point on serve (from 0.6 to 0.7) results in a much larger increase in the probability of winning a match by 0.4 (from 0.5 to ≈ 0.9).
(g) Use your function sim_match to build a data set as in the example code above for fixed pb = 0.6 and for different values of pa (seq(0, 1, length = 101)).
For each row in the data set (i.e. for each combination of pa and pb), the calculation of the probability should consist of 100 simulations of matches (as in subtask (e)). Copy the code from the graphic above and also draw the probabilities you calculated as a dashed line. Hint: Make sure that your results are completely reproducible.
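The structure of such a simulation study, including the fixed seed needed for reproducibility, might look like the sketch below. Note that toy_match is only a placeholder standing in for sim_match: it is not a tennis simulation and merely returns "A" or "B" with weights p_a and p_b so that the loop can run. The function name estimate_grid and the seed value are likewise illustrative.

```r
# Placeholder for the real sim_match(): NOT a tennis simulation.
toy_match <- function(p_a, p_b) {
  sample(c("A", "B"), size = 1, prob = c(p_a, p_b))
}

#' Estimate player A's match win probability on a grid of pa values.
#' @param pa_grid Numeric vector of probabilities for player A.
#' @param pb Fixed probability for player B.
#' @param n_sim Number of simulated matches per grid value.
#' @param seed Seed for the random number generator.
#' @return data.frame with columns pa and p_match.
estimate_grid <- function(pa_grid, pb, n_sim = 100, seed = 2022) {
  stopifnot(is.numeric(pa_grid), all(pa_grid >= 0 & pa_grid <= 1),
            is.numeric(pb), length(pb) == 1, pb > 0, pb <= 1)
  set.seed(seed)  # fixed seed makes the estimates reproducible
  est <- vapply(
    pa_grid,
    function(pa) mean(replicate(n_sim, toy_match(pa, pb)) == "A"),
    numeric(1)
  )
  data.frame(pa = pa_grid, p_match = est)
}

sim_df <- estimate_grid(seq(0, 1, length.out = 101), pb = 0.6)
```

With an open base plot, lines(sim_df$pa, sim_df$p_match, lty = 2) would add the simulated values as a dashed line.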