CS计算机代考程序代写 1

1

Statistical Methods

Computer Lab 1: Data summaries, plots and functions in R

1. The data in this task refer to the amount of claims (in £1000) arising in a general insurance portfolio. You can
find the data in the file claimamount^data.txt on Vision under Learning Materials — Labs.

Read data into R

Save the file on your computer and then read the data into R:

claim.amounts = scan(file=HH:/claim_amounts_data.txtHJskip=2)

Remember that you need to tell R where to look for the file, so you must specify the full path of the file
(here H:/claim_amounts_data.txt). Note the following:

• If you have saved the file in a different folder on your computer you need to make sure that you know
where it is (e.g. on Windows based systems, if the full path is not displayed on the toolbar, use
right-click— properties on the file).

• Also, in some cases the use of forward or backward slash for specifying the path is system-specific. If
the file cannot be found, try file=”H: \claim_amounts_data.txt” in the command above.

If you want to keep all your files in the same folder (e.g. for a particular project), it will be convenient to run
R in the same folder so that you don’t need to repeatedly specify the folder. You can change the working
folder in R at File — Change dir.

Notice also the skip=2 option in the scan() command above. Look at the data file and check ?scan to see what
this option does.

If you are used to working with spreadsheets, R has a convenient data editor: try data.entry(claim.amounts)
or de(claim.amounts).

Numerical summaries

For summary statistics, check how the commands summary。,mean(), median。, quantile(), var(), sd()
etc. work. For example, the 3rd quartile of the claim amounts data can be obtained using

quantile(claim.amountsJprob=0.75)

Now, put the data in ascending order (use sort()) and try to determine which of the two definitions of the
quartiles that we have given in the lectures R uses.

Let’s now compute the coefficient of symmetry角=”払―/. R doesn’t have a function to do this automatically,
but it is straightforward to write our own code. First, let’s look at how arrays work in R. The data
claim.amounts are stored as a onedimensional array (vector) in R, with the individual values (elements)
being indexed from 1 to 200. For example, you can access (and see) the value of the 5th amount

2

using claim.amounts[5], or the last 10 values using claim.amounts[191:200]. Note that the indexing
corresponds (by default) to the order by which we entered the data. Now, for computing 京 we can exploit
R’s array operations. Try:

sum.dev3 = sum( (claim.amounts – mean(claim.amounts))”3 )
betal = (sum.dev3/length(claim.amounts))/(sd(claim.amounts))”3 cat(“This gives:”, “\nn, “beta_1 beta1, “\n”)

Note that R is very good for working with arrays of data – see how we computed the value of】i(xi — x)3 in the
first line of the code above. The second line then computes (xi — x)3/n}/s3. Finally, the third line prints the
answer on the screen; note that “\n” starts a new line for printing.

Let’s now compute the sum of the cubed deviations for歯 with the use of a for loop:

new.sum.dev3 = 0 # initialise variable to hold the sum
dev3 = rep(0,length(claim.amounts)) # initialise array of cubed deviations
for (i in 1:length(claim.amounts)){ # start loop

dev3[i] = (claim.amounts[i] – mean(claim.amounts)广 3 # compute sum
new.sum.dev3 = new.sum.dev3 + dev3[i] # iteratively

}
print(new.sum.dev3)

Again, make sure you understand how this works!

Check if the answer is the same as with sum.dev3 above. Using loops will most times be slower than using array
operations, but it often provides clearer understanding – and in some cases it might be the easier option for
coding. Notice in the code above, that it is important to make sure that the variable new.sum.dev3 is equal to
zero before the loop starts (line 1 of the code), and that dev3 has to be ‘defined’ as an array outside the loop (line
2 of code).

Graphical summaries

For producing various summary plots, see the use of commands hist(), stripchart(), boxplot() etc. on the tutorial
sheets. Here, as an example, we consider how to show two plots on the same graph or scale (e.g. when we want
to compare two data sets). Let’s assume that the claim amounts data can be divided to two different groups (e.g.
for gender of policyholder) and we want separate boxplots per group on the same graph. We will first ‘assign’
each claim to group ‘1′ or ‘2′ (here completely at random by alternating group membership):

group.ind = rep(c(1:2),100); group.ind

group.1 = claim.amounts[group.ind == 1]
group.2 = claim.amounts[group.ind == 2]

# create index by
# repeating (1,2) 100 times
# choose amounts with index = 1
# same for index = 2

3

We are now ready to draw the boxplots:

boxplot(group.1Jgroup.2Jhorizontal=TJcol=HorangenJnames=c(nGroup 1″,HGroup 2″)) title(main=HBoxplots of
claim amounts per group”,xlab=”Amount (1000s pounds)”)

Note that an alternative way to do this would be:

boxplot(claim.amounts ~ group.ind,horizontal=TJcol=”orange”J names=c(“Group 1″,”Group 2″))

but the first command is more useful when we have two separate data sets.

We can also draw histograms for the two groups side-by-side. For comparison purposes we need to ensure that
the two plots are drawn on the same scale (for the x and y axes). Try the following: par(mfrow=c(1,2)) #
Specifies layout for plotting window amount.breaks = seq(0.9,2.3Jby=0.1)
hist(group.1, breaks=amount.breaks, right=FALSE, col=”gray”, ylim=c(0,25),
main=”Claim amounts: Group 1″, xlab=”Amount hist(group.2,
breaks=amount.breaks, right=FALSE, col=”gray”, ylim=c(0,25), main=”Claim
amounts: Group 2″, xlab=”Amount

Notice how we have determined the cells for the histograms (both range and whether they are right-closed), and
controlled the range of the axes.

It is important to know how you can export your graphs for use in other programs (e.g. when you are preparing
reports). It is straightforward to copy and paste the graph on an editor – e.g. right-click on the R graphics window
and choose Copy as.However, it is often useful to save the graph for later use. To save your graph as a PDF file,
make the graphics window active (by clicking on it) and then choose File — Save as. Try to save your boxplot
graph and then open it on your computer (make sure you remember where you saved the file!). Later you may
also want to check the postscript() command which gives you more control over the saving options – e.g. size of
the graph.

2. R is a programming environment and you can use it to write functions that can perform complex mathematical
and statistical operations. Functions are a very useful and portable tool, especially for computations that you use
repeatedly or when you want to write a longer piece of R code that can be broken down to smaller ‘tasks’. Here
we will consider a simple task – writing a function for computing the symmetry coefficient Si which was also
computed before without a user defined function.

The code for defining the function is given below. As with the for(…) command in task 1 (which is a
system-defined function itself), note that the body of the function (included in curly brackets) must be entered in
R as a block of commands.

betal.fn <- function(x){ # name and input of function sum.dev3 = sum( (x - mean(x)广 3 ) # see Lab 1 ... bl = (sum.dev3/length(x))/(sd(x))"3 cat("\n”,"Coefficient of symmetry for these data is: ", "beta_1 =", bl, "\n\n") } Note that the input to this function is the array x, which is the data to be used. Also recall the use of the cat(...) xlim=c(0.9,2.3), (1000)") xlim=c(0.9,2.3), (1000)") 4 command from before (use ?cat for help). Load the claim.amounts data on your R session again (use the scan(...) command as before if required), and call the betal.fn function using the claim amounts as its argument: betal.fn(claim.amounts) Check that you get the same answer as in Task 1. Now change your data, for example by using claim.amounts.new =0.6+ claim.amounts or claim.amounts.new = 1.3 * claim.amounts and call the betal.fn function again to compute the new value of角. Comment on how the symmetry coefficient has changed with the new data.