程序代写 R functions for normal random variables

R functions for normal random variables
R has several useful functions for working with normal distributions. We will learn about them through an example involving the diameter of pine trees.
Plotting a probability density function
Our first goal is to draw a plot of the probability density function (PDF) of the random variable X, the diameter of young pine trees growing in a commercial tree farm. We know from a previous study that the mean diameter is 3.5 cm and the variance of diameter is 0.5 cm2. We will assume that X follows a normal distribution.

The R function that gives probability density for a normal random variable is called dnorm. It takes three essential arguments: 1) x is the value of the variable X whose density we want.
2) mean is the mean of X.
3) sd is the standard deviation of X.
Let¡¯s start by getting the probability density for a diameter of 4 cm.
# Input the mean, variance, and standard deviation of X
mu.farm <- 3.5 var.farm <- 0.5 sd.farm <- sqrt(var.farm) # Get the density for X = 4 dnorm(x, mu.farm, sd.farm) ## [1] 0.4393913 4 cm is close to the mean. What are the densities for values far from the mean? Let¡¯s check for values 3 standard deviations above and below the mean. # Probability density at a value 3 standard deviations below the mean x <- mu.farm - 3*sd.farm dnorm(x, mu.farm, sd.farm) ## [1] 0.00626758 # Probability density at a value 3 standard deviations above the mean x <- mu.farm + 3*sd.farm dnorm(x, mu.farm, sd.farm) ## [1] 0.00626758 These densities are much lower than near the mean. This makes sense, because the normal PDF falls sharply as we move away from the mean. We know from the two-standard-deviation rule that 95% of the probability lies within two standard deviations of the mean, so we expect very low densities when we are even farther from the mean. Let¡¯s now calculate the probability density simultaneously for several values and store them in a vector. # Set sequence of X values. x <- seq(mu.farm - 3*sd.farm, mu.farm + 3*sd.farm) # Calculate probability densities. pdf.farm <- dnorm(x, mu.farm, sd.farm) If we display the vector we see that it has several values. ## [1] 0.00626758 0.16045781 0.55594630 0.26068490 0.01654284 Let¡¯s plot the density function. # Plot the PDF. plot(pdf.farm ~ x, type = "l" # Setting type to "l" gives a line plot instead of the default point plot. This does not look like a normal probability distribution! What went wrong? The problem is that the seq function defaults to an interval of 1 between items in the sequence. That is too coarse for this variable, because its values are so small. We can fix this by using the by argument of seq. We will also give the plot better axis labels. # Set sequence of X values. x <- seq(mu.farm - 3*sd.farm, mu.farm + 3*sd.farm, by = 0.01) # Calculate probability densities. pdf.farm <- dnorm(x, mu.farm, sd.farm) # Plot the PDF. plot(pdf.farm ~ x, type = "l", main = "PDF of farm tree diameter", xlab = "Tree diameter (cm)", ylab = "Probability density" Plotting a cumulative distribution function Now let¡¯s plot the cumulative distribution function (CDF) of X. The R function for the cumulative probability of a normal random variable is called pnorm. It takes three essential arguments: 1) q is the value of the variable X whose cumulative probability we want. 2) mean is the mean of X. 3) sd is the standard deviation of X. We can plot the cumulative probability for the same range of values of X that we used for the PDF. # Set sequence of X values. x <- seq(mu.farm - 3*sd.farm, mu.farm + 3*sd.farm, by = 0.01) # Calculate the cumulative probabilities cdf.farm <- pnorm(x, mu.farm, sd.farm) # Plot the CDF. plot(cdf.farm ~ x, type = "l", main = "CDF of farm tree diameter", xlab = "Tree diameter (cm)", ylab = "Cumulative probability" Plotting two distributions on the same axes Now we want to compare the probability distribution of pine tree diameters on the farm to that of pine trees in a nearby natural forest. A previous study showed that the mean for the forest is the same as for the farm (3.5 cm), but the variance is higher (1.0 cm2). Now we calculate the PDF of the forest trees. # Enter the mean, variance, and sd for the forest trees. mu.forest <- 3.5 var.forest <- 1 sd.forest <- sqrt(var.forest) # Calculate probability densities. pdf.forest <- dnorm(x, mu.forest, sd.forest) Let¡¯s make a single plot that shows both PDFs. It will be easier to first draw the plot axes and then add each PDF using the lines function # First, use the range command to get the highest and lowest values for both the x and y axes. xrange<-range(x) yrange<-range(pdf.farm, pdf.forest) # Now draw the plot axes, without any plot. Do this by specifying plot type "n", for "none". main="PDFs for farm and forest trees", xlab="Tree diameter (cm)", ylab="Density" # Finally, use lines commands to add the PDFs. lines(pdf.farm~x) lines(pdf.forest~x) Changing line color and type. The two PDFs are a bit hard to tell apart. We can make it easier by using the col and lty arguments to give each one a distinct color and line type. # Draw the axes. main="PDFs for farm and forest trees", xlab="Tree diameter (cm)", ylab="Density" # Add the PDFs. lines(pdf.farm~x, col = "red", lty = 1) lines(pdf.forest~x, col = "blue", lty = 2) Adding a legend A legend will tell readers what the two lines mean. Use the legend function to add one to a just-created plot. # Draw the axes. main="PDFs for farm and forest trees", xlab="Tree diameter (cm)", ylab="Density" # Add the PDFs. lines(pdf.farm~x, col = "red", lty = 1) lines(pdf.forest~x, col = "blue", lty = 2) # Add the legend. legend ("topleft", legend = c("Farm trees", "Forest trees"), col = c("red", "blue"), lty = c(1, 2) Using pnorm to calculate probabilities Let¡¯s return to thinking about the trees from the tree farm. What is the probability that a randomly selected tree is less than 2 cm in diameter? This is a cumulative probability, so you can use pnorm to find out. Note that you must give pnorm the mean and standard deviation to get the right answer. pnorm(2, mu.farm, sd.farm) ## [1] 0.01694743 What is the probability that a randomly selected tree is greater than 2.5 cm in diameter? To get a probability of being greater than a given value, use pnorm to get the probability of being less than the value and then subtract this from one. 1 - pnorm(2.5, mu.farm, sd.farm) ## [1] 0.9213504 What is the probability that a randomly selected tree lies between 1 and 3 cm in diameter? To get the probability of falling within a range, subtract the cumulative probability for the lower value from the cumulative probability for the higher value. pnorm(3, mu.farm, sd.farm) - pnorm(1, mu.farm, sd.farm) ## [1] 0.2395466 Using qnorm to calculate quantiles The function qnorm gives quantiles of a normal distribution. For example, what is the tree diameter such that 75% of trees are below that qnorm(0.75, mu.farm, sd.farm) ## [1] 3.976936 That is, a randomly selected tree has a 75% chance of being less than 3.98 cm in diameter. Note the complementary relationship between qnorm and pnorm. pnorm(3.976936, mu.farm, sd.farm) ## [1] 0.7499999 The cumulative probability for X = 3.976936 is 0.75, just as the 0.75 quantile for X is 3.976936. This complementary relationship between pnorm and qnorm is summarized in this plot of the cumulative distribution function. Essentially, you can read the plot one way to get cumulative probabilities from values of X, or you can read it the other way to get values of X corresponding to particular cumulative probabilities (i.e., quantiles). The same relationship is shown on this plot of the probability density function (PDF). Here, cumulative probability is represented by a shaded area under the curve. The function dnorm gives probability density (i.e., the height of the PDF at a particular value of X). The function pnorm gives cumulative probability (i.e., the height of the CDF at a particular value of X, or the area under the PDF to the left of a particular value of X). The function qnorm gives a quantile (i.e., the value of X corresponding to a particular cumulative probability). Working with the standard normal distribution What happens if we run dnorm, pnorm, or qnorm without giving the mean and standard deviation? pnorm(-2) ## [1] 0.02275013 1-pnorm(2) ## [1] 0.02275013 It runs just fine. R assumes that you want a value of the standard normal distribution Z; that is, a normal random variable with a mean of zero and a standard deviation of one. You can see here that about 2.5% of the distribution of Z is less than -2 and the same amount is greater than 2. In other words, about 95% of the normal distribution lies within 2 standard deviations of the mean. This is the origin of the two-standard-deviation rule. Likewise, we can use qnorm to get specific values of Z associated with particular probabilities. qnorm(0.05) ## [1] -1.644854 The output of qnorm in this case is measured in units of standard deviations. This tells us that 5% of the normal probability distribution is 1.64 standard deviations or further below the mean. By symmetry, a similar proportion is 1.64 standard deviations or further above the mean. These conclusions apply to any normal random variable.

Related Posts