NESUG 2008 Foundations & Fundamentals
Guido’s Guide to PROC MEANS – A Tutorial for Beginners Using the SAS® System
. Guido, University of Rochester Medical Center, Rochester, NY
PROC MEANS is a basic procedure within BASE SAS® used primarily for answering questions about quantities (How much? What is the average? What is the total?). It is the procedure that I use second only to PROC FREQ in both data management and basic data analysis. PROC MEANS can also be used to conduct some basic statistical analysis. This beginning tutorial touches on many of the practical uses of PROC MEANS and offers some helpful tips to expand your knowledge of numeric data, giving you a framework to build upon as you extend your knowledge of the SAS System.
INTRODUCTION
The first in this series, “Guido’s Guide to PROC FREQ – A Tutorial for Beginners Using the SAS® System”, dealt with answering the question “How many?”. This second guide concentrates on answering the question “How much?”.
The Version 9 SAS® Procedure Manual states, “The MEANS procedure provides data summarization tools to compute descriptive statistics across all observations and within groups of observations. For example, PROC MEANS calculates descriptive statistics based on moments, estimates quantiles, which includes the median, calculates confidence limits for the mean, identifies extreme values and performs a t-test”.
The following statements are used in PROC MEANS according to the SAS® Procedure Manual:
PROC MEANS
PROC MEANS DATA=Trial;
VAR Age;
RUN;
The SAS System
The MEANS Procedure
Analysis Variable : AGE
  N          Mean       Std Dev       Minimum       Maximum
100    42.5800000    12.0169745    19.0000000    70.0000000
The output above gives us five simple statistics. N is the number of subjects (N=100). Minimum is the youngest age (Min=19) and Maximum is the oldest age (Max=70). Mean is the average age of the subjects (Mean=42.58) and Std Dev is the standard deviation of age (Std Dev=12.0169745). So the answer to our first question, what is the average age of the 100 subjects, is 42.58 years.
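The Trial dataset itself is not listed in this paper. To try the examples, a small stand-in with the same three variables could be created along the lines sketched below. The six rows are invented purely for illustration (they are not the study data), so the statistics they produce will not match the output shown here. The variable types (numeric Center and Age, character Sex) are assumptions inferred from how the variables are used in the later examples.

```sas
/* Hypothetical stand-in for the Trial dataset.                  */
/* These rows are invented for illustration only; they are NOT   */
/* the author's data and will not reproduce the paper's output.  */
DATA Trial;
   INPUT Center Sex $ Age;   /* Center, Age numeric; Sex character */
   DATALINES;
1 F 34
1 M 45
2 F 29
2 M 52
3 F 61
3 M 38
;
RUN;

/* Default statistics (N, Mean, Std Dev, Minimum, Maximum) for Age */
PROC MEANS DATA=Trial;
   VAR Age;
RUN;
```

With a dataset of this shape in the WORK library, each of the PROC MEANS steps in this tutorial can be submitted as written.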
Now suppose we want to know the mean age of the men and the mean age of the women separately. We can add a CLASS statement to our program to answer this question.
PROC MEANS DATA=Trial;
CLASS Sex;
VAR Age;
RUN;
The SAS System
The MEANS Procedure
Analysis Variable : AGE
SEX    N Obs     N          Mean       Std Dev       Minimum       Maximum
F         56    56    42.0892857    12.1464949    19.0000000    69.0000000
M         44    44    43.2045455    11.9603600    19.0000000    70.0000000
The great thing about the SAS System is that there are almost always two or more ways to do the same thing. Another way to calculate the mean age of men and the mean age of women is to use a BY statement instead of a CLASS statement. The only caveat is that whenever you use a BY statement, the SAS dataset must first be sorted by the BY variable. Let’s take a look at the syntax and output.
PROC SORT DATA=Trial OUT=TrialSorted;
BY Sex;
RUN;
PROC MEANS DATA=TrialSorted;
BY Sex;
VAR Age;
RUN;
The SAS System
SEX=F
The MEANS Procedure
Analysis Variable : AGE
  N          Mean       Std Dev       Minimum       Maximum
 56    42.0892857    12.1464949    19.0000000    69.0000000
SEX=M
Analysis Variable : AGE
  N          Mean       Std Dev       Minimum       Maximum
 44    43.2045455    11.9603600    19.0000000    70.0000000
Now you may be asking yourself: why not just use the CLASS statement, so that you don’t have to sort the data? While that works, there may be times when you want to use both a CLASS statement and a BY statement, depending on the problem. In the next example we will use both, with Center as our CLASS variable and Sex as our BY variable. Then we will repeat the analysis using only the CLASS statement.
PROC MEANS DATA=TrialSorted;
BY Sex;
CLASS Center;
VAR Age;
RUN;
The SAS System
SEX=F
The MEANS Procedure
Analysis Variable : AGE
CENTER    N Obs     N          Mean       Std Dev       Minimum       Maximum
1            24    24    41.3750000    12.3914855    19.0000000    69.0000000
2            20    20    39.6500000    10.7472151    24.0000000    63.0000000
3            12    12    47.5833333    13.1249820    30.0000000    64.0000000
SEX=M
Analysis Variable : AGE
CENTER    N Obs     N          Mean       Std Dev       Minimum       Maximum
1            16    16    41.5000000    13.4956783    19.0000000    70.0000000
2            15    15    41.0666667    10.0247313    24.0000000    58.0000000
3            13    13    47.7692308    11.6415481    27.0000000    65.0000000
PROC MEANS DATA=TrialSorted;
CLASS Center Sex;
VAR Age;
RUN;
The SAS System
The MEANS Procedure
Analysis Variable : AGE
CENTER    SEX    N Obs     N          Mean       Std Dev       Minimum       Maximum
1         F         24    24    41.3750000    12.3914855    19.0000000    69.0000000
          M         16    16    41.5000000    13.4956783    19.0000000    70.0000000
2         F         20    20    39.6500000    10.7472151    24.0000000    63.0000000
          M         15    15    41.0666667    10.0247313    24.0000000    58.0000000
3         F         12    12    47.5833333    13.1249820    30.0000000    64.0000000
          M         13    13    47.7692308    11.6415481    27.0000000    65.0000000
While we have concisely produced the above table using only the CLASS statement, which does not require sorted data, we could still do more to make it pleasing to the eye. So let’s reduce the decimal places to two and format the Center and Sex variables.
PROC FORMAT;
VALUE Centerf 1='1:Austin'
              2='2:Dallas'
              3='3:Conroe';
VALUE $Sexf 'F'='F:Female'
            'M'='M:Male';
RUN;
PROC MEANS DATA=TrialSorted MAXDEC=2;
TITLE 'Guido''s Guide to PROC MEANS';
TITLE2 'Example 6 – CLASS, FORMAT and MAXDEC';
CLASS Center Sex;
FORMAT Center Centerf. Sex $Sexf.;
VAR Age;
RUN;
Guido’s Guide to PROC MEANS
Example 6 – CLASS, FORMAT and MAXDEC
The MEANS Procedure
Analysis Variable : AGE
CENTER      SEX         N Obs     N     Mean    Std Dev    Minimum    Maximum
1:Austin    F:Female       24    24    41.38      12.39      19.00      69.00
            M:Male         16    16    41.50      13.50      19.00      70.00
2:Dallas    F:Female       20    20    39.65      10.75      24.00      63.00
            M:Male         15    15    41.07      10.02      24.00      58.00
3:Conroe    F:Female       12    12    47.58      13.12      30.00      64.00
            M:Male         13    13    47.77      11.64      27.00      65.00
Up to this point we have been letting PROC MEANS produce the “default” statistics of N, MIN, MAX, MEAN and STD DEV. (See Appendix A for available statistics from PROC MEANS)
Suppose that we want to see the MEAN, MEDIAN and the 95% Confidence Limits of the Mean. Whenever we want anything other than the default statistics we have to explicitly ask for them.
PROC MEANS DATA=TrialSorted LCLM MEAN UCLM STD MEDIAN MAXDEC=2;
TITLE 'Guido''s Guide to PROC MEANS';
TITLE2 'Example 7 – Selected Statistics for Age';
CLASS Center Sex;
FORMAT Center Centerf. Sex $Sexf.;
VAR Age;
RUN;
Guido’s Guide to PROC MEANS
Example 7 – Selected Statistics for Age
The MEANS Procedure
Analysis Variable : AGE
                                 Lower 95%              Upper 95%
CENTER      SEX         N Obs    CL for Mean     Mean   CL for Mean   Std Dev   Median
1:Austin    F:Female       24          36.14    41.38         46.61     12.39    41.50
            M:Male         16          34.31    41.50         48.69     13.50    41.00
2:Dallas    F:Female       20          34.62    39.65         44.68     10.75    39.50
            M:Male         15          35.52    41.07         46.62     10.02    42.00
3:Conroe    F:Female       12          39.24    47.58         55.92     13.12    48.00
            M:Male         13          40.73    47.77         54.80     11.64    46.00
We now have a report that presents the data succinctly and clearly. Let’s try to do some basic statistical analyses using PROC MEANS.
If we look at the output in Example 7, we can see that for each center there appears to be no statistically significant difference between the mean ages of the men and women. For example, in the Austin center the mean age for women is 41.38 with LCLM equal to 36.14 and UCLM equal to 46.61. The mean age for men is 41.50 with LCLM equal to 34.31 and UCLM equal to 48.69. Generally speaking, if the mean for one group is contained within the LCLM and UCLM for the other group, there is no statistically significant difference between the two groups; this is an informal check rather than a formal test. Repeating this observation for the Dallas center, we find that there is no statistically significant difference in the mean ages of women versus men. Finally, there is also no statistically significant difference in the mean ages of the women versus the men in the Conroe center.
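As a check on where the limits in Example 7 come from, LCLM and UCLM follow the usual t-based formula for a confidence interval of a mean. The worked numbers below use the rounded values for the Austin women (n = 24) and assume the default 95% level; 2.069 is the 0.975 quantile of Student's t distribution with 23 degrees of freedom.

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  = 41.38 \pm 2.069 \times \frac{12.39}{\sqrt{24}}
  \approx 41.38 \pm 5.23
```

This gives limits of roughly 36.15 and 46.61, which agree with the reported 36.14 and 46.61 up to the rounding of the inputs.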
Now let’s try a slightly different statistical analysis and let SAS do the testing. We can consider an example from Chapter 4 of Walker’s book (Walker, 2002). Here is a synopsis of the problem:
Mylitech is developing a new appetite suppressing compound for use in weight reduction. A preliminary study of 35 obese patients provided data before and after 10 weeks of treatment with the new compound. Does the new treatment look at all promising? Let’s take a look at the VIEWTABLE version of the SAS Dataset – Work.Obese
Notice that some subjects have a negative wtloss (this means they lost weight after the 10 weeks of treatment with the new compound). Some subjects have a positive wtloss (this means they gained weight after the 10 weeks of treatment with the new compound). If the average wtloss is not different from 0, then we conclude that there is no statistically significant difference between the beginning weight (wtpre) and the ending weight; that difference is what the variable wtloss represents. PROC MEANS will test the hypothesis that the mean of wtloss is 0 (referred to as “The Null Hypothesis”).
PROC MEANS DATA=Obese N MEAN STD T PRT MAXDEC=2;
TITLE 'Guido''s Guide to PROC MEANS';
TITLE2 'Example 8 – Paired t-Test for Weight Loss';
VAR wtloss;
RUN;
Guido’s Guide to PROC MEANS
Example 8 – Paired t-Test for Weight Loss
The MEANS Procedure
Analysis Variable : wtloss
  N     Mean    Std Dev    t Value    Pr > |t|
 35    -3.46       6.34      -3.23      0.0028
If we examine the output from Example 8, we find that for the 35 subjects the mean weight change is -3.46 pounds, the standard deviation is 6.34, the t-value is -3.23 and the p-value is 0.0028. If the p-value is less than 0.05 then we may reject “The Null Hypothesis”. Here the p-value is 0.0028, so we can reject “The Null Hypothesis” and conclude that there is a statistically significant difference between the pre-treatment and post-treatment weights of the 35 subjects.
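The t-value in Example 8 can be reproduced by hand from the other reported statistics. The sketch below uses the rounded values from the output, so it agrees only to the displayed precision. The symbols (mean difference, its standard deviation, and the number of subjects) are notation introduced here for illustration.

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}
  = \frac{-3.46}{6.34 / \sqrt{35}}
  \approx \frac{-3.46}{1.072}
  \approx -3.23,
\qquad \text{df} = n - 1 = 34
```

The p-value of 0.0028 is the two-tailed probability of a t statistic at least this extreme with 34 degrees of freedom.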
There are other procedures in the SAS System that can answer this question. You could use PROC UNIVARIATE, which gives a plethora of output, or PROC SUMMARY, which gives no printed output by default. Since the emergence of Version 7 of the SAS System you can also use PROC TTEST to do the paired t-test analysis.
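A minimal sketch of these alternatives follows, using only the Obese dataset and the wtloss variable from Example 8. Because wtloss already holds the pre/post difference, a one-sample t-test of its mean against 0 is equivalent to the paired test. PROC TTEST also offers a PAIRED statement that takes the two weight variables directly; it is not shown because the name of the post-treatment variable is not given here.

```sas
/* One-sample t-test of H0: mean(wtloss) = 0.                    */
/* Equivalent to a paired t-test on the pre and post weights.    */
PROC TTEST DATA=Obese H0=0;
   VAR wtloss;
RUN;

/* PROC UNIVARIATE: the "Tests for Location" table includes      */
/* Student's t along with many other statistics.                 */
PROC UNIVARIATE DATA=Obese;
   VAR wtloss;
RUN;

/* PROC SUMMARY prints nothing unless PRINT is specified.        */
PROC SUMMARY DATA=Obese PRINT N MEAN STD T PRT MAXDEC=2;
   VAR wtloss;
RUN;
```

All three steps should report the same t-value and p-value for wtloss as PROC MEANS; they differ mainly in how much additional output they produce.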
We have completed our tutorial and now the rest is up to you. The best way to improve your SAS skills is to practice, practice, and practice. The SAS Online Help facility and SAS manuals are excellent resources for this. Both are available to you under the Help dropdown (Learning SAS Programming and SAS Help and Documentation).
APPENDIX A – STATISTIC KEYWORDS FOR PROC MEANS STATEMENT
DESCRIPTIVE STATISTIC KEYWORDS
CLM – Two-sided Confidence Limit of the Mean
CSS – Corrected Sum of Squares
CV – Coefficient of Variation
KURTOSIS|KURT – Kurtosis
LCLM – Lower Confidence Limit of Mean
MAX – Maximum
MEAN – Average
MIN – Minimum
N – Number of non-missing values
NMISS – Number of missing values
RANGE – Maximum minus Minimum
SKEWNESS|SKEW – Skewness
STDDEV|STD – Standard Deviation
STDERR – Standard Error of the Mean
SUM – Sum
SUMWGT – Sum of the Weights
UCLM – Upper Confidence Limit of Mean
USS – Uncorrected Sum of Squares
VAR – Variance
QUANTILE STATISTIC KEYWORDS
MEDIAN|P50 – Median or 50th Percentile
P1 – 1st Percentile
P5 – 5th Percentile
P10 – 10th Percentile
Q1|P25 – 1st Quartile or 25th Percentile
Q3|P75 – 3rd Quartile or 75th Percentile
P90 – 90th Percentile
P95 – 95th Percentile
P99 – 99th Percentile
QRANGE – Interquartile Range (Q3 – Q1)
HYPOTHESIS STATISTIC KEYWORDS
PROBT|PRT – Two-tailed p-value for Student’s t statistic
T – Student’s t statistic
CONCLUSION
PROC MEANS is a very powerful but simple and necessary procedure in SAS. This Beginning Tutorial has just scratched the surface of the functionality of PROC MEANS. The author’s hope is that these several basic examples will serve as a guide for the user to extend their knowledge of PROC MEANS and experiment with other uses for their specific data needs.
REFERENCES
SAS Institute, Inc. (2002). Base SAS® 9 Procedures Guide. Cary, NC: SAS Institute, Inc.
Guido, . (2007). “Guido’s Guide to PROC FREQ – A Tutorial for Beginners Using the SAS® System”, Proceedings of the 20th annual North East SAS Users Group Conference, Baltimore, MD, 2007, paper #FF07.
Walker, . (2002). “Common Statistical Methods for Clinical Research with SAS® Examples”, 2nd Edition, SAS Institute: Cary, NC.
ACKNOWLEDGEMENTS
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
University of Rochester Medical Center
Department of Community and Preventive Medicine
Division of Social and Behavioral Medicine
120 Corporate Woods, Suite 350
Rochester, 14623
Phone: (585) 758-7818
Fax: (585) 424-1469