代写代考 Data Manipula

Data Manipula
Introduction to PROC TABULATE
Wendi L. Wright, Educational Testing Service, Princeton, NJ
This introduction to PROC TABULATE first looks at the basic syntax of PROC TABULATE and then, using a series of examples, shows how to produce one, two and three dimensional tables. The paper also illustrates how to use the TABLE statement and the difference between the CLASS and VAR statements. Also discussed are adding statistics for the table (including percents), labeling variables and statistics, adding totals and subtotals, and how PROC TABULATE handles missing data. Finally several examples are shown for how to clean up the tables using both standard PROC TABULATE options as well as several style options within ODS.

Copyright By PowCoder代写 加微信 powcoder

INTRODUCTION
PROC TABULATE is a procedure used to display descriptive statistics in tabular format. It computes many statistics that are computed by other procedures, such as MEANS, FREQ, and REPORT. PROC TABULATE then displays the results of these statistics in a table format. TABULATE will produce tables in up to three dimensions and allows, within each dimension, multiple variables to be reported one after another hierarchically. PROC TABULATE has some very nice mechanisms that can be used to label and format the variables and the statistics produced.
BASIC SYNTAX
PROC TABULATE; CLASS variables < / options>; VAR variables < / options>; TABLE ,
< / options> ; … other statements … ;
Let’s take a look at the basic syntax of the PROC TABULATE Procedure. We will start with three of the statements that you can use in PROC TABULATE, CLASS, VAR, and TABLE. As you can see each of these statements, as well as the PROC TABULATE statement itself allows options to be added. For each of the statements, the options need to be preceded with a ‘/’.
Note: two differences in the syntax from any other Procedure in SAS®; one) the variables in all three statements cannot be separated by commas; and two) the commas in the table statement are treated in a special way and mean a change in dimension.
OPTIONS FOR PROC TABULATE STATEMENT
Let’s take a look at a few of the options you can specify on the PROC TABULATE statement.
Data= Out=
Format= Formchar=‘…’
Specifies what input data to use.
Specifies the name of the output dataset to store calculated values
Option specifies a format to use for each cell in the table. Best12.2 is the default.
This specifies what line characters to use when drawing the

Data Manipula
NoSeps Order=
Style= Contents= Exclusive Classdata=dset
USING FORMCHAR= OPTION
table (more on this in a minute).
This option eliminates horizontal separators in the table (only affects traditional SAS monospace output destination). Unformatted/Data/Formatted/Freq – orders how the CLASS values appear in the table
Tells SAS to treat missing values as valid
Used with ODS specifications
To specify exact combinations of data to include
Here is an example of the formchars for specifying the formchar statement. The example shows the default as well as where each character (by spacing) is used. If you want to change any of these, just change the formchar string. For example if you want to change the upper left corner to a ‘+’, go to the third position in the formchar string and change the – to a +. The default value is formchar=’|—-|+|—‘. The figure below shows how each position in the formchar string matches up to the table parts.
VAR STATEMENT
The VAR statement is used to list the variables you intend to use to create summary statistics. As such, they must be numeric. There are only two options that can be used with the VAR statement and, if present, these options appear after a ‘/’.
Style= ODS style element definitions. Example might be to change the justification or the font.
specify another variable that will weight the values of the variable with the following exceptions:
(0 or <0 = counts observation in total number of observations, blank= exclude observation entirely) Data Manipula TABLE STATEMENT The Table statement consists of up to three dimensions expressions and the table options. To identify different dimensions, just use a comma. If there are no commas, SAS assumes you are only defining the column dimension (which is required), if there is one comma, then the row dimension is first, then the column, or, if you have three commas, then the order of expressions is page, then row, then column. Options appear at the end after a ‘/’. You can have multiple table statements in one PROC TABULATE. This will generate one table for each statement. All variables listed in the table statement must also be listed in either the VAR or CLASS statements. In the table expressions, there are many statistics that can be specified. Among them are row and column percents, counts, means, and percentiles. There are about a dozen options that can be specified. Here are a few of them. Box = Condense NoContinued MissText PrintMiss Indent= RTSpace = Style=[options] Text and style for the empty box in the upper left corner. Print multiple pages to the same physical page. Suppress the continuation message If a cell is blank, this text will print instead Print CLASS variable values, even if there is not data for them (this only works if somewhere there is at least one observation with that value. Number of spaces to indent nested row headings. Number of positions to allow for the row headings. Specify ODS style elements for various parts of the table. CONSTRUCTING A TABLE STATEMENT – DIMENSION EXPRESSIONS There are many elements you can use to construct a table expression. You start, of course, with the variables you want to include in the table, but you can also specify the universal CLASS variable ALL which allows you to calculate totals. You will also need to specify what statistics you want to put in the cells of the table. To make your table ‘pretty’, you can also specify formats, labels, and ODS style specifications in the expression. So let’s take a closer look at how to construct dimension expressions. Here is where PROC TABULATE differs from all the other Procedures in the SAS programming language. The syntax used here is very different. • A comma specifies to add a new dimension. • The asterisk is used to produce a cross tab of one variable with another (within the same dimension however, different from PROC freq). • A blank is used to represent concatenation, or place this output element after the preceding one listed. • Parenthesis will group elements and associate an operator with each element in the group • Angle brackets specify a denominator definition for use in percentage calculations. SIMPLE TABLE WITH ONE DIMENSION The simplest table will have only one variable. You must specify this variable in either the CLASS or VAR statement, but the resulting table will be a little different depending on whether you specify the variable in the CLASS or VAR statement. If specified in the CLASS statement, you will get a count of observations in each category with the categories listed across the top of the page in columns. If the variable is specified in the VAR statement, then you will get a total sum across all observations. Data Manipula Here is our example using the variable as a VAR variable. We should get the total income summed across all observations. The resulting table is shown to the right of the example. ADDING STATISTICS If you want something other than the default N or sum, you can do this by using the ‘*’ and adding the name of the statistic you want instead. You can group multiple stats and variables with parentheses to get the results you want. PROC TABULATE data=one; VAR income; TABLE income; Descriptive Statistics COLPCTN PCTSUM COLPCTSUM MAX ROWPCTN MEAN ROWPCTSUM MIN STDDEV / STD N STDERR NMISS SUM PAGEPCTSUM PCTN Quantile Statistics MEDIAN | P50 P1 Q1 | P25 QRANGE Hypothesis Testing PROC TABULATE data=one; VAR income; TABLE income * (N MEAN); CLASS STATEMENT Classification variables allow you to get stats by category. You will get one column or row for each value of the CLASS variable. You will need to be careful to use a categorical variable with only a limited number of categories or you may end up producing many, many pages of output. 6 Data Manipula The syntax for the CLASS statement is similar to the VAR statement. List the variables you want to use to group data followed by a ‘/’ and any options you want. The variables here can be either numeric or character (unlike the VAR statement which required numeric). The statistics you can get for these variables are only counts and percents. The statistics will be produced for each LEVEL of the variable. This is almost like using a BY statement within the table. The options you can use for the CLASS statement are different than for the VAR statement. Here are a few of them: Ascending/Descending Missing Style=[options] PreLoadFMT groupinternal Specify the order the CLASS variables values are displayed Consider missing values valid with special missing values treated separately. Enables use of multi-level formatting with overlapping ranges (ex – by state and by region at the same time) Give ODS style element definitions to these variables This will preload a format and will also (if other options are also specified), display all values in the table even if there are no observations present with some of the values. Will exclude from the table all combinations of CLASS variables not present in the data (normally used with the preloadfmt option. Used to group values together by their internal values, not formatted. Groups levels of CLASS variables in the order specified: • Internal (default) – use actual values in data • Data – same order the data is already sorted in • Formatted – use the formatted data values • Freq – highest counts first Let’s talk about how the CLASS variables are handled if they are missing. This applies to any Procedure where you can use a CLASS statement. If an observation has a missing value on even one of the CLASS variables, that observation is excluded from ALL calculations, even if they could have been included in some of the others. For example, a student has a gender value of ‘F’, and an education value of blank. He would not be included in the gender totals. To get him included wherever possible, use the ‘missing’ option. FROM SAS Online Documentation: “By default, if an observation contains a missing value for any CLASS variable, then PROC TABULATE excludes that observation from all tables that it creates. CLASS statements apply to all TABLE statements in the PROC TABULATE step. Therefore, if you define a variable as a CLASS variable, then PROC TABULATE omits observations that have missing values for that variable from every table even if the variable does not appear in the TABLE statement for one or more tables. If you specify the MISSING option in the PROC TABULATE statement, then the Procedure considers missing values as valid levels for all CLASS variables. If you specify the MISSING option in a CLASS statement, then PROC TABULATE considers missing values as valid levels for the CLASS variable(s) that are specified in that CLASS statement.” In this example, we are adding more columns to the right of the two columns we already have from the previous example. PROC TABULATE data=one; CLASS GENDER; VAR income; TABLE income * (N Mean) INCOME * MEAN * GENDER; 6 Data Manipula MAKE A SINGLE DIMENSION TABLE VERTICAL So far we have only looked at tables that are listed by column. Sometimes, especially if you have a single CLASS variable with many categories, it would be useful to have this display vertically. To do this, use the ROW=FLOAT option on the table statement. TWO DIMENSIONAL TABLE So let’s take a look at creating two dimensional tables. All we need to do is add a comma BEFORE the column definition and then put in our row definition. Note the comma below after the statistic specifications (see arrows). PROC TABULATE data=one; CLASS ethnic; TABLE ethnic=' ' * N=' ' , / ROW=FLOAT ; PROC TABULATE data=one; CLASS gender; VAR income; TABLE income * (N Mean) , Data Manipula To swap rows and columns, you need only to switch what you put in front of the comma compared to what is after it. PROC TABULATE data=one; CLASS gender; VAR income; TABLE gender , income * (N Mean) ; CHANGING WHERE STATISTICS ARE SPECIFIED You can get very different table structures by changing where the statistic definitions are placed. The statistic definitions can be attached to either a VAR or the CLASS variable, but note that the numbers will ALWAYS be calculated using the VAR variable(s). Here we have the statistics attached to the VAR variable in the column dimension. PROC TABULATE data=one; CLASS gender; VAR income; TABLE gender, income * (N Mean Max) ; If you move the statistic specification so that it is attached to the rows, the results look very different. PROC TABULATE data=one; CLASS gender; VAR income; TABLE gender * (N Mean Max) , 6 Data Manipula TWO CLASSIFICATION VARIABLES Here is an example with two classification variables. One is specified in each dimension. PROC TABULATE data=one; CLASS gender fulltime; VAR income; TABLE gender , Income * fulltime * ( n mean) ; You can also nest classification variables. PROC TABULATE data=one; CLASS gender fulltime educ; VAR income; TABLE fulltime * gender , Income * educ * mean ; Data Manipula ADDING TOTALS AND SUBTOTALS In order to get totals in your table, you can use the ‘ALL’ keyword. You can use the keyword in multiple places. Depending on where you put the keyword, you can get different results. This is demonstrated in the following two examples. You can place the keyword on the row or the column dimensions or on both. Note in the examples below, we are not using a VAR statement and we are only requesting the count statistic (N). The first example will place a total line at the bottom of the table. To place a total line at the top of the table, just list the ALL keyword first. PROC TABULATE data=one; CLASS gender fulltime educ; TABLE fulltime * gender ALL, educ * N ; 6 Data Manipula In this example, we are adding a sub-total for total education by gender If you want to put a subtotal for gender within each education level, just change the placement of the ALL keyword. Here we are also adding a total column for the Educ group. PROC TABULATE data=one; CLASS gender fulltime educ; TABLE (fulltime ALL) * gender ALL, PROC TABULATE data=one; educ * N ; CLASS gender fulltime educ; TABLE fulltime * (gender ALL) , (educ all)* N ; Data Manipula ADDING LABELS There are two ways to add labels for your variables. The first, and the simplest, is to just add =‘label’ to the dimension expression after the variable you want to label. This way works for labeling both the variables and the statistics. The second way is to add a label statement to your code: LABEL var=‘label’. The label statement will not work for labeling the statistics. You need to use the KEYLABEL statement to label statistics: KEYLABEL stat=‘label’. Alternatively, you can also use this code to get the same table as shown above. HIDING LABELS In order to hide variable or statistic labels, you can add =‘ ‘ as a label. Note the statistics MUST be attached to the row dimension and NOT the column dimension for this to work. PROC TABULATE data=one; CLASS gender fulltime; VAR income; TABLE gender = 'Gender' ALL = 'Total', PROC TABULATE data=one; Fulltime = 'Employment Status' * income * mean = 'Mean' ; CLASS gender fulltime; VAR income; TABLE gender ALL , Fulltime * income * mean ; LABEL gender='Gender' Fulltime='Employment Status'; KEYLABEL mean='Mean' all='Total'; PROC TABULATE data=one; CLASS educ gender fulltime; VAR income; TABLE educ , Fulltime='Employment Status' * gender=' ' * mean = ' ' ; Data Manipula FILLING THE BIG WHITE BOX To fill in the big white box in the upper left, use the BOX= option. PROC TABULATE data=one; CLASS gender fulltime; VAR income; TABLE fulltime = 'Employment Status', Gender * income * mean / BOX='Mean Income' ; THREE DIMENSIONAL TABLES Three dimensional tables are easy to produce, just add another section BEFORE the row and column expressions in the table statement. PROC TABULATE will now interpret the dimension statements in this order, first page, then the row, then the columns. Three dimensional tables allow you to utilize a neat trick to fill in the box, instead of the label of the page dimension appearing above the table, you can use the BOX=_page_ option to place that label inside the big white box. Only a part of the output is included below the sample code. Data Manipula PROC TABULATE data=one; CLASS gender fulltime educ; VAR income; TABLE educ='Education', fulltime = 'Employment Status', Gender * income * mean PRODUCING CELL PERCENTS Here is how to get percentages in the cells of the table. PROC TABULATE allows you to put, in the same table, both percents and summary stats. To get percentages, you do not need to use numeric variables since the percents are just based on counts. So no VAR statement is needed unless you want to add other summary stats, you can use just the CLASS statement. There are several percents you can get and note that you can also construct your own percentages by specifying what denominator you wish to use. The use of a complex denominator will not be covered here. We will only cover the three percents that are most commonly used: PCTN - percent of total. ROWPCTN – percent across row (use row total as denominator). COLPCTN – percent across column (use column total as denominator). The first example shows the use of PCTN. PROC TABULATE data=one; / BOX=_PAGE_ ; CLASS ethnic educ; TABLE ethnic * PCTN, Data Manipula The next example shows the use of ROWPCTN. Note that the percents will add up to 100% (taking into account for rounding errors) across each row. COLPCTN works similarly only the columns will add up to 100%. PROC TABULATE data=one; CLASS ethnic educ; TABLE ethnic * ROWPCTN, HANDLING MISSING DATA ON A VAR STATEMENT VARIABLE PROC TABULATE handles missing data differently depending on whether the variable is a VAR variable or a CLASS variable. By default, VAR variable observations are not dropped from the table if they are missing. To override this behavior (in other w 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com