Final Report:

SALES OF ORTHOPEDIC EQUIPMENT

The objective of this study is to find ways to increase our company's sales of orthopedic material to hospitals in the United States. I want each person to concentrate on a subset of about 3000 hospitals chosen at random, or on selected areas (about 2500 hospitals). Find those that have a high consumption of such equipment but where our sales are low. Come up with a selected group where you think our efforts will be rewarded.

The following description of the dataset includes the variable names and some summaries of the variables.

A file with a shell SAS program that follows the analysis steps is provided in another link.

DATASET ORTHOPEDIC
VARIABLES:

     ZIP :  US POSTAL CODE
     HID :  HOSPITAL ID
    CITY :  CITY NAME
   STATE :  STATE NAME
    BEDS :  NUMBER OF HOSPITAL BEDS
   RBEDS :  NUMBER OF REHAB BEDS
    OUTV :  NUMBER OF OUTPATIENT VISITS
     ADM :  ADMINISTRATIVE COST (IN $1000’S PER YEAR)
     SIR :  REVENUE FROM INPATIENT
  SALESY :  SALES OF REHABILITATION EQUIPMENT SINCE JAN 1
 SALES12 :  SALES OF REHAB. EQUIP. FOR THE LAST 12 MO
   HIP95 :  NUMBER OF HIP OPERATIONS FOR 1995
  KNEE95 :  NUMBER OF KNEE OPERATIONS FOR 1995
      TH :  TEACHING HOSPITAL?  0, 1
  TRAUMA :  DO THEY HAVE A TRAUMA UNIT?  0, 1
   REHAB :  DO THEY HAVE A REHAB UNIT?  0, 1
   HIP96 :  NUMBER HIP OPERATIONS FOR 1996
  KNEE96 :  NUMBER KNEE OPERATIONS FOR 1996
 FEMUR96 :  NUMBER FEMUR OPERATIONS FOR 1996
SUMMARIES:

       ZIP                   CITY          STATE           BEDS
Min.   :  612    Chicago      :  45    CA    : 458   Min.   :   0.0
1st Qu.:28550    Houston      :  41    TX    : 342   1st Qu.:  69.0
Median :49000    Philadelphia :  38    NY    : 241   Median : 136.0
Mean   :50600    Los Angeles  :  28    PA    : 238   Mean   : 191.2
3rd Qu.:75240    New York     :  24    FL    : 228   3rd Qu.: 262.0
Max.   :99900    Dallas       :  24    IL    : 208   Max.   :1476.0
                 (Other)      :4503   (Other):2988

     RBEDS             OUTV              ADM             SIR
Min.   :  0.000   Min.   :      0   Min.   :    0   Min.   :    0
1st Qu.:  0.000   1st Qu.:   7510   1st Qu.: 1932   1st Qu.: 1312
Median :  0.000   Median :  20880   Median : 4508   Median : 3384
Mean   :  7.244   Mean   :  47350   Mean   : 6689   Mean   : 4849
3rd Qu.:  0.000   3rd Qu.:  47700   3rd Qu.: 9402   3rd Qu.: 6832
Max.   :850.000   Max.   :1987000   Max.   :66440   Max.   :70300
 

     SALESY           SALES12            HIP95             KNEE95
Min.   :   0.00   Min.   :   0.00   Min.   :   0.00   Min.   :  0.00
1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   7.00   1st Qu.:  1.00
Median :   1.00   Median :   2.00   Median :  28.00   Median : 18.00
Mean   :  25.91   Mean   :  41.05   Mean   :  51.27   Mean   : 41.73
3rd Qu.:  23.00   3rd Qu.:  33.00   3rd Qu.:  70.00   3rd Qu.: 52.50
Max.   :1209.00   Max.   :2770.00   Max.   :1421.00   Max.   :868.00
 

       TH              TRAUMA           REHAB            HIP96
Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :   0.0
1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:   8.0
Median :0.0000   Median :0.0000   Median :0.0000   Median :  29.0
Mean   :0.2737   Mean   :0.1225   Mean   :0.1839   Mean   :  52.6
3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:  71.0
Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1373.0
 

     KNEE96           FEMUR96
Min.   :   0.00   Min.   :  0.00
1st Qu.:   0.00   1st Qu.: 11.00
Median :  18.00   Median : 34.00
Mean   :  41.91   Mean   : 49.39
3rd Qu.:  56.00   3rd Qu.: 74.00
Max.   :1081.00   Max.   :489.00

Overview of the Analysis

Part 1. Select your market segment(s).

1.    Select cases for your dataset:

Select a group of states for the study (it can be all of them, but it is enough to start with about 3000 hospitals). Set the zero values of SALES to missing values.

Response:            SALES = SALES12 + SALESY; if SALES = 0, set SALES = NA (missing)
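
The rule above can be sketched in a few lines. This is only an illustration of the logic in Python (the assignment itself must be done in SAS); the three rows and the helper name add_sales are made up, and only the variable names SALES12, SALESY and SALES come from the dataset description.

```python
# Hypothetical rows; only the SALES12 and SALESY names come from the dataset description.
hospitals = [
    {"HID": "A", "SALES12": 33, "SALESY": 23},
    {"HID": "B", "SALES12": 0, "SALESY": 0},
    {"HID": "C", "SALES12": 2, "SALESY": 1},
]


def add_sales(rows):
    """Add SALES = SALES12 + SALESY, using None (missing) when the total is zero."""
    out = []
    for row in rows:
        total = row["SALES12"] + row["SALESY"]
        out.append({**row, "SALES": total if total != 0 else None})
    return out


with_sales = add_sales(hospitals)
print([r["SALES"] for r in with_sales])
```

Treating zero as missing keeps hospitals with no recorded sales out of the averages and regressions later on, while leaving them available as candidates for new sales.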

2.  Transformations:

Look at each individual variable and decide “if and which” transformation is appropriate.
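
One possible way to compare candidate transformations is to look at the skewness before and after. The sketch below is in Python for illustration only; the data values are hypothetical (loosely shaped like the BEDS summary), and using skewness as the criterion is an assumption, not something the assignment prescribes.

```python
import math


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed counts, not taken from the real dataset.
raw = [0, 7, 28, 51, 70, 136, 262, 1476]

candidates = {
    "none": raw,
    "sqrt": [math.sqrt(v) for v in raw],
    "log1p": [math.log(1 + v) for v in raw],  # log(1 + c*x) with c = 1
}

for name, values in candidates.items():
    print(f"{name:6s} skewness = {skewness(values):.2f}")
```

For these made-up values the log transformation brings the skewness closest to zero; with the real variables the answer may differ from one variable to the next, and histograms are worth checking as well.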

3.  Dimension reduction.

i)  Separate the variables into the following groups:

Response:             SALES

Demographics:      BEDS, RBEDS, OUTV, ADM, SIR, TH, TRAUMA, REHAB  

Operation numbers:  HIP95, KNEE95, HIP96, KNEE96, FEMUR96

ii) Use the factor method to summarize the demographic variables and the operation variables, and come up with a final reduced list of factor variables (perhaps 3 or 4). Use the rotated factors to find a good interpretation of the factors, and try to tell a coherent story.
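
To get a first idea of how many factors to keep, one common heuristic is to count the eigenvalues of the correlation matrix that are greater than 1. The assignment does not prescribe this rule, so treat it as an assumption. The Python sketch below is for illustration only; the 3x3 correlation matrix is hypothetical and the eigenvalues are computed with a basic Jacobi rotation method. It does not perform the rotation of factors, which is left to the SAS analysis.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) element after the rotation.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix for three variables (values are made up).
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

eigenvalues = jacobi_eigenvalues(corr)
n_factors = sum(1 for ev in eigenvalues if ev > 1.0)
print([round(ev, 3) for ev in eigenvalues], n_factors)
```

In this made-up example the first two variables are highly correlated and collapse into one factor, which is the kind of reduction expected among the operation counts.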

4.   Market segmentation.

i) Independent variables are used to divide the list of hospitals (all possible clients = the market) into subsets, which we call market segments. Use cluster analysis to find the market segments or clusters. Since we are summarizing the variables with factors, cluster on the factors.

ii) Once the clusters are chosen, study the summary statistics for each cluster and try to describe their content. Interpretation is very important at this stage.

iii) Finally, select the cluster or clusters that agree with our objectives. In this study you are looking for segments with high overall sales that nevertheless contain hospitals where the company’s sales are low. Some segments will have mostly low sales numbers. This means that those hospitals have few patients who would need our products, so we are not interested in them.
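
The segmentation steps can be sketched as follows. The assignment does not name a clustering method, so plain k-means is used here as a stand-in; the factor scores, sales values and function names are all hypothetical, and Python is used only to illustrate the logic (the assignment requires SAS).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; starts from the first k points, which is naive but deterministic."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def summarize(labels, sales):
    """Per cluster: size, number of missing SALES, and mean of the known SALES."""
    summary = {}
    for lab in sorted(set(labels)):
        values = [s for s, l in zip(sales, labels) if l == lab]
        known = [s for s in values if s is not None]
        summary[lab] = {
            "n": len(values),
            "missing": len(values) - len(known),
            "mean_sales": sum(known) / len(known) if known else None,
        }
    return summary


# Hypothetical factor scores (two factors) and SALES; None stands for SALES = NA.
scores = [(-1.0, -0.9), (1.1, 1.0), (-1.2, -1.1), (0.9, 1.2), (-0.8, -1.0), (1.0, 0.8)]
sales = [None, 120, 3, 90, 5, None]

labels = kmeans(scores, k=2)
summary = summarize(labels, sales)
for lab, stats in summary.items():
    print(lab, stats)
```

In this toy example the second cluster has high average sales and still contains a hospital with missing sales, which is the kind of segment the study is looking for.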

Part 2. Estimating potential gain in sales.

Potential gain in sales is the difference between current sales and the average of sales to similar hospitals. If you are analyzing a very small cluster (N < 20), then we might assume that the sales are homogeneous and that the “average sales to similar hospitals” is just the average sales to that cluster. But if the cluster is larger, we will need to obtain a regression estimate. This is the procedure:

i) Do a regression for each of the t selected segments. Since the segments are very homogeneous, the R-square may not be very high, SO DO NOT BE CONCERNED WITH LOW R-SQUARES.

ii) The hospitals with large negative residuals are the ones whose sales are low although their characteristics suggest otherwise, that is, they are below their potential sales (use the predicted values as potential sales). Make a list of the hospitals in your segment where sales can be improved.

iii)  Give your estimate of the potential gains.
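
The procedure can be sketched with a one-predictor least-squares fit. This is a simplified illustration in Python, not the required SAS analysis: the segment data, the single factor score used as predictor, and the min_gap threshold are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def potential_gains(rows, intercept, slope, min_gap=0.0):
    """List (hospital, predicted - actual) for hospitals with negative residuals."""
    gains = []
    for hid, score, actual in rows:
        predicted = intercept + slope * score  # predicted value = potential sales
        residual = actual - predicted
        if residual < -min_gap:
            gains.append((hid, predicted - actual))
    return gains


# Hypothetical segment: (hospital id, one factor score, SALES). All values are made up.
segment = [("H1", 1.0, 12.0), ("H2", 2.0, 20.0), ("H3", 3.0, 18.0),
           ("H4", 4.0, 40.0), ("H5", 5.0, 50.0)]

intercept, slope = fit_line([r[1] for r in segment], [r[2] for r in segment])
gains = potential_gains(segment, intercept, slope)
total_gain = sum(g for _, g in gains)
print(gains, round(total_gain, 2))
```

Raising min_gap restricts the list to hospitals with large negative residuals only; the total of the listed gaps is one way to report the estimated potential gain for the segment.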

Part 3: All these parts must be performed using SAS. In addition, you could compare the results from SAS with an alternative robust analysis using R. The R analysis would apply the methods for robust clustering (pam) and for classification and regression trees (rpart).

PAM: Compare the clusters given by PAM with those from SAS. Are they similar?

RPART: The idea here is to turn the sales variable into a categorical variable with three classes: 1: 0 to median, 2: median to 80%, 3: 80% to 100%. Run the tree method, select one good node that has very high sales, find the hospitals in that group that have SALES=NA, and estimate a potential sales gain.
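
The three sales classes can be built as in the sketch below. It covers only the recoding step, not the tree itself, and it is written in Python purely as an illustration; the sales values are hypothetical, and the linear-interpolation quantile is one convention among several.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation on data that is already sorted."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def sales_classes(sales):
    """Map SALES to 1 (up to the median), 2 (median to 80%), 3 (above 80%); NA stays None."""
    known = sorted(s for s in sales if s is not None)
    median, p80 = quantile(known, 0.5), quantile(known, 0.8)

    def classify(s):
        if s is None:
            return None
        if s <= median:
            return 1
        if s <= p80:
            return 2
        return 3

    return [classify(s) for s in sales]


# Hypothetical SALES values; None stands for SALES = NA.
sales = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, None]
classes = sales_classes(sales)
print(classes)
```

Hospitals with SALES=NA keep a missing class, so they can be dropped into a high-sales node of the fitted tree afterwards to estimate what they might buy.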

  1. Transformation: transform the data variables (square root, log(1 + cx)). If the data are too irregular, the clusters cannot be found.
  2. Dimension reduction: factor analysis or component analysis. How many factors can I get? Look for factors that allow an interpretation (size of the hospital, number of beds).
  3. Clusters should also have an interpretation (e.g., big hospitals with few beds).
  4. Compare the sales of the clusters and characterize each cluster (some have high sales; some may contain potential customers, or some you don’t know about).
  5. Estimate the potential gain or potential sales (ex: 10 cluster, mean 3, standard error 1: calculate y hat = (X bar)^2 + SE^2).
  6. Make a table of the cluster estimates.
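
One possible reading of item 5, assuming SALES was analyzed on the square-root scale and the mean and standard error are reported on that scale: the estimate on the original scale is the squared mean plus the squared standard error. The Python sketch below only evaluates that formula with the numbers from the note; the function name is hypothetical.

```python
def back_transformed_estimate(mean_sqrt_scale, se_sqrt_scale):
    """y hat = (X bar)^2 + SE^2, for values reported on the square-root scale."""
    return mean_sqrt_scale ** 2 + se_sqrt_scale ** 2


estimate = back_transformed_estimate(3, 1)
print(estimate)
```

With a mean of 3 and a standard error of 1 this gives 3^2 + 1^2 = 10.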