MTH 491B – Project 4
Setup
For the fourth project, this will be somewhat similar to Project 3 but on the final data set.
First, get the data set from the “Data” tab on D2L. Put this in some folder that you can have permanent
access to (for example, on the P drive – not something that is in the work folder or a temporary folder).
Once you have placed this in a location of your choosing, WRITE PROTECT THE FILE. Right click on the
file, select “properties” at the bottom, then click “Read-only.” Once you start coding in SAS,
immediately make a data step that just duplicates this file.
data libname.Copy; set libname.Master491BFinal;
run;
Or something to that effect. This ensures you won’t change the final data set and if you mess up, you
can always start over from this initial statement on your direct copy and recover the master data set if
you have to. Yes, you can always redownload it from D2L, but 1) this is much more of a hassle than
making a copy of the data set, and 2) sometimes you can’t just redownload a file. (I deleted data from
my employer one time and… yeah… I had to call tech support to reset some servers…).
Variables and Definitions
Policy – A number representing a group of people (which must be a family of people in this
case). Assume that entries of the same policy number are
husband/wife/offspring/aunts/uncles/etc. or single (in the case of one person on the policy).
Gender – This variable is either “M” or “F.”
Age – How old someone is; ranges from 18 to 90.
Loyalty – This is the number of years a person has been buying insurance from you / your
company, 0 meaning they are a new customer. Family members on the same policy receive the
maximum loyalty discount for that policy. For example, there are three people on the policy
with Loyalty 0, 2, 3. Everyone on the policy should receive a discount according to Loyalty 3.
However, the age of the person will restrict this. An 18 year old should not have a loyalty of,
say, 3 because they cannot have been with the company for 3 years, despite the family plan. An
18 year old should have a maximum loyalty of 0, a 19 year old has a maximum of 1, and so on.
Smoker – Whether or not the person smokes cigarettes.
ZIP Code – Where the person lives.
Accidents – This is the number of car accidents a person has been involved in.
Annual Premium – This is the calculated premium according to the model in Project 2.
o Annual Premium = Base amount * Smoker Discount * Accident Charge *
* Loyalty Discount ∗ 𝑒20∗𝐴𝑐𝑐𝑖𝑑𝑒𝑛𝑡 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 – 200
o The base amount is equal to 1000.
o We will use the “Death Probability” cells in the Parameters excel spreadsheet to
represent the “Accident Probability.”
Import Code
To import the parameters from the excel file, run this piece of code. (I’ve written it for you because it
would be a pain to figure out all of the parameters and specifics on your own).
proc import
out = table_name
datafile = “drive:\folder\folder\…\folder\Excel_file_name”
DBMS = XLSX REPLACE;
RANGE = “range_name”;
GETNAMES = YES;
run;
In excel, you have sheet names and range names. A range can be created by highlighting an array of
cells and then, in the upper left hand corner where the cell range is listed, typing in a name for the
range.
You will have to run this code for each variable (Mortality, Loyalty, Accident, Smoker). This would be a
good opportunity to learn how to use a macro in SAS, but if this takes you too long to figure out how
macros work, just copy and paste the code as you need to.
Questions
Here is what you will be answering:
1) There are some erroneous observations.
a. What policy numbers have the erroneous entries?
b. Can you fix all of them? If so, state how and what you have done to “fix” it. If not,
state why not and how you will deal with it.
i. Note – this is somewhat of an open question. For instance, how do you
know which part of the data is correct or incorrect? Or do you know which
part is correct or incorrect?
ii. If there were more erroneous entries, would your method of handling these
entries be different? How so?
2) Determine the distribution for:
a. Loyalty
b. Accidents
c. Smoker/Nonsmoker
3) Calculate the total premium and average premium by ZIP code.
a. What 5 ZIP codes have the highest total premium?
b. What 5 ZIP codes have the highest average premium?
4) An earthquake hits ZIP codes 10480 – 10510. Assume that the damage is a complete loss of
premium from all of the insureds in these ZIP codes.
a. How much is the total loss?
What You Will Be Turning In
Turn in a Word document with your written explanations and upload it to the D2L dropbox for Project 4.
Also, upload your code file to the D2L dropbox for Project 4.