Q1: Review Questions

1.1 Explain the CHAID model algorithm.

1.2 Explain how the target changes sequentially in Gradient Boosting.
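
As a hint for 1.2, the sketch below shows one common case only: gradient boosting with squared-error loss, where each stage is fitted to the residuals left by the previous stages. The stump learner, the toy data, the learning rate of 0.5 and all function names are assumptions made for illustration; they are not taken from the course material.

```python
# Minimal sketch of gradient boosting for squared-error loss.
# Each stage fits a weak learner (a one-split stump) to the current
# residuals, so the "target" changes from stage to stage.

def fit_stump(x, target):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for t in sorted(set(x))[1:]:  # candidate thresholds; both sides non-empty
        left = [r for xi, r in zip(x, target) if xi < t]
        right = [r for xi, r in zip(x, target) if xi >= t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, xi):
    t, lm, rm = stump
    return lm if xi < t else rm

def boost(x, y, n_stages=3, learning_rate=0.5):
    pred = [sum(y) / len(y)] * len(y)  # stage 0: constant prediction
    history = []                       # mean squared residual before each stage
    for _ in range(n_stages):
        # The new target is what the current model still gets wrong.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        history.append(sum(r ** 2 for r in residuals) / len(y))
        stump = fit_stump(x, residuals)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for xi, pi in zip(x, pred)]
    return pred, history

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8]
pred, history = boost(x, y)
print(history)
```

The printed list should shrink from stage to stage, which is the point of the question: the original target is used only at the start, and every later learner is trained on a target derived from the errors of the model built so far.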

1.3 Explain the likelihood ratio (LR) test and the Wald test.

(my pick would be the following: clean and quick)
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-are-the-likelihood-ratio-wald-and-lagrange-multiplier-score-tests-different-andor-similar/
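
For orientation, the two statistics are commonly written as follows. The notation is an assumption of this note, not taken from the linked page: \ell is the log-likelihood, \hat\theta_0 the estimate under the restricted (null) model, \hat\theta the unrestricted estimate, and q the number of restrictions. The Wald form shown is the single-parameter case.

```latex
\mathrm{LR} = -2\left[\ell(\hat\theta_0) - \ell(\hat\theta)\right] \;\sim\; \chi^2_q
\qquad
W = \frac{(\hat\theta - \theta_0)^2}{\widehat{\mathrm{Var}}(\hat\theta)} \;\sim\; \chi^2_1
```

The LR test needs both models to be fitted, while the Wald test uses only the unrestricted fit; both reference distributions hold asymptotically under the null hypothesis.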

1.4 Based on the two variables, we try to predict the class variable.

(a) Calculate the Gini index for Var1 = 1 and for Var2 >= 32.

P(var1=1)=0.4, P(var1=0)=0.6;
P(class=A | var1=1)=0.25, P(class=B | var1=1)=0.75;
Gini index(var1=1) = 1-[(0.25×0.25)+(0.75×0.75)] = 0.375

P(var2>=32)=0.8, P(var2<32)=0.2;
P(class=A | var2>=32)=0.625, P(class=B | var2>=32)=0.375;
Gini index(var2>=32) = 1-[(0.625×0.625)+(0.375×0.375)] = 0.46875
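
The arithmetic above can be checked with a short script. This is a minimal sketch that takes the within-node class proportions stated above as given; the function name `gini` is an assumption of this note.

```python
def gini(proportions):
    """Gini impurity of a node, given its class proportions (summing to 1)."""
    return 1.0 - sum(p * p for p in proportions)

gini_var1 = gini([0.25, 0.75])      # node Var1 = 1
gini_var2 = gini([0.625, 0.375])    # node Var2 >= 32
print(gini_var1, gini_var2)
```

These are impurities of single nodes. A split is usually judged by the weighted impurity over both of its branches, which also requires the class proportions in the complementary nodes (Var1 = 0 and Var2 < 32).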

(b) Which variable will you use for splitting?

I will choose var2 for splitting.

Q2: Build “your” Model

Target = Interval Variable (Rev_Total)

A bank wants to understand how customer banking habits contribute to revenues and profitability. The bank has customer age and bank account information, e.g., whether the customer has a savings account, whether the customer has received bank loans, and other indicators of account activity.

Your task is to build a model that allows the bank to predict profitability for a given customer. A surrogate for customer profitability available in our data set is the Total Revenue a customer generates through their accounts and transactions. The resulting model will be used to forecast bank revenues and guide the bank in future marketing campaigns.

Data = Revenue.sas7bdat

The data set (in SAMPSIO2) contains information on 7,420 bank customers:
Rev_Total: Total revenue generated by the customer over a 6-month period.
Bal_Tota: Total of all account balances, across all accounts held by the customer.
Offer: An indicator of whether the customer has received a special promotional offer in the previous one-month period. Offer=1 if the offer was received, Offer=0 if it was not.
AGE: The customer’s age.
CHQ: Indicator of debit card account activity. CHQ=0 is low (or zero) account activity, CHQ=1 is greater account activity.
CARD: Indicator of credit card account activity. CARD=0 is low or zero account activity, CARD=1 is greater account activity.
SAV1: Indicator of primary savings account activity. SAV1=0 is low or zero account activity, SAV1=1 is greater activity.
LOAN: Indicator of personal loan account activity. LOAN=0 is low or zero account activity, LOAN=1 is greater activity.
MORT: Indicator of mortgage account tier. MORT=0 is lower tier and less important to the bank’s portfolio. MORT=1 is higher tier and indicates the account is more important to the bank’s portfolio.
INSUR: Indicator of insurance account activity. INSUR=0 is low or zero account activity, INSUR=1 is greater activity.
PENS: Indicator of retirement savings (pension) account tier. PENS=0 is lower tier and of less importance to the bank’s portfolio. PENS=1 is higher tier and of more importance to the bank’s portfolio.
Check: Indicator of checking account activity. Check=0 is low or zero account activity, Check=1 is greater activity.
CD: Indicator of certificate of deposit account tier. CD=0 is lower tier and of less importance to the bank’s portfolio. CD=1 is higher tier and of more importance to the bank’s portfolio.
MM: Indicator of money market account activity. MM=0 is low or zero account activity, MM=1 is greater activity.
Savings: Indicator of savings accounts (other than primary) activity. Savings=0 is low or zero account activity, Savings=1 is greater activity.
AccountAge: Number of years as a customer of the bank.

Data Partition = Default (40% Training / 30% Validation / 30% Test)
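
Outside the modeling tool, a 40/30/30 partition can be reproduced roughly as follows. This is a sketch only: the function name, the seed and the use of simple random sampling are assumptions, and the tool's default partition node may sample differently (for example, with stratification).

```python
import random

def partition(n_rows, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Shuffle row indices and cut them into training/validation/test sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    train_idx = idx[:n_train]
    valid_idx = idx[n_train:n_train + n_valid]
    test_idx = idx[n_train + n_valid:]        # remainder goes to test
    return train_idx, valid_idx, test_idx

train_idx, valid_idx, test_idx = partition(7420)
print(len(train_idx), len(valid_idx), len(test_idx))
```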

Model Selection Criteria = Average Squared Error and BIC (Test Data)
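
The two criteria can be computed by hand as a cross-check on the tool's output. The handout does not state which BIC definition the software uses; the form below, n·ln(SSE/n) + k·ln(n), is a common one for least-squares models (up to an additive constant) and is an assumption here, as are the function names and the toy numbers.

```python
import math

def average_squared_error(y_true, y_pred):
    """Mean of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bic_least_squares(y_true, y_pred, n_params):
    """BIC for a least-squares model: n*ln(SSE/n) + k*ln(n) (assumed form)."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return n * math.log(sse / n) + n_params * math.log(n)

# Toy values, invented for illustration only.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 12.0, 14.0, 18.0]
ase = average_squared_error(y_true, y_pred)
bic = bic_least_squares(y_true, y_pred, n_params=2)
print(ase, bic)
```

Lower is better for both. BIC adds a penalty of ln(n) per parameter, so it can prefer a smaller model even when a larger one has a slightly lower average squared error.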

• Write a model-building process diary that describes everything you have done, including the reason, the outcome, and your thoughts.

e.g.,
Stage I (Exploring Data)
Stage II (Model Development)
    Model 1 (Regression Tree Model)
    Model 2 (Regression Model)
    :
    (all variations of each model type)
Stage III (Model Comparison)
Stage IV (Final Model and Interpretation)

• Organize your output and/or SPK files according to the diary. Make sure that you order your results (numbered) according to the diary.

The original case is available at https://www.jmp.com/en_us/academic/case-study-library.html#bank
Treat its results as a baseline; you can do better than the case results.

Q3: Build “your” Model

Target = Binary Variable (Offer Accepted)

A local bank would like to understand the demographics and other characteristics associated with whether a customer accepts a credit card offer. Often the company sees only those who respond to an offer. To get around this, the bank designs a focused marketing study, with 18,000 current bank customers. This focused approach allows the bank to know who does and does not respond to the offer, and to use existing demographic data that is already available on each customer. The designed approach also allows the bank to control for other potentially important factors so that the offer combination isn’t confused or confounded with the demographic factors. Because of the size of the data and the possibility that there are complex relationships between the response and the studied factors, a decision tree is used to find out if there is a smaller subset of factors that may be more important and that warrant further analysis and study.

Your task is to build a model that will provide insight into why some bank customers accept credit card offers. We are primarily interested in understanding characteristics of customers who have accepted an offer.

Data = Card_Offer.sas7bdat

The data set (in SAMPSIO2) consists of information on the 18,000 current bank customers.
Customer Number: A sequential number assigned to the customers.
Offer Accepted: Whether the customer accepted (Yes) or rejected (No) the offer.
Reward: The type of reward program offered for the card.
Mailer Type: Letter or postcard.
Income Level: Low, Medium or High.
#OpenAccounts: How many non-credit-card accounts are held by the customer.
Protection: Whether the customer has overdraft protection on their checking account(s) (Yes or No).
Credit Rating: Low, Medium or High.
#CreditCards: The number of credit cards held at the bank.
#Homes: The number of homes owned by the customer.
Household Size: Number of individuals in the family.
Own Your Home: Whether the customer owns their home (Yes or No).
Average Balance: Average account balance (across all accounts over time).
Q1, Q2, Q3 and Q4 Balance: Average balance for each quarter in the last year.

Data Partition = Default (40% Training / 30% Validation / 30% Test)

Model Selection Criteria = Classification Rate and BIC (Test Data)
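
The classification rate is the share of test cases whose predicted class matches the actual class. A minimal sketch follows; the function name and the five toy labels are invented for illustration.

```python
def classification_rate(actual, predicted):
    """Fraction of cases where the predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Toy labels, invented for illustration only.
actual = ["Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "Yes", "No", "No"]
rate = classification_rate(actual, predicted)
print(rate)  # 3 of 5 correct
```

If one class is far more common than the other, a high classification rate can be reached by always predicting the majority class, so it is worth reading the rate alongside the counts for each class.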

• Write a model-building process diary that describes everything you have done, including the reason, the outcome, and your thoughts.

e.g.,
Stage I (Exploring Data)
Stage II (Model Development)
    Model 1 (Classification Tree Model)
    Model 2 (Logistic Regression Model)
    :
    (all variations of each model type)
Stage III (Model Comparison)
Stage IV (Final Model and Interpretation)

• Organize your output and/or SPK files according to the diary. Make sure that you order your results (numbered) according to the diary.

The original case is available at https://www.jmp.com/en_us/academic/case-study-library.html#credit
Treat its results as a baseline; you can do better than the case results.