HW1
For each of the following meetings, explain which phase in the CRISP-DM process is represented:
a. Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is.
b. The data mining project manager meets with the data warehousing manager to discuss how the data will be collected.
c. The data mining consultant meets with the vice president for marketing, who says that he would like to move forward with customer relationship management.
d. The data mining project manager meets with the production line supervisor to discuss the implementation of changes and improvements.
e. The analysts meet to discuss whether the neural network or decision tree models should be applied.
HW2
Make up a data set consisting of eight scores on an exam in which one of the scores is an outlier.
a. Find the mean score and the median score, with and without the outlier.
b. State which measure, the mean or the median, the presence of the outlier affects more, and why.
c. Verify that the outlier is indeed an outlier, using the IQR method.
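The calculations in this exercise can be checked with a short Python sketch. The eight scores below are hypothetical, made up only for illustration, and the quartiles use the "inclusive" interpolation from Python's statistics module; a textbook may use a different quartile convention, which would shift the fences slightly.

```python
from statistics import mean, median, quantiles

# Hypothetical exam scores, invented for illustration; 12 is the intended outlier.
scores = [12, 71, 74, 78, 80, 83, 85, 90]
suspect = 12
trimmed = [s for s in scores if s != suspect]

def iqr_fences(values):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # "inclusive" interpolates between data points; other quartile
    # conventions give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = iqr_fences(scores)
# A score is flagged as an outlier if it falls outside the fences.
outliers = [s for s in scores if s < lower or s > upper]

print("mean  :", mean(scores), "->", mean(trimmed))
print("median:", median(scores), "->", median(trimmed))
print("fences:", (lower, upper), "outliers:", outliers)
```

With these made-up scores, removing the low value moves the mean by several points while the median moves by only one, which is the contrast part b asks about.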
HW3
Occupation   Gender   Age   Salary
Service      Female   45    $48,000
Service      Male     25    $25,000
Service      Male     33    $35,000
Management   Male     25    $45,000
Management   Female   35    $65,000
Management   Male     26    $45,000
Management   Female   45    $70,000
Sales        Female   40    $50,000
Sales        Male     30    $40,000
Staff        Female   50    $40,000
Staff        Male     25    $25,000
Consider the data in the table above. The target variable is salary. Start by discretizing salary as follows:
Less than $35,000: Level 1
$35,000 to less than $45,000: Level 2
$45,000 to less than $55,000: Level 3
$55,000 or more: Level 4
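As a quick check on the discretization, the following Python sketch maps each salary in the table to its level. The function name salary_level is hypothetical, and the sketch assumes that a salary of exactly $55,000 belongs to Level 4; no record in the table sits on that boundary, so the assumption does not affect the result here.

```python
from collections import Counter

def salary_level(salary):
    """Map a salary in dollars to Level 1-4 using the bins listed above."""
    if salary < 35_000:
        return 1
    if salary < 45_000:
        return 2
    if salary < 55_000:
        return 3
    return 4  # assumption: exactly $55,000 is treated as Level 4

# Salaries in table order (Service, Management, Sales, Staff).
salaries = [48_000, 25_000, 35_000,
            45_000, 65_000, 45_000, 70_000,
            50_000, 40_000,
            40_000, 25_000]

levels = [salary_level(s) for s in salaries]
level_counts = Counter(levels)

print(levels)
print(sorted(level_counts.items()))
```

The class counts from this step are the starting point for the impurity calculations that both tree algorithms need.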
1. Construct a classification and regression tree (CART) to classify salary based on the other variables. Do as much as you can by hand before turning to the software.
2. Construct a C4.5 decision tree to classify salary based on the other variables. Do as much as you can by hand before turning to the software.
3. Compare the two decision trees and discuss the benefits and drawbacks of each.
4. Generate the full set of decision rules for the CART decision tree.
5. Generate the full set of decision rules for the C4.5 decision tree.
6. Compare the two sets of decision rules and discuss the benefits and drawbacks of each.
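For the by-hand parts of items 1 and 2, it may help to check impurity calculations with a short script. The Python sketch below is not a full CART or C4.5 implementation: it scores only one illustrative candidate split for each method (a binary Management-versus-rest split scored by Gini decrease, and a multiway split on Occupation scored by gain ratio). The helper names are hypothetical, and the choice of these two candidates is an assumption made for illustration; a complete solution would evaluate every candidate, including thresholds on Age.

```python
from collections import Counter
from math import log2

# (occupation, gender, age, salary level) for the 11 records in the table,
# with levels assigned by the discretization given in the exercise.
records = [
    ("Service", "Female", 45, 3), ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2),
    ("Management", "Male", 25, 3), ("Management", "Female", 35, 4),
    ("Management", "Male", 26, 3), ("Management", "Female", 45, 4),
    ("Sales", "Female", 40, 3), ("Sales", "Male", 30, 2),
    ("Staff", "Female", 50, 2), ("Staff", "Male", 25, 1),
]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def partition(rows, key):
    """Group the class labels of rows by the value key(row) returns."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[-1])
    return list(groups.values())

def weighted(impurity, groups):
    """Size-weighted average impurity of the child nodes."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * impurity(g) for g in groups)

labels = [r[-1] for r in records]

# CART-style candidate: binary split scored by the decrease in Gini impurity.
binary = partition(records, lambda r: r[0] == "Management")
gini_decrease = gini(labels) - weighted(gini, binary)

# C4.5-style candidate: multiway split on Occupation scored by gain ratio,
# i.e. information gain divided by the entropy of the split itself.
multiway = partition(records, lambda r: r[0])
info_gain = entropy(labels) - weighted(entropy, multiway)
split_info = entropy([r[0] for r in records])
gain_ratio = info_gain / split_info

print("root Gini:", gini(labels), "Gini decrease:", gini_decrease)
print("info gain:", info_gain, "gain ratio:", gain_ratio)
```

Swapping the lambda passed to partition (for example, splitting on gender or on an age threshold) lets the same helpers score other candidates for comparison with the hand calculations.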