DATA MINING AND MACHINE LEARNING (EBUS537)
Formative Assignment
Set by Prof Dongping SONG
Date of issue: 23rd October 2021
Date of submission: 19th November 2021, before 12 noon (online)
Contribution: 0%
Essay length: 1000 words (maximum)
Coursework:
Using the given table (Table 1) as the training dataset, apply the Greedy strategy with the Gini impurity measure to build a fully grown decision tree. If an attribute has more than two values, use a multiway split (do not use a binary split). Each leaf node should be assigned a single class label.
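The Gini impurity of a node is 1 - sum_k p_k^2, where p_k is the fraction of the node's instances in class k. A multiway split is scored by the weighted average of its children's impurities, each child weighted by its share of the parent's instances. The following is a minimal Python sketch of both quantities. It assumes class counts are supplied as dictionaries, and the names `gini` and `weighted_gini` are illustrative, not taken from any library.

```python
from typing import Iterable, Mapping


def gini(class_counts: Mapping[str, int]) -> float:
    """Gini impurity of one node: 1 - sum over classes k of p_k ** 2."""
    n = sum(class_counts.values())
    if n == 0:
        return 0.0  # treat an empty node as pure to avoid dividing by zero
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values())


def weighted_gini(children: Iterable[Mapping[str, int]]) -> float:
    """Impurity of a multiway split: each child's Gini weighted by its share of instances."""
    children = list(children)
    total = sum(sum(child.values()) for child in children)
    return sum(sum(child.values()) / total * gini(child) for child in children)


# Root of Table 1: 10 instances of C0 and 10 of C1.
print(gini({"C0": 10, "C1": 10}))  # 0.5

# Multiway split on Gender: M has 6 C0 / 4 C1, F has 4 C0 / 6 C1.
print(round(weighted_gini([{"C0": 6, "C1": 4}, {"C0": 4, "C1": 6}]), 4))  # 0.48
```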
Please provide sample calculations, with explanations, that demonstrate how the Greedy strategy and the Gini impurity measure are applied.
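At each node, the greedy step scores every candidate attribute and keeps the split with the lowest weighted Gini. The sketch below shows that step at the root. It is written in Python and assumes the rows of Table 1 are transcribed as tuples. The helpers `split_gini` and `best_split` are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict

# Rows of Table 1: (Gender, Car Type, Shirt Size, Class).
ROWS = [
    ("M", "Family", "Small", "C0"), ("M", "Sports", "Medium", "C0"),
    ("M", "Sports", "Medium", "C0"), ("M", "Sports", "Large", "C0"),
    ("M", "Sports", "Extra Large", "C0"), ("M", "Sports", "Extra Large", "C0"),
    ("F", "Sports", "Small", "C0"), ("F", "Sports", "Small", "C0"),
    ("F", "Sports", "Medium", "C0"), ("F", "Luxury", "Large", "C0"),
    ("M", "Family", "Large", "C1"), ("M", "Family", "Extra Large", "C1"),
    ("M", "Family", "Medium", "C1"), ("M", "Luxury", "Extra Large", "C1"),
    ("F", "Luxury", "Small", "C1"), ("F", "Luxury", "Small", "C1"),
    ("F", "Luxury", "Medium", "C1"), ("F", "Luxury", "Medium", "C1"),
    ("F", "Luxury", "Medium", "C1"), ("F", "Luxury", "Large", "C1"),
]
ATTRIBUTES = ["Gender", "Car Type", "Shirt Size"]


def gini(labels):
    """Gini impurity of the class labels reaching one node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, attr_index):
    """Weighted Gini of a multiway split: one child per distinct attribute value."""
    children = defaultdict(list)
    for row in rows:
        children[row[attr_index]].append(row[-1])  # last field is the class
    return sum(len(labels) / len(rows) * gini(labels) for labels in children.values())


def best_split(rows, candidates):
    """Greedy step: score every candidate attribute and keep the lowest weighted Gini."""
    scores = {ATTRIBUTES[i]: split_gini(rows, i) for i in candidates}
    return min(scores, key=scores.get), scores


best, scores = best_split(ROWS, range(len(ATTRIBUTES)))
for name, score in scores.items():
    print(f"{name}: {score:.4f}")  # Gender: 0.4800, Car Type: 0.1625, Shirt Size: 0.4914
print("Split root on:", best)      # Split root on: Car Type
```

To grow the full tree, you would apply the same step recursively to each child's rows. The attribute already used on that path is dropped from the candidates, and recursion stops when a node is pure or no attributes remain. That recursion is omitted here to keep the sketch short.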
Please perform the following post-pruning activities: (i) prune any subtree whose leaf nodes all have the same class label; (ii) prune, as appropriate, leaf nodes that contain fewer than 2 instances.
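Rule (i) can be applied mechanically, bottom-up, so that collapsing a lower subtree can enable a collapse higher up. Rule (ii) depends on judgement ("as appropriate"), so this sketch only lists the leaves it would flag. It is a minimal Python sketch under stated assumptions. The `Leaf`/`Node` classes and the functions `prune_uniform` and `small_leaves` are hypothetical. Summing instance counts when a subtree collapses is a design choice, not part of the brief. The example tree is invented for illustration and is not derived from Table 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # the single class label declared at this leaf
    n: int      # number of training instances that reach the leaf


@dataclass
class Node:
    attribute: str
    children: Dict[str, "Tree"] = field(default_factory=dict)  # one child per attribute value


Tree = Union[Leaf, Node]


def prune_uniform(tree: Tree) -> Tree:
    """Rule (i): bottom-up, replace a subtree whose leaves all share one label by a single leaf."""
    if isinstance(tree, Leaf):
        return tree
    children = {value: prune_uniform(child) for value, child in tree.children.items()}
    kids = list(children.values())
    if all(isinstance(k, Leaf) for k in kids) and len({k.label for k in kids}) == 1:
        return Leaf(kids[0].label, sum(k.n for k in kids))
    return Node(tree.attribute, children)


def small_leaves(tree: Tree, path: Tuple = ()) -> List[Tuple]:
    """Rule (ii) helper: return the paths to leaves holding fewer than 2 instances."""
    if isinstance(tree, Leaf):
        return [path] if tree.n < 2 else []
    found = []
    for value, child in tree.children.items():
        found += small_leaves(child, path + ((tree.attribute, value),))
    return found


# Hypothetical tree for illustration only; it is not derived from Table 1.
toy = Node("A", {
    "a1": Leaf("C0", 3),
    "a2": Node("B", {"b1": Leaf("C1", 2), "b2": Leaf("C1", 1)}),
})
print(small_leaves(toy))                  # [(('A', 'a2'), ('B', 'b2'))]
print(prune_uniform(toy).children["a2"])  # Leaf(label='C1', n=3)
```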
Table 1. Data set
Gender | Car Type | Shirt Size  | Class
M      | Family   | Small       | C0
M      | Sports   | Medium      | C0
M      | Sports   | Medium      | C0
M      | Sports   | Large       | C0
M      | Sports   | Extra Large | C0
M      | Sports   | Extra Large | C0
F      | Sports   | Small       | C0
F      | Sports   | Small       | C0
F      | Sports   | Medium      | C0
F      | Luxury   | Large       | C0
M      | Family   | Large       | C1
M      | Family   | Extra Large | C1
M      | Family   | Medium      | C1
M      | Luxury   | Extra Large | C1
F      | Luxury   | Small       | C1
F      | Luxury   | Small       | C1
F      | Luxury   | Medium      | C1
F      | Luxury   | Medium      | C1
F      | Luxury   | Medium      | C1
F      | Luxury   | Large       | C1