Q1
You are given two datasets, “trainDataset.csv” and “testDataset.csv”, which were extracted and pre-processed from the original Titanic dataset. The attributes are defined as follows:
• Survived: 1 means the passenger survived and 0 means they did not;
• PassengerClass: The passenger’s travel class on the ship;
• Sex: The passenger’s sex;
• Age: The passenger’s age group at the time of departure;
• SiblingSpouse: The number of siblings/spouses the passenger had aboard the ship.
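A minimal Python sketch for loading either file, assuming the CSV header row uses exactly the attribute names listed above (the real headers may be spelled differently). All values are kept as strings, so every attribute, including the Age group, is treated as categorical:

```python
import csv

# Attribute names as listed in the assignment; the exact header spelling
# used in trainDataset.csv / testDataset.csv is an assumption.
ATTRIBUTES = ["PassengerClass", "Age", "Sex", "SiblingSpouse"]
TARGET = "Survived"

def load_records(path):
    """Read a CSV file into a list of dicts, one per passenger.

    Values stay as strings: every attribute (including the Age group)
    is treated as categorical, which is what a multiway split needs.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Fail early if an expected column is absent from the header row.
        missing = [c for c in ATTRIBUTES + [TARGET] if c not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)

# Usage (file names from the assignment):
# train = load_records("trainDataset.csv")
# test = load_records("testDataset.csv")
```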
The goal is to use the given data to train decision tree models that predict whether a passenger on the Titanic survived. You are required to:
• Apply the basic Hunt’s algorithm to train a fully grown decision tree model, with appropriate explanations. Wherever a further split is needed, attributes should be selected in the following order: PassengerClass -> Age -> Sex -> SiblingSpouse. If an attribute has multiple values, use a multiway split (do not use a binary split).
• Rebuild the fully grown decision tree by applying the Gini index and a greedy strategy to select the attribute to split on at each node. If an attribute has multiple values, consider a multiway split (do not use a binary split). Provide calculations and explanations that demonstrate your understanding of the relevant concepts and techniques.
• Test both decision tree models using the test dataset.
• Discuss the results and the models, both in the context of this case and in a broader context.
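The Hunt’s algorithm task can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a reference solution: records are dicts keyed by the attribute names above, a node stops splitting when it is pure or when all four attributes have been used (it then takes the majority class), and a multiway split creates one branch per value present at that node. The `default` field, the tie rule in `majority`, and the fallback for values never seen in training are hypothetical design choices added so the tree can also label test records.

```python
from collections import Counter

# Fixed attribute order required by the Hunt's algorithm task.
ORDER = ["PassengerClass", "Age", "Sex", "SiblingSpouse"]

def majority(records, target="Survived"):
    """Most common class label; ties broken by the smaller label (assumption)."""
    counts = Counter(r[target] for r in records)
    return min(counts, key=lambda label: (-counts[label], label))

def hunt(records, attributes, target="Survived"):
    """Grow a full tree with Hunt's algorithm, splitting on `attributes` in order.

    A leaf is {"leaf": label}; an internal node is
    {"attr": name, "default": majority label, "children": {value: subtree}}.
    """
    labels = {r[target] for r in records}
    if len(labels) == 1:          # pure node: stop splitting
        return {"leaf": labels.pop()}
    if not attributes:            # attributes exhausted: majority vote
        return {"leaf": majority(records, target)}
    attr, rest = attributes[0], attributes[1:]
    # Multiway split: one branch per value of `attr` present in this node.
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r)
    children = {v: hunt(g, rest, target) for v, g in groups.items()}
    return {"attr": attr, "default": majority(records, target), "children": children}

def predict(tree, record):
    """Follow branches to a leaf; an unseen value falls back to the node's majority."""
    while "leaf" not in tree:
        child = tree["children"].get(record[tree["attr"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["leaf"]
```

Calling `hunt(train, ORDER)` on the loaded training records would return the full tree, and `predict(tree, r)` labels one test record. Because no pruning is applied, the tree keeps splitting until every leaf is pure or no attributes remain, which is what "fully grown" means here.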
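For the Gini-based rebuild, the impurity of a node t is Gini(t) = 1 − Σ_i p(i|t)², and a multiway split of n records into children t_1, ..., t_k with n_1, ..., n_k records has weighted impurity Σ_j (n_j / n) · Gini(t_j). The greedy strategy picks, at each node, the remaining attribute whose split has the lowest weighted Gini (equivalently, the largest drop from the parent’s Gini). A minimal Python sketch, assuming records are dicts keyed by the attribute names above: `Fraction` keeps the values exact so they can be checked against hand calculations, and ties go to the attribute listed first, which is an assumption since the task does not fix a tie rule.

```python
from collections import Counter
from fractions import Fraction

def gini(records, target="Survived"):
    """Gini index of a node: 1 - sum over classes of p(class)^2 (exact)."""
    n = len(records)
    counts = Counter(r[target] for r in records)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts.values())

def split_gini(records, attr, target="Survived"):
    """Weighted Gini of a multiway split on `attr` (one child per value)."""
    n = len(records)
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r)
    return sum(Fraction(len(g), n) * gini(g, target) for g in groups.values())

def best_attribute(records, attributes, target="Survived"):
    """Greedy choice: lowest weighted Gini; min() keeps the first of equal values."""
    return min(attributes, key=lambda a: split_gini(records, a, target))

def majority(records, target="Survived"):
    """Most common class label; ties broken by the smaller label (assumption)."""
    counts = Counter(r[target] for r in records)
    return min(counts, key=lambda label: (-counts[label], label))

def grow_gini_tree(records, attributes, target="Survived"):
    """Full (unpruned) tree; each node splits on the greedily chosen attribute."""
    labels = {r[target] for r in records}
    if len(labels) == 1:          # pure node: stop splitting
        return {"leaf": labels.pop()}
    if not attributes:            # attributes exhausted: majority vote
        return {"leaf": majority(records, target)}
    attr = best_attribute(records, attributes, target)
    rest = [a for a in attributes if a != attr]
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r)
    return {"attr": attr, "default": majority(records, target),
            "children": {v: grow_gini_tree(g, rest, target) for v, g in groups.items()}}
```

Printing `split_gini(records, a)` for every candidate attribute at a node gives exactly the per-node comparison table the task asks to show by hand.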
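Testing either tree amounts to predicting a label for every test record and comparing it with the actual Survived value. A small helper, assuming labels are the strings "1" and "0" as read from the CSV and treating "1" (survived) as the positive class, could report the confusion matrix alongside accuracy, precision and recall, which gives both models a common basis for the discussion:

```python
def evaluate(actual, predicted, positive="1"):
    """Confusion-matrix counts and summary metrics for a binary classifier.

    `positive` is the label treated as the positive class (here: survived).
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    n = len(pairs)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Here `actual` would be the Survived column of the test records and `predicted` each tree’s predictions for those records in the same order; running it once per tree puts the two models side by side.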