4th Assignment COMPX52322A v3
Gomes, May 2022
1 Overall Description
In this assignment, you will implement an ensemble classifier for data streams. In general, this assignment has two parts, each of them graded as follows:
1. Coding (20% of the score)
2. Experimentation and Analysis (10% of the score)
Each of these parts is detailed in the following sections. Important information:
Deadline: See Moodle
This assignment can be developed in groups of 1 or 2 students.
Parts of the assignment are mandatory for groups of 2 students and optional for single-student groups. These are marked as Groups of 2 Required.
You must code your assignment using either MOA or river.
You need to build a classifier that is compatible with either MOA or river. In MOA, the classifier class will extend the AbstractClassifier abstract class and implement the MultiClassClassifier and CapabilitiesHandler interfaces.
Your submission must contain a zip file with 3 files:
1. a file containing the classifier implementation (a single .java or .py);
2. a PDF report with the experiments and analysis; and
3. a README file with details about how to run your code and reproduce the experiments. Optionally, the README file can be replaced by a jupyter notebook with the execution of the experiments. Notice that a jupyter notebook doesn't replace the PDF report.
[Figure 1 shows a diagram: at the start of the window, a candidate c is CREATEd; during the window (of length l), c is not used for testing but p_c is updated; at the end of the window, either an existing tree t is REPLACEd by c, or c is IGNOREd.]

Figure 1: The candidate c updates in a given window
Throughout this assignment you will find Optional questions and requests. These do not give you extra credit, nor do they influence your final score. They are challenges that may go beyond what was taught in the course, or require further programming and analysis efforts. In summary, you are encouraged to try them, but they are not required.
2 Implementation

You must create a new classifier, but you are encouraged to look at the existing ensemble classifiers to observe how ensemble members are usually stored, hyperparameter conventions, and other common code. Notice that your ensemble classifier must be implemented in either MOA or river in a way that it can be executed as a standalone classifier.
The ensemble must contain the implementations to satisfy the requirements described in the following subsections.
2.1 Ensemble Implementation
The ensemble will have a fixed number of learners S. Updates to the ensemble (i.e. additions and removals) will occur every l instances, such that these can be identified as windows of length l instances. At the start of each window, a new base model should be added as a candidate c to join the ensemble. At the end of each window, the ensemble will either: REPLACE another learner e currently in the ensemble with c; or IGNORE c and discard it. This process is depicted in Figure 1.
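As a sketch, the windowed candidate lifecycle could be structured as follows. This is illustrative Python, not a required API: the class and method names (WindowedEnsemble, make_candidate, end_of_window) are assumptions, and per-instance training is omitted.

```python
# Illustrative sketch of the windowed candidate lifecycle (not a required
# API): a fixed-size ensemble plus one candidate per window, with the
# REPLACE-or-IGNORE decision at the end of each window.

class WindowedEnsemble:
    def __init__(self, S, l, make_candidate):
        self.l = l                                    # window length
        self.make_candidate = make_candidate          # factory for new base models
        self.members = [make_candidate() for _ in range(S)]
        self.perf = [0.0] * S                         # estimated p_e per member
        self.candidate = make_candidate()             # candidate c for this window
        self.cand_perf = 0.0                          # p_c

    def end_of_window(self):
        """REPLACE the worst member by the candidate, or IGNORE it."""
        worst = min(range(len(self.members)), key=lambda i: self.perf[i])
        if self.cand_perf > self.perf[worst]:
            self.members[worst] = self.candidate      # REPLACE
            self.perf[worst] = self.cand_perf
        # in either case, start a fresh candidate for the next window
        self.candidate = self.make_candidate()
        self.cand_perf = 0.0
```

Note that the members themselves are never reset here; only the candidate slot is renewed at each window boundary, matching the continuous-update requirement below.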
The base learner for the ensemble will be the Hoeffding Tree algorithm. Optional: You are encouraged to think about how your ensemble could be implemented using any base learner, but that is not mandatory. The majority of the ensemble algorithms available in MOA/river do not enforce the use of a specific base learner. However, it would not be easy to accomplish that in this assignment, as the diversity induction strategy assumes that we are using a Hoeffding Tree.
The training and testing methods depend on how diversity is induced (see 2.1.4) and the voting method (see 2.1.2). Notice that even though the ensemble is updated in windows of length l, the Hoeffding tree models are continuously updated, i.e. they should not be reset after processing every window and they should be updated during the window as well.
You can name the ensemble algorithm yourself, but if you cannot find a good name, just call it TheEnsemble (we later refer to it using this name).
2.1.1 Predictive performance (p_e)
The estimated predictive performance p_e of each member e of the ensemble E must be stored and continuously updated. Concretely, immediately before using an instance x_t for training an ensemble member e, x_t should be used to assess whether e would correctly predict it, i.e. whether h_e(x_t) = y_t. Notice that if the instance x_t were first used for training e, then the estimated predictive performance would be misleading. By default, p_e corresponds to the accuracy of e, i.e. the number of correct predictions divided by the total number of predictions made by e. Optional: Encapsulate p_e such that other metrics, such as Kappa T, could be used.
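A minimal sketch of this test-before-train bookkeeping, assuming river-style predict_one/learn_one methods (these names are an assumption, not mandated by the assignment):

```python
# Sketch of per-member prequential accuracy: test on the instance first,
# then train on it. river-style predict_one/learn_one names are assumed.

class PerfTracker:
    """Running accuracy (p_e) of one ensemble member."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_pred, y_true):
        self.correct += int(y_pred == y_true)
        self.total += 1

    @property
    def accuracy(self):
        return self.correct / self.total if self.total else 0.0

def prequential_step(model, tracker, x, y):
    tracker.update(model.predict_one(x), y)  # 1) test first: keeps p_e honest
    model.learn_one(x, y)                    # 2) then train on the same instance
```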
2.1.2 Prediction: Weighted votes
The ensemble prediction should be determined by a weighted majority vote strategy. The weight of each member e is determined according to its predictive performance p_e.
During prediction, the algorithm should sum the weighted votes assigned to each class label to decide the ensemble prediction. For example, given an ensemble with four models, such that p_g = 0.90, p_m = 0.85, p_n = 0.81, and p_q = 0.92, where the predictions for a given x were h_g(x) = 0, h_m(x) = 0, h_n(x) = 0, and h_q(x) = 1, the weighted votes are w_0 = 0.90 + 0.85 + 0.81 = 2.56 for class 0 and w_1 = 0.92 for class 1; as a consequence, the ensemble prediction is class 0.
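The weighted sum above can be sketched as a small helper (illustrative names, not a required API):

```python
from collections import defaultdict

# Weighted majority vote: each member's weight (its p_e) is added to the
# class it voted for; the class with the largest total wins.

def weighted_vote(predictions, weights):
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    winner = max(totals, key=totals.get)
    return winner, dict(totals)
```

With the values from the example, the helper returns class 0 with w_0 = 2.56.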
2.1.3 Dynamic updates
After processing l instances, the algorithm should be updated by replacing the worst member e (the one with the minimum p_e) with the new model c. If p_c < p_e, i.e. the predictive performance of c is worse than the minimum predictive performance observed in the ensemble (p_e), then c should be discarded (i.e. IGNORED).
2.1.4 Training: Inducing diversity
There are multiple ways of inducing diversity into an ensemble model [3]. One can vary which instances or features the models are trained on, modify the outputs of the training instances, or even combine several strategies. In this assignment, you are asked to implement a strategy that varies the hyperparameters of the base Hoeffding Trees. Concretely, implement a method that creates new candidate trees c with random values for the following hyperparameters.
Grace period (gp). The number of instances a leaf should observe between split attempts. Range of values: [10; 200]. Step: 10.
Split Confidence (sc). The allowable error in a split decision. Range of values: [0.0; 1.0]. Step: 0.05.
Tie Threshold (t). Threshold below which a split will be forced to break ties. Range of values: [0.0; 1.0]. Step: 0.05.
For example, a given c may be initialized with gp = 110, sc = 0.05 and t = 0.55. There is not much difference between a tree trained using sc = 0.046 and one using sc = 0.05; thus we define a step of 0.05 for sc and t, and a step of 10 for gp.
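Sampling a random configuration on these grids can be sketched as follows; the function name is illustrative, and the resulting values would be passed to the Hoeffding Tree constructor (MOA or river), which is not shown here:

```python
import random

# Sketch of drawing one random hyperparameter configuration on the grids
# defined above (ranges and steps come from the assignment).

def sample_hyperparams(rng):
    gp = rng.randrange(10, 201, 10)       # grace period: 10, 20, ..., 200
    sc = rng.randrange(0, 21) * 0.05      # split confidence: 0.00, 0.05, ..., 1.00
    t = rng.randrange(0, 21) * 0.05       # tie threshold: 0.00, 0.05, ..., 1.00
    return gp, sc, t
```

Passing an explicit random.Random(seed) object makes the sampled configurations reproducible, which matters for the seed-variation experiment in Section 3.1.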
2.1.5 (Groups of 2 Required) Measuring diversity
To verify whether a diversity induction strategy was successful, one approach is to estimate how diverse the base models are w.r.t. their predictions. Intuitively, if model a always predicts the same labels as model b, then they are homogeneous. To estimate how diverse two models are in cases other than the extremes (always agree, always disagree), several approaches have been defined, such as Yule's Q statistic or the Kappa statistic.
In this assignment, you are asked to calculate the Kappa statistic κ_{u,v} for the predictions of each pair of models u and v in the ensemble. Notice that you will need to keep counters for how every pair of models predicted. Adapted from [3]: This can be achieved by constructing a contingency table C_{i,j}, such that the value at the intersection of a row i and a column j stores the number of instances x ∈ X where h_v(x) = i and h_u(x) = j. Table 1 shows an example of a contingency table C_{i,j} for a k-class problem. The diagonal of the matrix C_{i,j} contains the concomitant decisions of the pair; thus a naive approach to weight their similarity is to sum its values and divide the result by the number of instances n, as shown in Equation 1.
Cohen's Kappa statistic [2] measures inter-rater agreement for categorical variables and was first used in [6] as a pairwise diversity measure for ensemble learners, since it corrects θ_1 by considering the probability that two classifiers agree by chance according to the observed values in C_{i,j}, namely θ_2 (see Equation 2). The Kappa statistic is shown in Equation 3.
θ_1 = (Σ_{i=1}^{k} C_{i,i}) / n    (1)

θ_2 = Σ_{i=1}^{k} [ (Σ_{j=1}^{k} C_{i,j} / n) · (Σ_{j=1}^{k} C_{j,i} / n) ]    (2)

κ = (θ_1 − θ_2) / (1 − θ_2)    (3)

Table 1: All possible outputs of a pair of classifiers h_v and h_u for a multiclass classification problem with k possible labels

               h_u(x)=0    h_u(x)=1    ...    h_u(x)=k-1
h_v(x)=0       C_{0,0}     C_{0,1}     ...    C_{0,k-1}
h_v(x)=1       C_{1,0}     C_{1,1}     ...    C_{1,k-1}
...            ...         ...         ...    ...
h_v(x)=k-1     C_{k-1,0}   C_{k-1,1}   ...    C_{k-1,k-1}
The interpretation of κ is that if h_u and h_v agree on every instance then κ = 1, and if their predictions coincide only by chance, then κ = 0. Negative values of κ occur when agreement is weaker than expected by chance, but this rarely occurs in real problems.
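Following Equations 1-3, the pairwise Kappa can be computed from a k x k contingency table; a minimal sketch (the function name is illustrative):

```python
# Pairwise Kappa from a k x k contingency table C (lists of lists),
# following Equations 1-3: theta_1 is the observed agreement on the
# diagonal, theta_2 the agreement expected by chance from the marginals.

def kappa(C):
    k = len(C)
    n = sum(sum(row) for row in C)
    theta1 = sum(C[i][i] for i in range(k)) / n
    theta2 = sum(
        (sum(C[i][j] for j in range(k)) / n)      # row marginal of class i
        * (sum(C[j][i] for j in range(k)) / n)    # column marginal of class i
        for i in range(k)
    )
    return (theta1 - theta2) / (1 - theta2)
```

Note the degenerate case: when both classifiers always output the same single class, θ_2 = 1 and κ is undefined (division by zero), so an implementation may want to guard against it.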
Be cautious when calculating κ_{c,·}, i.e. the Kappa statistic between the candidate c and each of the other learners during the window in which c was introduced. Specifically, take extra care when you REPLACE an existing e by c: remember to overwrite the respective counters.
Optional: Implement the same method for measuring diversity in Leveraging Bagging [1] and Adaptive Random Forest [4] to estimate their diversity as well. This will allow for a more detailed analysis of the results in the Evaluation part.
3 Evaluation and Analysis
The results should be presented in a table format, followed by a short analysis guided by the questions associated with each experiment. More details are presented in the subsections for each experiment. At the end of this section you will find more details about how to prepare, run and report your experiments using either river or MOA.
Evaluation framework. For all experiments, use a test-then-train evaluation approach and report the final result obtained, i.e. the result obtained after testing using all instances.
Metrics. For all experiments, report accuracy and time^1.
Datasets. electricity, covertype, SEA (abrupt drift at 50000), SEA (gradual drift at 50000, width 10000), and RTG (2 abrupt drifts, at 30000 and 60000). Refer to the attached materials to obtain the .arff and .csv versions.
^1 In MOA, time can be estimated by the CPU Time.
3.1 Experiment 1: TheEnsemble variations and sanity check
In the first part of the evaluation, you are asked to perform experiments using TheEnsemble and a single Hoeffding Tree. This experiment will produce 3 tables of results (algorithms as columns vs. datasets as rows), where S represents the number of learners, seed is the initializer of the random object^2, and l is the length of the window. Important: you don't need to report the time taken for this experiment as it would require yet another run.
First table (l = 1000 for all TheEnsemble variations): HoeffdingTree, TheEnsemble(S = 5), TheEnsemble(S = 10), TheEnsemble(S = 20), and TheEnsemble(S = 30);
Second table (S = 20 and l = 1000 for all): TheEnsemble(seed = 1), TheEnsemble(seed = 2), TheEnsemble(seed = 3), TheEnsemble(seed = 4), TheEnsemble(seed = 5);
Third table (S = 20 for all): TheEnsemble(l = 500), TheEnsemble(l = 1000), TheEnsemble(l = 2000), TheEnsemble(l = 5000), TheEnsemble(l = 10000).
Questions:
1. How does TheEnsemble compare against a single Hoeffding tree? Also, does TheEnsemble improve results if more base models are available? Present a table with results varying S ∈ {5, 10, 20, 30}, where S is the total number of learners.
2. (Groups of 2 Required) What is the impact of the randomization strategy on the diversity of the base learners? 1st, repeat each experiment 5 times varying the random seed of the classifier and present a table with the average and standard deviation of each result. 2nd, present a plot, for each dataset, depicting the average Kappa statistic over time to support your claim.
3. Is the ensemble able to recover from concept drifts? Present a plot depicting the accuracy over time to support your claim. There should be a plot for each dataset comparing whichever version of TheEnsemble produced the best results in the first table against the single Hoeffding tree.
4. What is the impact of the l hyperparameter? Discuss the results varying the l hyperparameter.
5. (Optional) Which combination of l and S presents the best results for each dataset on average? This will require another separate table where l and S are combined.
^2 The actual value of the random seed doesn't matter much, as long as it is not the same for every experiment.
Important: To plot results, use a windowed evaluation. In MOA, EvaluatePrequential with its default Evaluator should be sufficient. The code from the following repository can be used for the plots, but notice that any other tool or language can be used.
https://github.com/hmgomes/datastreamvisualization
3.2 Experiment 2: TheEnsemble vs. others
Compare the best result of TheEnsemble, obtained in the previous experiment, against Adaptive Random Forest (ARF) [4], Leveraging Bagging (LB) [1] and Dynamic Weighted Majority (DWM) [5] for each dataset. Use S = 20 for all the ensembles and subspace m = 60 for ARF. This will produce two tables of results:
First table: The accuracy for each ensemble and dataset.
Second table: The processing time for each ensemble and dataset.
Questions:
1. How does TheEnsemble perform in terms of predictive performance against the others?
2. In terms of time, is TheEnsemble more efficient than the other methods?
3. (Optional) Create another table varying the number of learners in the other datasets as well. Then discuss which ensemble method scales better in terms of accuracy, i.e. produces better results as we increase the number of learners for the given datasets.
4. (Optional) Which design choices affect the computational resources the most for TheEnsemble? For example, one design choice is how the algorithm updates learners, how diversity is induced, etc. Notice that the value of l or S is not a design choice; it is a hyperparameter configuration.
References
[1] A. Bifet, G. Holmes, and B. Pfahringer. Leveraging bagging for evolving data streams. In PKDD, pages 135–150, 2010.
[2] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960.
[3] H. M. Gomes, J. P. Barddal, F. Enembreck, and A. Bifet. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2):1–36, 2017.
[4] H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfharinger, G. Holmes, and T. Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, 106(9–10):1469–1495, 2017.
[5] J. Z. Kolter and M. A. Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8(Dec):2755–2790, 2007.
[6] D. D. Margineantu and T. G. Dietterich. Pruning adaptive boosting. In ICML, volume 97, pages 211–218. Citeseer, 1997.