
Lecture 22: Decision trees
CS 189 (CDSS offering)
2022/03/16

Today’s lecture


• Today, we will lay the foundations behind probably the second most intuitive model after nearest neighbor: decision trees
• A decision tree describes how to make a decision/prediction based on the input
• Decision tree learning concerns how to build decision trees from data
• Decision trees are widely used in real world applications, and we’ll discuss why
• One reason is that they can be ensembled to give very good performance — this is the topic of Friday’s lecture

What’s a decision tree?

Decision trees
• We are given a dataset (x1, y1), …, (xN, yN)
• A decision tree for this problem will repeatedly split on select features of x (at
each internal node of the tree) in order to predict y (at the leaf nodes of the tree)
• Each node contains the subset of the data that matches the splits made so far
• Therefore, ideally, the leaf nodes contain very little (or no) variability in y
• The prediction for a new point x′ proceeds down the tree to a leaf node, where
the prediction is a majority vote (for classification) or an average (for regression); a small sketch of this traversal follows below
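To make the traversal concrete, here is a minimal sketch, not taken from the lecture, of a node structure and a prediction routine in Python; the Node fields and function names are illustrative assumptions, not a prescribed interface.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature to split on (None at a leaf)
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None      # prediction stored at a leaf

def predict(node: Node, x) -> float:
    # Descend from the root until we reach a leaf, then return its stored
    # prediction (the majority class or the mean target of its training points).
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Tiny usage example: a depth-1 tree that splits on feature 0 at 2.5.
tree = Node(feature=0, threshold=2.5, left=Node(value=0), right=Node(value=1))
print(predict(tree, [3.0]))  # prints 1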

Learning decision trees
• Learning optimal decision trees, for reasonable definitions of optimality, is known to be NP-complete (very, very hard)
• Algorithms for learning trees therefore only try to find “good” trees
• By far, the most common approaches for learning decision trees are greedy
algorithms that operate in a top down manner
• Top down means we start with a tree that is a single node with all the data, and we
continually “build out” the tree to get better and better leaf nodes
• Greedy means that, every time we split a node, we only consider how good the immediate split will be, rather than looking further ahead at potential future splits; a toy sketch of this procedure follows below
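To illustrate, here is a toy Python sketch of greedy, top-down tree building, assuming a numpy array X of shape (N, d) and integer class labels y. The helper names (node_impurity, best_split, grow) and the simple misclassification-rate impurity are illustrative stand-ins; the metrics actually discussed in the lecture appear on the next slides.

import numpy as np

def node_impurity(y):
    # Fraction of points that disagree with the node's majority label;
    # any of the metrics on the following slides could be used instead.
    counts = np.bincount(y)
    return 1.0 - counts.max() / counts.sum()

def best_split(X, y):
    # Greedy step: scan every feature and every observed threshold, keeping
    # the split with the lowest size-weighted child impurity.
    best_j, best_t, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # drop the max so neither child is empty
            left = X[:, j] <= t
            score = (left.sum() * node_impurity(y[left])
                     + (~left).sum() * node_impurity(y[~left])) / len(y)
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

def grow(X, y, depth=0, max_depth=3):
    # Top-down: start from a single node holding all the data and keep
    # splitting; stop when the node is pure or a depth limit is reached.
    if depth == max_depth or len(np.unique(y)) == 1:
        return {"leaf": int(np.bincount(y).argmax())}  # majority vote
    j, t = best_split(X, y)
    if j is None:  # no useful split exists (e.g., all features constant)
        return {"leaf": int(np.bincount(y).argmax())}
    left = X[:, j] <= t
    return {"feature": j, "threshold": float(t),
            "left": grow(X[left], y[left], depth + 1, max_depth),
            "right": grow(X[~left], y[~left], depth + 1, max_depth)}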

Splitting metrics
• When we are learning the tree, at every node, we consider all features and all possible splitting values, and we try to figure out which will make the best split
• If this sounds expensive, that’s because it certainly can be — we’ll see next time that considering all possible splits for all features may not be necessary
• How do we define whether one split is better than another?
• Various metrics have been proposed to try and characterize the “goodness” of a
split, and they are pretty much all based on concepts from information theory

One metric: information gain
• Recall entropy: if a node's points have class proportions p_1, …, p_K, its entropy is H = − Σ_k p_k log p_k
• Entropy is 0 for a pure node (all points share one class) and largest when the classes are evenly mixed
• The information gain of a split is the parent's entropy minus the size-weighted average of the children's entropies: IG = H(parent) − Σ_c (N_c / N) H(child c)
• We prefer the split with the largest information gain
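As a concrete sketch of these formulas (assuming numpy, integer class labels, and base-2 logarithms; the function names are illustrative):

import numpy as np

def entropy(y):
    # H = -sum_k p_k log2 p_k over the class proportions p_k in this node.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(y_parent, y_left, y_right):
    # Parent entropy minus the size-weighted average of the child entropies.
    n = len(y_parent)
    weighted = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y_parent) - weighted

A candidate split is then scored as information_gain(y, y[left_mask], y[~left_mask]), and the split with the largest gain is chosen.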

Another metric: Gini impurity
• Idea: if we randomly sample two points from a node, how often do they disagree (have different labels)?
• For binary classification with class proportion p, this is p(1 − p) + (1 − p)p
• In general, for K classes: G = Σ_k p_k (1 − p_k) = 1 − Σ_k p_k²
• We can use G in place of H on the previous slide
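A matching sketch of Gini impurity, under the same assumptions as the entropy example above (numpy, integer labels; the name gini_impurity is illustrative):

import numpy as np

def gini_impurity(y):
    # G = sum_k p_k (1 - p_k) = 1 - sum_k p_k^2: the probability that two
    # independently drawn points from this node have different labels.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

It can be dropped into the split search wherever entropy was used, scoring a split by the decrease in size-weighted impurity.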

Metrics for continuous targets
• What if each y_i ∈ ℝ, i.e., a regression problem?
• The prediction at a node containing L points is the mean ȳ = (1/L) Σ_i y_i
• Score a node by its squared error, Σ_i (y_i − ȳ)²; with absolute error, Σ_i |y_i − ȳ|, we would instead set the prediction to the median
• This error is used in the same way as H and G
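A sketch of these regression criteria, assuming a numpy array of real-valued targets for one node (function names are illustrative):

import numpy as np

def squared_error(y):
    # Sum of squared deviations from the node's prediction y_bar = mean(y).
    return float(np.sum((y - y.mean()) ** 2))

def absolute_error(y):
    # With absolute error, the best constant prediction is the node median.
    return float(np.sum(np.abs(y - np.median(y))))

Either quantity replaces H or G in the split search: pick the split that most reduces the children's total error.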

Advantages of decision trees
• Decision trees are often said to be simple to understand and interpret
• They’re probably at least more likely to be understandable and interpretable, compared to other models we have studied and will study
• Decision trees are a natural fit for categorical data and often require little to no data preprocessing
• Decision trees can give the practitioner an intuition for which features are the most important (split first), which features are irrelevant, etc.

Disadvantages of decision trees
• Decision trees can overfit, e.g., imagine a tree where each leaf node contains only one data point
• To tackle this, we need regularization! E.g.: restrict the maximum depth and/or the minimum number of points in any leaf; stop splitting when the information gain is below some minimum threshold; pruning (more on this next time); a code example of these knobs follows after this list
• Decision trees by themselves don’t actually work very well…
• But they can work very well when ensembled!
• However, this takes away some of the advantages such as being simple to understand and interpret, so it’s all a trade off
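For reference, here is how the regularization knobs above look in scikit-learn's decision tree implementation; this is a hedged illustration (the lecture does not prescribe a particular library), and the specific hyperparameter values are arbitrary.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    criterion="gini",            # or "entropy" for information gain
    max_depth=4,                 # cap the depth of the tree
    min_samples_leaf=5,          # every leaf must keep at least 5 points
    min_impurity_decrease=0.01,  # don't split unless impurity drops enough
)
clf.fit(X, y)
print(clf.predict(X[:3]))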
