Data Mining and Machine Learning
Decision Trees
Peter Jančovič
Outline of lecture
Introduction to Decision Trees (DTs)
– A third approach to partitioning data
– Basic principles
– Types of question
– Decision tree issues
Automatic construction of DTs from data
Example from Speech Recognition: Phone Decision Trees
A simple binary decision tree
[Figure: a hand-crafted binary decision tree for 2-D data points (x, y). Internal nodes ask yes/no questions such as “x > -1.5?”, “y > -3?”, “x > 2.5?” and “y > x + 1.5?”; following the Y/N branches partitions the plane into five numbered regions.]
Binary decision trees
A binary decision tree consists of:
– A binary tree (non-terminal nodes have 2 ‘children’)
– For each node s:
  – A “yes/no” question qs which can be applied to any data sample
  – A rule that determines which link is traversed if the answer to qs is “YES” (then the other link is traversed if the answer is “NO”)
Questions need not be numerical
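To make this structure concrete, here is a minimal Python sketch (not part of the original slides). Each internal node stores its question as an arbitrary predicate on a data sample, so numerical and non-numerical questions are handled identically; the class names, thresholds and region labels are illustrative, loosely mirroring the hand-crafted tree in the earlier figure.

```python
# Minimal binary decision tree: each internal node holds a yes/no question
# and two children; each leaf holds a label (here, a region number).

class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, question, yes, no):
        self.question = question   # predicate: sample -> True / False
        self.yes = yes             # child followed when the answer is "YES"
        self.no = no               # child followed when the answer is "NO"

def classify(tree, sample):
    """Traverse the tree until a leaf is reached and return its label."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(sample) else tree.no
    return tree.label

# Illustrative hand-crafted tree over 2-D points (x, y).
tree = Node(lambda p: p[0] > -1.5,                         # x > -1.5 ?
            yes=Node(lambda p: p[1] > -3,                  # y > -3 ?
                     yes=Node(lambda p: p[1] > p[0] + 1.5, # y > x + 1.5 ?
                              yes=Leaf("region 2"),
                              no=Leaf("region 3")),
                     no=Leaf("region 4")),
            no=Leaf("region 1"))

print(classify(tree, (0.0, 0.0)))   # -> "region 3"
```

Because a question is just a predicate, the same structure supports the phonetic set-membership questions used later in the lecture.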
Binary decision trees
The decision tree above is a simple ‘hand-crafted’ example, but what if the data were in a much higher-dimensional vector space?
How do we decide the best question to associate with each node of the tree?
How do we know when the tree is complete?
What are the alternatives to numerical questions?
Example from speech recognition
In speech recognition a word is modelled as a sequence of phones: e.g. “six” = /s/ /I/ /k/ /s/
Each phone is modelled as a statistical Hidden Markov Model (HMM). Each model has many parameters which must be estimated from data.
In English there are approximately 45 phones
Fundamental problem is “co-articulation” – the acoustic signal corresponding to a particular phone depends on preceding and following phones
Example (continued)
Positions of articulators when a phone is produced depend on positions before and after it is produced
Context-sensitive phone modelling
Triphones – a separate phone model for each pair of preceding and following phones (so the /I/ in s I k s would be a model of /I/ preceded by /s/ and followed by /k/)
Potentially huge model set (45³ = 91,125 models)
Phone decision trees combine contexts that result in similar coarticulation
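To put the 45³ figure in concrete terms, here is a tiny sketch (using placeholder phone symbols, not the real English phone set) that simply enumerates every possible left/right context:

```python
from itertools import product

# Placeholder inventory standing in for the ~45 English phones.
phones = [f"ph{i:02d}" for i in range(45)]

# A triphone is a (left context, phone, right context) triple, e.g. the /I/
# in "six" = /s/ /I/ /k/ /s/ corresponds to the triple ("s", "I", "k").
triphones = list(product(phones, repeat=3))

print(len(triphones))   # 45**3 = 91,125 potential context-dependent models
```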
Phone decision trees
Collect very large set of acoustic patterns, each corresponding to a phone in context (triphones)
– “Acoustic pattern” means a sequence of feature vectors that capture the salient short-term properties of the speech signal
– Vectors are high-dimensional – typically ~40D
This is a “big data” problem. Identifying contexts that induce similar coarticulation is a clustering problem.
Decision tree “questions”
Each node associated with a “question”
Knowledge from phonetics and linguistics defines a set of potential decision-tree node “questions”
For example:
– Q: “Is the left context one of {/p/, /b/, /t/, /d/}?”
– Q: “Is the right context {/m/}?”
Call these questions q1,…, qN
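One straightforward way to represent such questions in code is as set-membership tests on the left or right context. The sketch below uses the two example questions from this slide; the triple representation of a phone-in-context and the helper name `make_question` are assumptions made for illustration.

```python
# A phone-in-context is represented as a (left, phone, right) triple.

def make_question(side, phone_set, text):
    """Build a yes/no question that tests the left or right context."""
    index = 0 if side == "left" else 2
    def question(triphone):
        return triphone[index] in phone_set
    question.text = text
    return question

# The two example questions from the slide:
questions = [
    make_question("left",  {"p", "b", "t", "d"},
                  "Is the left context one of {/p/, /b/, /t/, /d/}?"),
    make_question("right", {"m"},
                  "Is the right context {/m/}?"),
]

sample = ("s", "I", "k")            # the /I/ of "six"
for q in questions:
    print(q.text, "->", q(sample))  # both answers are "no" for this sample
```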
Phone decision tree construction
Associate complete set of phones-in-context, S0, with root node
Each question qn partitions S0 into two subsets:
– The set S0,n,Y of samples for which qn is true, and
– The set S0,n,N of samples for which qn is false
Intuitively, if qn is a “good question” then:
– Samples in S0,n,Y will be similar to each other, and
– Samples in S0,n,N will be similar to each other
Let P0,n = P0,n,Y + P0,n,N, where P0,n,A measures the similarity between the samples in S0,n,A (A = Y or N)
Phone decision tree construction
P0,n,Y could be a measure of how well S0,n,Y is modelled by a single Gaussian PDF
Calculate P0,n for every question qn
Let q0 be the question qn for which P0,n is maximised
[Diagram: S0 at the root is split by the chosen question q0 into S0,N (answer N) and S0,Y (answer Y).]
Repeat whole process for S0,N and S0,Y, and so on
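A minimal sketch of this selection step, assuming (as the slide suggests) that P is the log-likelihood of a set of feature vectors under a single Gaussian fitted to that set. The diagonal covariance, random stand-in data and function names below are illustrative assumptions, not the lecture’s exact recipe.

```python
import numpy as np

def gaussian_loglik(X):
    """Log-likelihood of the rows of X under a single diagonal-covariance
    Gaussian fitted to X by maximum likelihood (stand-in for P)."""
    mean = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6                  # floor to avoid zero variance
    ll = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var)
    return ll.sum()

def score_split(X, answers):
    """P0,n = P0,n,Y + P0,n,N for one candidate question; `answers` is a
    boolean array, True where the question answered YES."""
    if answers.all() or not answers.any():
        return -np.inf                          # the split must be non-trivial
    return gaussian_loglik(X[answers]) + gaussian_loglik(X[~answers])

def best_question(X, question_answers):
    """Index and score of the question whose split maximises P0,n."""
    scores = [score_split(X, ans) for ans in question_answers]
    best = int(np.argmax(scores))
    return best, scores[best]

# Illustrative use: 200 samples of 40-D "acoustic" vectors and three random
# candidate questions, each represented by its vector of yes/no answers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
question_answers = [rng.random(200) < 0.5 for _ in range(3)]
idx, score = best_question(X, question_answers)
print(f"best question: q{idx}, P = {score:.1f}")
```

Because P adds the log-likelihoods of both branches, a question only scores well if each of its two subsets is internally coherent, matching the intuition above.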
Phone decision tree construction
[Diagram: the tree grows recursively. S0 is split by q0 into S2 = S0,N and S1 = S0,Y; S1 is split by q1 into S3 = S1,N and S4 = S1,Y; S3 is split again into S5 = S3,N and S6 = S3,Y.]
Stop if too few samples in Sn, or benefit of splitting Sn is too small
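A sketch of the recursive construction with these two stopping rules. The threshold names (`min_samples`, `min_gain`), the generic `score` argument and the 1-D demo data are illustrative stand-ins for the real acoustic sets and the Gaussian-based measure P.

```python
import numpy as np

def grow(samples, questions, score, min_samples=50, min_gain=10.0):
    """Recursively split `samples` (a NumPy array). `questions` are predicates
    on a single sample; `score(subset)` plays the role of P (higher = better).
    Returns the tree as nested dicts."""
    parent = score(samples)
    best = None
    if len(samples) >= min_samples:            # rule 1: enough samples to split
        for q in questions:
            answers = np.array([q(s) for s in samples])
            if answers.all() or not answers.any():
                continue                       # ignore trivial splits
            gain = score(samples[answers]) + score(samples[~answers]) - parent
            if best is None or gain > best[0]:
                best = (gain, q, answers)
    if best is None or best[0] < min_gain:     # rule 2: benefit too small
        return {"leaf": True, "n": len(samples)}
    _, q, answers = best
    return {"leaf": False, "n": len(samples), "question": q,
            "yes": grow(samples[answers], questions, score, min_samples, min_gain),
            "no":  grow(samples[~answers], questions, score, min_samples, min_gain)}

# Tiny 1-D demonstration: two well-separated clusters, threshold questions,
# and negative within-set sum of squares as the similarity score.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-4, 1, 200), rng.normal(4, 1, 200)])
questions = [lambda s, t=t: s > t for t in (-2.0, 0.0, 2.0)]
score = lambda xs: -((xs - xs.mean()) ** 2).sum()
tree = grow(samples, questions, score)
print(tree["yes"]["n"], tree["no"]["n"])       # sizes of the two child sets
```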
Phone decision trees
Separate HMM built for each terminal (leaf) node
Given a new phone-in-context, traverse the tree according to the answers to the questions until a terminal node is reached – the corresponding model is the one to use for this phone-in-context (see the sketch below)
Phone decision trees illustrate two important points:
– Example of DT with non-numerical questions
– Illustration of how DT can be constructed automatically from data
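As a minimal illustration of this lookup (the tree layout, questions and model names below are invented for the example, not taken from a real system):

```python
# Internal node: (question, yes_subtree, no_subtree); leaf: an HMM identifier.
def lookup_model(tree, triphone):
    """Walk the phone decision tree and return the HMM id at the leaf."""
    while isinstance(tree, tuple):
        question, yes_subtree, no_subtree = tree
        tree = yes_subtree if question(triphone) else no_subtree
    return tree

# Two-level illustrative tree reusing the style of question seen earlier.
tree = (lambda t: t[0] in {"p", "b", "t", "d"},       # left context a stop?
        "hmm_A",                                      # YES: share model A
        (lambda t: t[2] == "m", "hmm_B", "hmm_C"))    # NO: test right context

print(lookup_model(tree, ("s", "I", "k")))            # -> hmm_C
```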
Summary
Introduction to Decision Trees
Decision Tree issues
Example: Phone Decision Trees