SGN-13000/SGN-13006 Introduction to
Pattern Recognition and Machine Learning
(5 cr)
Learning Sets of Rules
Joni-Kristian Kämäräinen
September 2016
Department of Signal Processing
Tampere University of Technology
Material
• Lecturer’s slides and blackboard notes
• T.M. Mitchell. Machine Learning. McGraw-Hill, 1997:
Chapter 10
• Computer examples
Contents
Introduction
Learning disjunctive sets of rules
Sequential covering algorithms
Learning First-Order Rules
Inverting resolution
FOIL
Introduction
Learning sets of rules
1. One of the most expressive and human-readable representations for a learned hypothesis is a set of IF-THEN rules.
2. IF-THEN rules are analogous to propositional logic.
3. Many applications in the automatic assessment of chemical agents and in biocomputing → explaining phenomena!
Deductive decisions from a set of rules: PROLOG
Example (relations.pl)
% Literals (training set)
mother(lily,harry).
father(james,harry).
mother(mrsevans,lily).
father(mrevans,lily).
grandmother(mrsevans,harry).
grandfather(mrevans,harry).
father(septimus,arthur).
mother(molly,percy).
mother(molly,fred).
mother(molly,george).
mother(molly,ronald).
mother(molly,ginerva).
father(arthur,percy).
father(arthur,fred).
father(arthur,george).
father(arthur,ronald).
father(arthur,ginerva).
Query: mother(molly,X).
Learning disjunctive sets of rules
Learning disjunctive sets of rules
Sequential covering algorithms
Learning disjunctive sets of rules
Method 1: Learn decision tree, convert to rules
Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat
Sequential covering algorithm
Sequential-covering(Target_attribute, Attributes, Examples, Threshold)
1: Learned_rules ← {}
2: Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
3: while performance(Rule, Examples) > Threshold do
4:   Learned_rules ← Learned_rules + Rule
5:   Examples ← Examples − {examples correctly classified by Rule}
6:   Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
7: end while
8: Learned_rules ← sort Learned_rules according to performance over Examples
9: return Learned_rules
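The pseudocode above can be sketched in Python. This is a minimal illustration, not the slides' implementation: the rule representation (a dict of attribute tests plus a predicted label) and the helpers covers, accuracy and learn_one_rule are assumptions made here; any greedy rule learner and performance measure could be plugged in.

```python
# Minimal sketch of Sequential-Covering. A rule is (preconditions, label);
# covers/accuracy/learn_one_rule are illustrative helpers, not from the slides.

def covers(rule, example):
    pre, _ = rule
    return all(example.get(a) == v for a, v in pre.items())

def accuracy(rule, target, examples):
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0
    return sum(e[target] == rule[1] for e in covered) / len(covered)

def learn_one_rule(target, attributes, examples):
    # Greedy stand-in: pick the single (attribute=value, label) rule
    # with the best accuracy on the remaining examples.
    best = ({}, examples[0][target])
    for a in attributes:
        for v in sorted({e[a] for e in examples}):
            for label in sorted({e[target] for e in examples}):
                cand = ({a: v}, label)
                if accuracy(cand, target, examples) > accuracy(best, target, examples):
                    best = cand
    return best

def sequential_covering(target, attributes, examples, threshold=0.9):
    learned, pool = [], list(examples)
    while pool:
        rule = learn_one_rule(target, attributes, pool)
        if accuracy(rule, target, pool) <= threshold:
            break
        learned.append(rule)
        pool = [e for e in pool if not covers(rule, e)]  # drop covered examples
    return learned
```

On a toy PlayTennis table this learns one rule per iteration, removing the covered examples each time, until every example is covered or no rule exceeds the threshold.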
[Figure: general-to-specific search of learn-one-rule for PlayTennis]
IF THEN PlayTennis=yes
  IF Wind=weak THEN PlayTennis=yes
  IF Wind=strong THEN PlayTennis=no
  IF Humidity=normal THEN PlayTennis=yes
  IF Humidity=high THEN PlayTennis=no
  ...
    IF Humidity=normal AND Wind=weak THEN PlayTennis=yes
    IF Humidity=normal AND Wind=strong THEN PlayTennis=yes
    IF Humidity=normal AND Outlook=sunny THEN PlayTennis=yes
    IF Humidity=normal AND Outlook=rain THEN PlayTennis=yes
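The search in the figure can be sketched as greedy specialization: start from the most general rule (empty preconditions) and repeatedly conjoin the attribute test that most improves accuracy. The data layout and the helper names below are illustrative assumptions, not the slides' code.

```python
# Greedy general-to-specific specialization, as in the figure above.
# Representation (dict of attribute tests) and helpers are illustrative.

def rule_accuracy(pre, label, target, examples):
    covered = [e for e in examples
               if all(e.get(a) == v for a, v in pre.items())]
    if not covered:
        return 0.0
    return sum(e[target] == label for e in covered) / len(covered)

def specialize(target, attributes, examples, label="yes", max_literals=2):
    pre = {}  # most general rule: IF <nothing> THEN target=label
    for _ in range(max_literals):
        # candidate tests: (attribute, value) pairs seen in the data
        tests = [(a, e[a]) for e in examples for a in attributes if a not in pre]
        best = max(tests,
                   key=lambda t: rule_accuracy({**pre, t[0]: t[1]},
                                               label, target, examples))
        if rule_accuracy({**pre, best[0]: best[1]}, label, target, examples) \
                <= rule_accuracy(pre, label, target, examples):
            break  # no single test improves the rule
        pre[best[0]] = best[1]
    return pre
```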
Learning First-Order Rules
Learning first-order rules
• Can learn sets of (recursive) rules such as
Ancestor(x , y)← Parent(x , y)
Ancestor(x , y)← Parent(x , z) ∧ Ancestor(z , y)
• Learned rules are PROLOG programs
• Finding rules: Inductive Logic Programming (ILP)
Example: First-order rules in PROLOG
Example (relations.pl (cont.))
% Literals (training set): as listed above
% Clauses (induced rules)
grandfather(X,Y) :-
    father(X,Z),
    father(Z,Y).
grandfather(X,Y) :-
    father(X,Z),
    mother(Z,Y).
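The induced clauses can be emulated in plain Python by joining the father and mother relations. The set-based encoding below is an illustrative rendering of the Prolog above, not part of the slides.

```python
# Emulate grandfather(X,Y) :- father(X,Z), (father(Z,Y) ; mother(Z,Y))
# by joining relations encoded as sets of (parent, child) pairs.

father = {("james", "harry"), ("mrevans", "lily"), ("septimus", "arthur"),
          ("arthur", "percy"), ("arthur", "fred"), ("arthur", "george"),
          ("arthur", "ronald"), ("arthur", "ginerva")}
mother = {("lily", "harry"), ("mrsevans", "lily"), ("molly", "percy"),
          ("molly", "fred"), ("molly", "george"), ("molly", "ronald"),
          ("molly", "ginerva")}

# Both induced clauses at once: Z may be either a father or a mother of Y.
grandfather = {(x, y)
               for (x, z) in father
               for (p, y) in father | mother
               if p == z}
```

Querying the derived relation reproduces the earlier deductions, e.g. ("mrevans", "harry") is derived via lily, and septimus is the grandfather of all of Arthur's children.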
Example: Classifying Web pages
[Slattery, 1997]
course(A) ←
has-word(A, instructor),
Not has-word(A, good),
link-from(A, B),
has-word(B, assign),
Not link-from(B, C)
Train: 31/31, Test: 31/34
Learning First-Order Rules
Inverting resolution
Inverting Resolution
[Figure: a resolution step]
C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C : PassExam ∨ ¬Study
Resolution derives C from C1 and C2 by resolving on KnowMaterial; inverse resolution runs the step backwards, inducing, e.g., C2 from C and C1.
Learning First-Order Rules
FOIL
FOIL(Target_predicate, Predicates, Examples)
1: Pos ← positive Examples
2: Neg ← negative Examples
3: Learned_rules ← {}
4: while Pos do {Learn a NewRule}
5:   NewRule ← most general rule possible
6:   NewRuleNeg ← Neg
7:   while NewRuleNeg do {Add a new literal to specialize NewRule}
8:     Candidate_literals ← generate candidates
9:     Best_literal ← argmax_{L ∈ Candidate_literals} Foil_Gain(L, NewRule)
10:    add Best_literal to NewRule preconditions
11:    NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
12:  end while
13:  Learned_rules ← Learned_rules + NewRule
14:  Pos ← Pos − {members of Pos covered by NewRule}
15: end while
16: return Learned_rules
Specializing Rules in FOIL
Learning rule: P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln
Candidate specializations add a new literal of the form:
• Q(v1, ..., vr), where at least one of the vi in the created literal must already exist as a variable in the rule
• Equal(xj, xk), where xj and xk are variables already present in the rule
• The negation of either of the above forms of literals
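Candidate generation can be sketched as follows; the predicate-arity table, the number of fresh variables, and the tuple encoding of literals are all assumptions made for illustration.

```python
# Enumerate FOIL candidate specializations for a rule: literals Q(v1,...,vr)
# that reuse at least one existing rule variable, Equal(x,y) over existing
# variables, and the negations of both forms.
from itertools import combinations, product

def candidate_literals(predicates, rule_vars, n_new=2):
    """predicates: {name: arity}; rule_vars: variables already in the rule."""
    pool = list(rule_vars) + ["v%d" % i for i in range(n_new)]  # old + fresh vars
    cands = []
    for name, arity in sorted(predicates.items()):
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):  # must reuse an existing variable
                cands.append((name, args))
                cands.append(("not", name, args))
    for x, y in combinations(rule_vars, 2):
        cands.append(("Equal", (x, y)))
        cands.append(("not", "Equal", (x, y)))
    return cands
```

With one binary predicate father and rule variables X, Y, this yields the 12 reusing argument tuples in plain and negated form plus Equal(X, Y) and its negation.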
Information Gain in FOIL
Foil_Gain(L, R) ≡ t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
Where
• L is the candidate literal to add to rule R
• p0 = number of positive bindings of R
• n0 = number of negative bindings of R
• p1 = number of positive bindings of R + L
• n1 = number of negative bindings of R + L
• t = number of positive bindings of R also covered by R + L
Note
• −log2( p0 / (p0 + n0) ) is the optimal number of bits to indicate the class of a positive binding covered by R
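The gain is a direct function of the binding counts; the numbers in the check below are made up for illustration.

```python
# Foil_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """p0,n0: pos/neg bindings of R; p1,n1: of R+L; t: pos bindings of R kept."""
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
```

For example, a literal that removes all 4 negative bindings while keeping all 4 positive ones gains t = 4 times one bit, i.e. foil_gain(4, 4, 4, 0, 4) = 4.0, whereas a literal that changes nothing gains 0.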
Summary
Summary
1. Rule learning is well suited to novel application fields of machine learning that are yet to be fully exploited: biocomputing, scientific expert systems, etc.
2. First-order Horn clauses (predicate logic) are a powerful knowledge representation that can be used for logical decision making with the PROLOG language and interpreter.
3. The clauses can be learnt automatically from examples using the FOIL algorithm.