
SGN-13000/SGN-13006 Introduction to Pattern Recognition and Machine Learning (5 cr)

Learning Sets of Rules

Joni-Kristian Kämäräinen

September 2016

Department of Signal Processing

Tampere University of Technology


Material

• Lecturer’s slides and blackboard notes

• T.M. Mitchell. Machine Learning. McGraw-Hill, 1997:
Chapter 10

• Computer examples


Contents

Introduction

Learning disjunctive sets of rules

Sequential covering algorithms

Learning First-Order Rules

Inverting resolution

FOIL


Introduction

Learning sets of rules

1. One of the most expressive and human-readable representations for learned hypotheses is a set of IF-THEN rules.

2. IF-THEN rules are analogous to propositional logic.

3. Many applications in the automatic assessment of chemical agents and in biocomputing → explaining phenomena!


Deductive decisions from a set of rules: PROLOG

Example (relations.pl)

% Literals (training set)
mother(lily,harry).
father(james,harry).
mother(mrsevans,lily).
father(mrevans,lily).
grandmother(mrsevans,harry).
grandfather(mrevans,harry).
father(septimus,arthur).
mother(molly,percy).
mother(molly,fred).
mother(molly,george).
mother(molly,ronald).
mother(molly,ginerva).
father(arthur,percy).
father(arthur,fred).
father(arthur,george).
father(arthur,ronald).
father(arthur,ginerva).

Query: mother(molly,X).
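Loading relations.pl into an interpreter (SWI-Prolog is assumed here) and running the query makes PROLOG enumerate Molly's children directly from the stored literals, one answer at a time:

?- mother(molly,X).
X = percy ;
X = fred ;
X = george ;
X = ronald ;
X = ginerva.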


Learning disjunctive sets of rules

Sequential covering algorithms

Learning disjunctive sets of rules

Method 1: Learn decision tree, convert to rules

Method 2: Sequential covering algorithm:

1. Learn one rule with high accuracy, any coverage

2. Remove positive examples covered by this rule

3. Repeat


Sequential covering algorithm

Sequential-covering(Target attribute, Attributes, Examples, Threshold)

1: Learned rules ← {}
2: Rule ← learn-one-rule(Target attribute, Attributes, Examples)
3: while performance(Rule, Examples) > Threshold do
4:    Learned rules ← Learned rules + Rule
5:    Examples ← Examples − {examples correctly classified by Rule}
6:    Rule ← learn-one-rule(Target attribute, Attributes, Examples)
7: end while
8: Learned rules ← sort Learned rules according to performance over Examples
9: return Learned rules
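To make the loop concrete, below is a minimal propositional sketch in SWI-Prolog. The example/2 facts are made-up toy data, and learn-one-rule is approximated by a greedy search that always adds the attribute test leaving the fewest covered negatives; the Threshold test and final rule sorting are omitted for brevity.

% Toy training set: example(Class, AttributeValueList).
example(yes, [outlook=sunny, humidity=normal, wind=weak]).
example(yes, [outlook=rain,  humidity=normal, wind=weak]).
example(no,  [outlook=rain,  humidity=high,   wind=strong]).
example(no,  [outlook=sunny, humidity=high,   wind=weak]).

% A rule body is a list of attribute=value conditions; it covers an
% example whose attribute list satisfies every condition.
covers(Conds, Attrs) :- forall(member(C, Conds), memberchk(C, Attrs)).

% learn_one_rule(+Pos, +Neg, -Conds): greedily add the candidate literal
% that leaves the fewest covered negatives, until none remain.
learn_one_rule(Pos, Neg, Conds) :- learn_one_rule(Pos, Neg, [], Conds).
learn_one_rule(_, [], Conds, Conds) :- !.
learn_one_rule(Pos, Neg, SoFar, Conds) :-
    findall(NLeft-C,
            ( member(Attrs, Pos), member(C, Attrs),
              \+ memberchk(C, SoFar),
              include(covers([C|SoFar]), Neg, NegLeft),
              length(NegLeft, NLeft) ),
            Scored),
    msort(Scored, [_-Best|_]),     % fewest remaining negatives first
    include(covers([Best|SoFar]), Neg, Neg1),
    learn_one_rule(Pos, Neg1, [Best|SoFar], Conds).

% Sequential covering: learn one rule, remove the positives it covers, repeat.
sequential_covering(Rules) :-
    findall(A, example(yes, A), Pos),
    findall(A, example(no,  A), Neg),
    seq_cover(Pos, Neg, Rules).

seq_cover([], _, []).
seq_cover(Pos, Neg, [Conds|Rules]) :-
    learn_one_rule(Pos, Neg, Conds),
    exclude(covers(Conds), Pos, Pos1),
    seq_cover(Pos1, Neg, Rules).

% ?- sequential_covering(Rs).
% Rs = [[humidity=normal]].      (one rule covers all positives here)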

[Figure: the general-to-specific search of learn-one-rule on the PlayTennis data. The search begins with the most general rule, IF <empty> THEN PlayTennis=yes, and specializes it by adding one precondition at a time (IF Wind=weak THEN PlayTennis=yes, IF Wind=strong THEN PlayTennis=no, IF Humidity=normal THEN PlayTennis=yes, IF Humidity=high THEN PlayTennis=no, ...); the promising candidate Humidity=normal is specialized further with Outlook=sunny, Outlook=rain, Wind=weak and Wind=strong.]

Learning First-Order Rules

Learning first-order rules

• Can learn sets of (recursive) rules such as (see the sketch after this list)

Ancestor(x, y) ← Parent(x, y)
Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)

• Learned rules are PROLOG programs

• Finding rules: Inductive Logic Programming (ILP)
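These clauses run in PROLOG as-is. A minimal check, assuming two hypothetical parent/2 facts (the names are made up):

parent(ann, pat).   % ann is a parent of pat
parent(pat, jim).   % pat is a parent of jim

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

% ?- ancestor(ann, jim).
% true: parent(ann, pat) holds, then ancestor(pat, jim) via parent(pat, jim).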


Example: First-order rules in PROLOG

Example (relations.pl (cont.))

% Literals (training set): the same facts as in relations.pl above

% Clauses (induced rules)
grandfather(X,Y) :-
    father(X,Z),
    father(Z,Y).

grandfather(X,Y) :-
    father(X,Z),
    mother(Z,Y).
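With the induced clauses loaded together with the training literals, the interpreter now deduces grandfather relations that were never stated as facts, for example Septimus's grandchildren:

?- grandfather(septimus,X).
X = percy ;
X = fred ;
X = george ;
X = ronald ;
X = ginerva.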


Example: Classifying Web pages

[Slattery, 1997]

course(A) ←
    has-word(A, instructor),
    ¬ has-word(A, good),
    link-from(A, B),
    has-word(B, assign),
    ¬ link-from(B, C)

Train: 31/31, test: 31/34 examples classified correctly
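The learned rule translates almost directly into a PROLOG clause using negation as failure. A minimal sketch; the page constants and word facts below are made up for illustration, not taken from Slattery's data:

% Hypothetical toy facts.
has_word(page1, instructor).
link_from(page1, page2).
has_word(page2, assign).

course(A) :-
    has_word(A, instructor),
    \+ has_word(A, good),      % negation as failure
    link_from(A, B),
    has_word(B, assign),
    \+ link_from(B, _).

% ?- course(page1).
% true: page1 mentions "instructor", is not described as "good", and links
% to page2, which mentions "assign" and links nowhere further.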


Learning First-Order Rules

Inverting resolution

Inverting Resolution

Resolution derives a new clause C (the resolvent) from two clauses C1 and C2 that contain complementary literals:

C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study

C:  PassExam ∨ ¬Study

Inverse resolution runs this inference backwards: given the resolvent C and one parent clause C1, it hypothesizes the missing parent C2, which is one way to induce new clauses from examples.

Learning First-Order Rules

FOIL

FOIL(Target predicate, Predicates, Examples)

1: Pos ← positive Examples
2: Neg ← negative Examples
3: Learned rules ← {}
4: while Pos is not empty do {learn a NewRule}
5:    NewRule ← most general rule possible
6:    NewRuleNeg ← Neg
7:    while NewRuleNeg is not empty do {add a new literal to specialize NewRule}
8:       Candidate literals ← generate candidates
9:       Best literal ← argmax over L ∈ Candidate literals of Foil Gain(L, NewRule)
10:      add Best literal to NewRule preconditions
11:      NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
12:   end while
13:   Learned rules ← Learned rules + NewRule
14:   Pos ← Pos − {members of Pos covered by NewRule}
15: end while
16: return Learned rules

Specializing Rules in FOIL

Learning rule: P(x1, x2, . . . , xk) ← L1 ∧ . . . ∧ Ln

Candidate specializations add a new literal of one of the forms:

• Q(v1, . . . , vr), where at least one of the vi in the created literal must already exist as a variable in the rule

• Equal(xj, xk), where xj and xk are variables already present in the rule

• The negation of either of the above forms of literals


Information Gain in FOIL

Foil Gain(L, R) ≡ t ( log2 (p1 / (p1 + n1)) − log2 (p0 / (p0 + n0)) )

Where

• L is the candidate literal to add to rule R
• p0 = number of positive bindings of R
• n0 = number of negative bindings of R
• p1 = number of positive bindings of R + L
• n1 = number of negative bindings of R + L
• t = number of positive bindings of R also covered by R + L

Note

• −log2 (p0 / (p0 + n0)) is the optimal number of bits needed to indicate the class of a positive binding covered by R
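The gain is straightforward to compute. A minimal sketch in SWI-Prolog; foil_gain/6 is an illustrative name and the numbers in the query are made up:

% Foil_Gain as defined above; log/1 is the natural logarithm, so
% dividing by log(2) gives base-2 logarithms.
foil_gain(P0, N0, P1, N1, T, Gain) :-
    Gain is T * ( log(P1 / (P1 + N1)) / log(2)
                - log(P0 / (P0 + N0)) / log(2) ).

% ?- foil_gain(10, 10, 8, 2, 8, G).
% G ≈ 5.42: the literal raises the positive ratio of bindings from 0.5 to 0.8.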

Summary

Summary

1. Rule learning can be suitable for some novel application fields of machine learning yet to be fully exploited: biocomputing, scientific expert systems, etc.

2. First-order Horn clauses (predicate logic) are a powerful knowledge representation that supports logical decision making with the PROLOG language and interpreter.

3. The clauses can be learned automatically from examples using the FOIL algorithm.

