Jean Honorio
Purdue University
(originally prepared by Tommi Jaakkola, MIT CSAIL)
CS373 Data Mining and Machine Learning
Lecture 1
Course topics
• Supervised learning
- linear and non-linear classifiers, kernels
- rating, ranking, collaborative filtering
- model selection, complexity, generalization
- Conditional Random Fields, structured prediction
• Unsupervised learning, modeling
- mixture models, topic models
- Hidden Markov Models
- Bayesian networks
- Markov Random Fields, factor graphs
Machine learning
• Learning from examples
[Figure: a training set of websites, each labeled +1 or −1, and a new website whose label (?) is to be predicted]
• The training set of labeled examples specifies the learning task only implicitly.
• Our goal is to accurately label new websites that were not part of the training set.
• classifier = mapping from websites to (predicted) labels
“Examples”
• We first have to represent the examples (websites) in a manner that can be easily mapped to labels.
[Figure: a news article (“White House officials consulted with the Justice Department in preparing a list of U.S. attorneys who would be removed.” NYT 03/13/07) is reduced to a bag of words, and then to a vector of counts]
• counts: a vector whose coordinates (features) specify how many times (or whether) particular words appeared in the article
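To make the representation concrete, here is a minimal Python sketch (not from the slides; the tokenizer and the tiny vocabulary are simplified, illustrative choices) of mapping a document to a bag-of-words count vector:

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    tokens = text.lower().split()          # a deliberately simple tokenizer
    counts = Counter(tokens)
    # one coordinate (feature) per vocabulary word, in a fixed order
    return [counts[word] for word in vocabulary]

article = ("White House officials consulted with the Justice Department "
           "in preparing a list of U.S. attorneys who would be removed.")
vocabulary = ["white", "house", "officials", "justice", "attorneys", "budget"]
print(bag_of_words(article, vocabulary))   # [1, 1, 1, 1, 1, 0]
```

Replacing each count with a 0/1 indicator gives the “whether a word appeared” variant mentioned above.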
The learning task
• The training set is now a set of labeled points.
• Our goal is to find a “good” classifier, based on the training set, so that it correctly labels any new website.
[Figure: labeled training points (+ and −) in feature space, along with a new, not-yet-labeled website (?)]
Part 1: Model selection. What type of classifiers should we consider?
Part 2: Estimation. How do we select the best classifier in the set?
Part 1: allowing all classifiers?
• We can easily construct a “silly classifier” that perfectly classifies any distinct set of training points.
• But it doesn’t “generalize” (it doesn’t classify new points very well).
[Figure: training points (+ and −) and a new, unlabeled website (?)]
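As an illustration (a minimal sketch, not from the slides), a lookup-table classifier that simply memorizes the training points: it has zero training error on any distinct set of points, but on a new point it can only guess.

```python
def memorizing_classifier(training_points):
    """The 'silly classifier': memorize every training point and its label."""
    table = {x: y for x, y in training_points}

    def classify(x):
        if x in table:           # perfect on points seen during training ...
            return table[x]
        return +1                # ... but an arbitrary guess on anything new

    return classify

train = [((1.0, 2.0), +1), ((0.5, -1.0), -1), ((2.0, 0.3), +1)]
f = memorizing_classifier(train)
print(all(f(x) == y for x, y in train))   # True: zero training error
print(f((3.0, 3.0)))                      # +1, regardless of the true label
```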
Part 1: allowing few classifiers?
• We could instead consider very few alternatives, such as the two candidate classifiers in the figure.
• But neither one classifies even the training points very well.
[Figure: training points (+ and −), a new website (?), and two fixed candidate decision boundaries]
Part 1: linear classifiers
• A linear classifier (through the origin) with parameters θ divides the space into positive and negative halves: it predicts sign(θ · x) for an input vector x.
• decision boundary: the set of points with θ · x = 0, a hyperplane through the origin orthogonal to θ
Recall: cos(angle) = dot product of two unit vectors; cos(90°) = 0
[Figure: the parameter vector θ, the decision boundary orthogonal to it, and the positive (+) and negative (−) halves of the space]
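A minimal Python sketch of such a classifier; the parameter vector theta below is an arbitrary illustrative choice, and ties (θ · x = 0) are broken toward +1.

```python
def predict(theta, x):
    """Linear classifier through the origin: the sign of the dot product theta · x."""
    score = sum(t * xi for t, xi in zip(theta, x))
    return +1 if score >= 0 else -1

theta = (2.0, -1.0)                  # parameters (illustrative)
print(predict(theta, (1.0, 1.0)))    # theta·x =  1.0 -> +1 (positive half)
print(predict(theta, (0.0, 3.0)))    # theta·x = -3.0 -> -1 (negative half)
```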
Part 2: estimation
• We can use the training error as a surrogate criterion for finding the best linear classifier through the origin
• Other choices are possible (and often preferable)
• The perceptron algorithm considers each training point in turn, adjusting the parameters to correct any mistakes
• The algorithm will converge (no more mistakes) if the training points are linearly separable; otherwise it will not converge
Perceptron algorithm
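A minimal Python sketch of the mistake-driven perceptron updates for a classifier through the origin, assuming labels y_t in {−1, +1}; the function name and the epoch cap are illustrative choices.

```python
def perceptron(points, labels, max_epochs=100):
    """Perceptron for a linear classifier through the origin.

    points: list of feature vectors x_t; labels: list of y_t in {-1, +1}.
    """
    theta = [0.0] * len(points[0])                 # start from the zero vector
    for _ in range(max_epochs):                    # repeated sweeps over the training set
        mistakes = 0
        for x, y in zip(points, labels):
            if y * sum(t * xi for t, xi in zip(theta, x)) <= 0:   # mistake on (x, y)
                theta = [t + y * xi for t, xi in zip(theta, x)]   # theta <- theta + y * x
                mistakes += 1
        if mistakes == 0:                          # a full mistake-free pass: converged
            return theta
    return theta                                   # not separable (or cap reached)
```

If the training points are linearly separable through the origin, some pass is eventually mistake-free and the loop stops; otherwise the updates keep cycling until the epoch cap is reached.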
Perceptron algorithm: motivation
• If we make a mistake on the t-th training point, then y_t (θ · x_t) ≤ 0, and we update θ ← θ + y_t x_t.
• After the update, we have y_t ((θ + y_t x_t) · x_t) = y_t (θ · x_t) + y_t² ‖x_t‖² = y_t (θ · x_t) + ‖x_t‖², since y_t² = 1.
• So y_t (θ · x_t), the agreement between the classifier and the correct label, increases based on the update.
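A quick numeric check of this with made-up numbers:

```python
x, y = (1.0, 2.0), -1                            # a training point the classifier gets wrong
theta = (0.5, 1.0)
before = y * (theta[0]*x[0] + theta[1]*x[1])     # -2.5 <= 0: a mistake
theta = (theta[0] + y*x[0], theta[1] + y*x[1])   # perceptron update: theta <- theta + y*x
after = y * (theta[0]*x[0] + theta[1]*x[1])      #  2.5
print(before, after, after - before)             # the gain is 5.0 = ||x||^2
```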
Perceptron algorithm
• Iterative updates based on mistakes
[Figure: the decision boundary after successive mistake-driven updates on the training points (+ and −)]
Perceptron algorithm (take 2)
• Iterative updates based on mistakes
[Figure: another sequence of mistake-driven updates on the training points (+ and −)]
Perceptron algorithm (take 3)
• Iterative updates based on mistakes
[Figure: a third sequence of mistake-driven updates on the training points (+ and −)]
“Margin”
• We can get a handle on convergence by assuming that there exists a target classifier with good properties.
• One such property is the margin, i.e., how close the separating boundary is to the points.
Recall: cos(angle) = dot product of two unit vectors; dotted distance = radius * cos(angle)
[Figure: training points (+ and −), the target decision boundary, and the (dotted) distance from the boundary to the nearest point]
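A minimal Python sketch, with made-up data, of the margin of a target classifier through the origin: the smallest distance from any training point to the decision boundary, i.e., the minimum over t of y_t (θ* · x_t) / ‖θ*‖.

```python
import math

def margin(theta_star, points, labels):
    """Smallest signed distance from a training point to the boundary theta_star · x = 0."""
    norm = math.sqrt(sum(t * t for t in theta_star))
    return min(y * sum(t * xi for t, xi in zip(theta_star, x)) / norm
               for x, y in zip(points, labels))

points = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.5), (-0.5, -2.0)]   # illustrative data
labels = [+1, +1, -1, -1]
print(margin((1.0, 1.0), points, labels))   # positive iff (1, 1) separates the data
```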
Perceptron convergence theorem
• If there exists θ* with ‖θ*‖ = 1 such that y_t (θ* · x_t) ≥ γ for every training point, and ‖x_t‖ ≤ R for every training point, then the perceptron algorithm makes at most (R/γ)² mistakes (on the training set).
• Note that the result does NOT depend on the dimension of the feature vectors nor on the number of training points.
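A small self-contained check of the bound on made-up separable data: count the mistakes the perceptron makes and compare them to (R/γ)², where R bounds ‖x_t‖ and γ is the margin of a unit-norm θ*.

```python
import math

points = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.5), (-0.5, -2.0)]   # illustrative separable data
labels = [+1, +1, -1, -1]

# margin gamma of the unit-norm target theta* = (1, 1)/sqrt(2), and radius R
theta_star = (1 / math.sqrt(2), 1 / math.sqrt(2))
gamma = min(y * (theta_star[0]*x[0] + theta_star[1]*x[1]) for x, y in zip(points, labels))
R = max(math.sqrt(x[0]**2 + x[1]**2) for x in points)

# run the perceptron until a full pass makes no mistakes, counting every mistake
theta, mistakes = [0.0, 0.0], 0
while True:
    errors = 0
    for x, y in zip(points, labels):
        if y * (theta[0]*x[0] + theta[1]*x[1]) <= 0:
            theta = [theta[0] + y*x[0], theta[1] + y*x[1]]
            mistakes += 1
            errors += 1
    if errors == 0:
        break

print(mistakes, "<=", (R / gamma) ** 2)   # e.g. 1 <= 2.125 for this data
```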