
COMP90042 – Natural Language Processing Workshop Week 3
Biaoyan Fang 16 March 2020

Recap Pre-processing
Pipeline
• Formatting
• Sentence Segmentation
• Tokenisation
• Normalisation
• Lemmatisation
• Stemming
• Remove Stopwords
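A minimal sketch of this pipeline in Python, assuming NLTK and its resources are available; the example sentence and the choice of Porter stemmer / WordNet lemmatizer are illustrative, not prescribed by the workshop:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Assumes the relevant NLTK resources (punkt, stopwords, wordnet) have been
# downloaded beforehand, e.g. via nltk.download("punkt") etc.
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

def preprocess(text):
    tokens = []
    for sentence in nltk.sent_tokenize(text):            # sentence segmentation
        for word in nltk.word_tokenize(sentence):         # tokenisation
            word = word.lower()                            # normalisation
            if not word.isalpha() or word in stop_words:   # drop punctuation/stopwords
                continue
            tokens.append(lemmatizer.lemmatize(word))      # or stemmer.stem(word)
    return tokens

print(preprocess("The cats were sitting on the mats."))
# ['cat', 'sitting', 'mat']  (stemming instead would give ['cat', 'sit', 'mat'])
```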

Outline
1. Text Classification
2. Language Model

Bag of words
Figure 1: Bag of words
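As a sketch of how such a representation might be built in code, scikit-learn's CountVectorizer turns documents into term-count vectors; the two toy documents below are invented (assumes scikit-learn ≥ 1.0 for get_feature_names_out):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log"]            # invented toy corpus

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # sparse document-term count matrix

print(vectorizer.get_feature_names_out())
# ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']
print(X.toarray())
# [[1 0 0 1 1 1 2]
#  [0 1 1 0 1 1 2]]
```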

K-Nearest Neighbour
Euclidean distance: problematic for documents, since length is usually not a distinguishing characteristic
Cosine similarity: better, because it is insensitive to vector length; still suffers from the high-dimensionality problem
Figure 2: k-nearest neighbour
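A small numeric illustration (the count vectors are made up) of why cosine similarity is usually preferred: doubling a document changes its Euclidean distance to the original but leaves the cosine similarity at 1.

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

doc1 = np.array([2.0, 1.0, 0.0, 3.0])   # toy term-count vector
doc2 = 2 * doc1                          # same content, document twice as long
doc3 = np.array([0.0, 3.0, 4.0, 1.0])   # genuinely different content

print(np.linalg.norm(doc1 - doc2))       # ~3.74: large distance caused only by length
print(cosine_similarity(doc1, doc2))     # 1.0: cosine ignores length
print(cosine_similarity(doc1, doc3))     # ~0.31: lower similarity for different content
```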

Decision Tree
Can be useful for finding meaningful features
Prone to spurious correlations; tends to favour rare features
Figure 3: Decision Tree
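One way to see the "meaningful features" point in practice is to inspect which features a fitted tree actually splits on; the sketch below uses scikit-learn's DecisionTreeClassifier on an invented four-document sentiment corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Invented toy sentiment corpus, purely for illustration.
docs = ["good great film", "great acting good plot",
        "bad boring film", "boring plot bad acting"]
labels = [1, 1, 0, 0]                    # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

tree = DecisionTreeClassifier(random_state=0).fit(X, labels)

# Non-zero importances correspond to features the tree split on.
for word, importance in zip(vectorizer.get_feature_names_out(),
                            tree.feature_importances_):
    if importance > 0:
        print(word, importance)
```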

Naive Bayes
Assumes conditional independence of features given the class
Bayes' rule:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Surprisingly useful
Then
$$P(c_n \mid f_1, \ldots, f_m) \propto \prod_{i=1}^{m} P(f_i \mid c_n)\, P(c_n)$$
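A minimal scikit-learn sketch of Naive Bayes text classification; the documents, labels and the choice of MultinomialNB (which estimates $P(c_n)$ and $P(f_i \mid c_n)$ from counts, with add-one smoothing by default) are illustrative assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training documents and labels.
train_docs = ["free prize money", "win money now",
              "meeting schedule today", "project meeting notes"]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

nb = MultinomialNB().fit(X_train, train_labels)

X_test = vectorizer.transform(["win free money"])
print(nb.predict(X_test))        # predicted class
print(nb.predict_proba(X_test))  # posterior P(c_n | f_1, ..., f_m) per class
```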

Logistic Regression
Useful; relaxes the conditional independence assumption of Naive Bayes
Handles large numbers of features
Figure 4: Logistic Regression
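For comparison, a corresponding scikit-learn sketch with LogisticRegression on an invented toy corpus; the model learns one weight per feature directly, so correlated features share weight rather than having their evidence double-counted as under the independence assumption:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy sentiment corpus.
docs = ["loved this movie", "great fun loved it",
        "hated this movie", "boring and slow"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = LogisticRegression().fit(X, labels)

# Per-feature weights: positive values push towards class 1, negative towards class 0.
for word, weight in zip(vectorizer.get_feature_names_out(), clf.coef_[0]):
    print(f"{word}: {weight:+.2f}")
```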

SVM
More powerful
Inherently a binary classifier; multi-class problems require extensions such as one-vs-rest
Figure 5: SVM
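A short sketch of how the multi-class limitation is typically worked around: scikit-learn combines several binary SVMs behind the scenes (one-vs-rest in the case of LinearSVC). The three-topic toy corpus is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Invented three-class toy corpus.
docs = ["goal scored in the match", "election vote parliament",
        "new phone battery review", "striker scored twice",
        "minister wins the vote", "laptop screen review"]
labels = ["sport", "politics", "tech", "sport", "politics", "tech"]

X = CountVectorizer().fit_transform(docs)

# LinearSVC trains one binary SVM per class (one-vs-rest) internally.
clf = LinearSVC().fit(X, labels)

print(clf.classes_)                        # the three classes
print(clf.decision_function(X[:1]).shape)  # one score per class: (1, 3)
```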

Probabilistic Language Model
A probability distribution over sequences of words. Given such a sequence, say of length $m$, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence.
Goal: compute the probability of a sentence or sequence of words:
$$P(W) = P(w_1, w_2, \ldots, w_m)$$
Chain rule:
$$P(w_1, w_2, \ldots, w_m) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_m \mid w_1, \ldots, w_{m-1})$$
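A tiny plain-Python sketch (the helper name chain_rule_factors is invented) that simply prints the conditional factors the chain rule produces, to make the decomposition concrete:

```python
def chain_rule_factors(sentence):
    """Spell out the factors of P(w_1, ..., w_m) under the chain rule."""
    words = sentence.split()
    factors = []
    for i, w in enumerate(words):
        history = ", ".join(words[:i])
        factors.append(f"P({w} | {history})" if history else f"P({w})")
    return " * ".join(factors)

print(chain_rule_factors("the cat sat"))
# P(the) * P(cat | the) * P(sat | the, cat)
```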

Probabilistic Language Model
Markov assumption:
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$
For $n = 1$, unigram language model:
$$P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} P(w_i)$$
For $n = 2$, bigram language model:
$$P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_{i-1})$$
For $n = 3$, trigram language model:
$$P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_{i-2}, w_{i-1})$$
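A minimal sketch of an unsmoothed bigram language model estimated from counts; the toy corpus and the <s> / </s> sentence-boundary markers are illustrative assumptions:

```python
from collections import defaultdict

# Invented toy corpus; <s> and </s> mark sentence boundaries.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"]]

bigram_counts = defaultdict(lambda: defaultdict(int))
context_counts = defaultdict(int)
for sentence in corpus:
    for prev, curr in zip(sentence, sentence[1:]):
        bigram_counts[prev][curr] += 1
        context_counts[prev] += 1

def bigram_prob(prev, curr):
    """Maximum-likelihood estimate P(curr | prev) = C(prev, curr) / C(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][curr] / context_counts[prev]

def sentence_prob(sentence):
    """Bigram-model probability of a tokenised sentence (incl. <s> and </s>)."""
    prob = 1.0
    for prev, curr in zip(sentence, sentence[1:]):
        prob *= bigram_prob(prev, curr)
    return prob

print(bigram_prob("the", "cat"))                            # 2/3
print(sentence_prob(["<s>", "the", "dog", "sat", "</s>"]))  # 1 * 1/3 * 1 * 1 = 1/3
```

Unsmoothed maximum-likelihood estimates assign zero probability to any unseen bigram, which is what motivates smoothing ideas such as the continuation counts on the next slide.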

Probabilistic Language Model
Continuation counts:
$$P_{\text{cont}}(w_i) = \frac{|\{w_{i-1} : C(w_{i-1}, w_i) > 0\}|}{\sum_{w_i} |\{w_{i-1} : C(w_{i-1}, w_i) > 0\}|}$$
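A small sketch of the continuation probability on a handful of invented bigrams: count the distinct left contexts each word completes and normalise by the total number of unique bigram types:

```python
from collections import defaultdict

# Invented observed bigrams (duplicates only add token counts, not new types).
bigrams = [("<s>", "the"), ("the", "cat"), ("cat", "sat"),
           ("<s>", "the"), ("the", "dog"), ("dog", "sat"),
           ("a", "cat")]

# Distinct left contexts observed for each word.
contexts = defaultdict(set)
for prev, curr in bigrams:
    contexts[curr].add(prev)

# Denominator: total number of unique bigram types.
total_bigram_types = sum(len(prevs) for prevs in contexts.values())

def p_cont(word):
    """Continuation probability: distinct contexts completed / unique bigram types."""
    return len(contexts[word]) / total_bigram_types

print(p_cont("cat"))   # 2/6: appears after two distinct words ('the', 'a')
print(p_cont("dog"))   # 1/6: appears after only one distinct word ('the')
```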

Discussion
Questions