
Video 1: Introduction
ANLP Week 3/Unit 1
Text Categorization with Naive Bayes
Sharon Goldwater
Text categorization: example
Dear Prof. Sharon Goldwater:
My name is [XX]. I am an ambitious applicant for the Ph.D program of Electrical Engineering and Computer Science at your university. Especially being greatly attracted by your research projects and admiring for your achievements via the school website, I cannot wait to write a letter to express my aspiration to undertake the Ph.D program under your supervision.
I have completed the M.S. program in Information and Communication Engineering with a high GPA of 3.95/4.0 at [YY] University. In addition to throwing myself into the specialized courses in […] I took part in the research projects, such as […]. I really enjoyed taking the challenges in the process of the researches and tests, and I spent two years on the research project […]. We proved the effectiveness of the new method for […] and published the result in […].
Having read your biography, I found my academic background and research experiences indicated some possibility of my qualification to join your team. It is my conviction that the enlightening instruction, cutting-edge research projects and state of-the-art facilities offered by your team will direct me to make breakthroughs in my career development in the arena of electrical engineering and computer science. Thus, I shall be deeply grateful if you could give me the opportunity to become your student. Please do not hesitate to contact me, should you need any further information about my scholastic and research experiences.
Yours sincerely, [XX].
Today’s lecture
• What are some examples of text categorization tasks?
• What is a Naive Bayes classifier and how do we apply it to text categorization (in general, or for specific tasks)?
• What are some pros and cons of Naive Bayes?
• How do we evaluate categorization accuracy?

Text categorization (classification)
We might want to categorize the content of the text:
• Spam detection (binary classification: spam/not spam)
• Sentiment analysis (binary or multiway)
– movie, restaurant, product reviews (pos/neg, or 1-5 stars)
– political argument (pro/con, or pro/con/neutral)
• Topic classification (multiway: sport/finance/travel/etc)
Text categorization (classification)
Or we might want to categorize the author of the text (authorship attribution):
• Native language identification (e.g., to tailor language tutoring)
• Diagnosis of disease (psychiatric or cognitive impairments)
• Identification of gender, dialect, educational background (e.g., in forensics [legal matters], advertising/marketing).
Video 2: Naive Bayes classifier

Formalizing the task
• Given document d and set of categories C, we want to assign d to the most probable category ĉ:
\[
\hat{c} = \arg\max_{c \in C} P(c \mid d)
        = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}
        = \arg\max_{c \in C} P(d \mid c)\,P(c)
\]

Document model
Each document d is represented by features f1, f2, …, fn, e.g.:
• For topic classification: the 2000 most frequent words, excluding stopwords like the, a, do, in.
• For sentiment classification: words from a sentiment lexicon.
In fact, we only care about the feature counts, so this is a bag-of-words (unigram) model.
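To make the document model concrete, here is a minimal sketch (not from the slides) of turning raw text into bag-of-words feature counts; the whitespace tokenizer and toy stopword list are simplifying assumptions.

```python
from collections import Counter

STOPWORDS = {"the", "a", "do", "in"}  # toy stopword list (assumption)

def bag_of_words(text, vocab=None, exclude_stopwords=True):
    """Represent a document as feature counts f1..fn (unigram model)."""
    tokens = text.lower().split()  # simplistic whitespace tokenizer
    if exclude_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if vocab is not None:  # e.g., 2000 most frequent words, or a sentiment lexicon
        tokens = [t for t in tokens if t in vocab]
    return Counter(tokens)

# Only the counts matter, not the word order:
print(bag_of_words("get your cash and your orderz"))
# Counter({'your': 2, 'get': 1, 'cash': 1, 'and': 1, 'orderz': 1})
```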
Example documents
[Slide shows example word lists for several topics, from http://www.enchantedlearning.com/wordlist/]
• Possible feature counts from training documents in a spam-detection task (where we did not exclude stopwords):

      doc    the  your  model  cash  Viagra  class  account  orderz
      doc1    12     3      1     0       0      2        0       0
      doc2    10     4      0     4       0      0        2       0
      doc3    25     4      0     0       0      1        1       0
      doc4    14     2      0     1       3      0        1       1
      doc5    17     5      0     2       0      0        1       1

Document model, cont.
• Representing d using its features gives us: P(d|c) = P(f1, f2, …, fn | c)
• But we can't estimate this joint probability well (too sparse).
• So, make a Naive Bayes assumption: features are conditionally independent given the class:
\[
P(d \mid c) \approx P(f_1 \mid c)\,P(f_2 \mid c) \cdots P(f_n \mid c)
\]
Task-specific features
Example words from a sentiment lexicon:
Positive: absolutely, adorable, accepted, acclaimed, accomplish, achieve, action, active, admire, adventure, affirm, beaming, beautiful, believe, beneficial, bliss, bountiful, bounty, brave, bravo, brilliant, bubbly, calm, celebrated, certain, champ, champion, charming, cheery, choice, classic, classical, clean, …
Negative: abysmal, adverse, alarming, angry, annoy, anxious, apathy, appalling, atrocious, awful, bad, banal, barbed, belligerent, bemoan, beneath, boring, broken, callous, can't, clumsy, coarse, cold, collapse, confused, contradictory, contrary, corrosive, corrupt, …

Full model
• Given a document with features f1, f2, …, fn and set of categories C, choose
\[
\hat{c} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(f_i \mid c)
\]
• This is called a Naive Bayes classifier.

Generative process
• The Naive Bayes classifier is another generative model.
• It assumes the data (the features in each doc) were generated as follows:
  – For each document, choose its class c with prob P(c).
  – For each feature in each doc, choose the value of that feature with prob P(f|c).
• See the Basic Prob Theory reading, Ex 5.5.3, for a non-text example.
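As a sketch of the decision rule (not the lecturer's code), classification is usually done in log space to avoid numerical underflow when multiplying many small probabilities; `log_prior` and `log_likelihood` are assumed to hold the estimates described on the following slides.

```python
def classify(features, classes, log_prior, log_likelihood):
    """Naive Bayes decision rule: argmax_c [ log P(c) + sum_i log P(f_i|c) ].

    features:                list of feature tokens observed in the document
    log_prior[c]:            log P(c)
    log_likelihood[(f, c)]:  log P(f|c)
    """
    def score(c):
        return log_prior[c] + sum(log_likelihood[(f, c)] for f in features)
    return max(classes, key=score)
```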
Learning the class priors
• P(c) is normally estimated with MLE:
\[
\hat{P}(c) = \frac{N_c}{N}
\]
  – N_c = the number of training documents in class c
  – N = the total number of training documents

Learning the class priors: example
• Given training documents with correct labels:

      doc    the  your  model  cash  Viagra  class  account  orderz  spam?
      doc1    12     3      1     0       0      2        0       0     −
      doc2    10     4      0     4       0      0        2       0     +
      doc3    25     4      0     0       0      1        1       0     −
      doc4    14     2      0     1       3      0        1       1     +
      doc5    17     5      0     2       0      0        1       1     +

• \(\hat{P}(\mathrm{spam}) = 3/5\)
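A one-line version of the MLE prior estimate, assuming `labels` holds the training labels from the table:

```python
from collections import Counter

labels = ["-", "+", "-", "+", "+"]           # doc1..doc5 from the table
counts = Counter(labels)                     # N_c for each class
priors = {c: n / len(labels) for c, n in counts.items()}
print(priors)                                # {'-': 0.4, '+': 0.6}  ->  P(spam) = 3/5
```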

Learning the feature probabilities
• P(fi|c) is normally estimated with simple smoothing:
\[
\hat{P}(f_i \mid c) = \frac{\mathrm{count}(f_i, c) + \alpha}{\sum_{f \in F}\left(\mathrm{count}(f, c) + \alpha\right)}
\]
  – count(fi, c) = the number of times fi occurs in class c
  – F = the set of possible features
  – α: the smoothing parameter, optimized on held-out data

Learning the feature probabilities: example
• Using the counts from the table above:
\[
\hat{P}(\mathrm{your} \mid +) = \frac{4 + 2 + 5 + \alpha}{(\text{tokens in } + \text{ class}) + \alpha|F|} = \frac{11 + \alpha}{68 + \alpha|F|}
\]
\[
\hat{P}(\mathrm{your} \mid -) = \frac{3 + 4 + \alpha}{(\text{tokens in } - \text{ class}) + \alpha|F|} = \frac{7 + \alpha}{49 + \alpha|F|}
\]
\[
\hat{P}(\mathrm{orderz} \mid +) = \frac{2 + \alpha}{(\text{tokens in } + \text{ class}) + \alpha|F|} = \frac{2 + \alpha}{68 + \alpha|F|}
\]
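Here is a short sketch that reproduces these smoothed estimates from the table's counts. The slides leave α symbolic, so the value below (α = 1) is just an illustrative assumption; |F| = 8 is the number of features in the table.

```python
# Feature counts from the five training documents, grouped by class
# (+ = spam: doc2, doc4, doc5; - = not spam: doc1, doc3).
counts = {
    "+": {"the": 10+14+17, "your": 4+2+5, "model": 0, "cash": 4+1+2,
          "Viagra": 0+3+0, "class": 0, "account": 2+1+1, "orderz": 0+1+1},
    "-": {"the": 12+25, "your": 3+4, "model": 1, "cash": 0,
          "Viagra": 0, "class": 2+1, "account": 0+1, "orderz": 0},
}
alpha = 1.0                      # smoothing parameter (assumed value)
F = set(counts["+"])             # the 8 possible features

def p_hat(f, c):
    """(count(f,c) + alpha) / sum over f' of (count(f',c) + alpha)."""
    total = sum(counts[c].values()) + alpha * len(F)
    return (counts[c][f] + alpha) / total

print(p_hat("your", "+"))    # (11+1)/(68+8) ~ 0.158
print(p_hat("your", "-"))    # (7+1)/(49+8)  ~ 0.140
print(p_hat("orderz", "+"))  # (2+1)/(68+8)  ~ 0.039
```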

Classifying a test document: example
• Test document d:
    get your cash and your orderz
• Suppose there are no other features besides those in the previous table (so get and and are not counted). Then
\[
P(+ \mid d) \propto P(+) \prod_{i=1}^{n} P(f_i \mid +)
 = \frac{3}{5} \cdot \frac{11+\alpha}{68+\alpha|F|} \cdot \frac{7+\alpha}{68+\alpha|F|} \cdot \frac{11+\alpha}{68+\alpha|F|} \cdot \frac{2+\alpha}{68+\alpha|F|}
\]
• Do the same for P(−|d).
• Choose the class with the larger value.
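Continuing the previous sketch (reusing `counts` and `p_hat`, with the same assumed α = 1 and |F| = 8), this compares the two unnormalized scores for the test document; get and and are skipped because they are not in F.

```python
# Unnormalized P(c|d) for d = "get your cash and your orderz"
features = ["your", "cash", "your", "orderz"]   # get, and are not in F
priors = {"+": 3/5, "-": 2/5}

scores = {}
for c in priors:
    score = priors[c]
    for f in features:
        score *= p_hat(f, c)    # p_hat from the previous sketch
    scores[c] = score

print(scores)                        # '+' wins: cash/orderz never occur in the - docs,
print(max(scores, key=scores.get))   # so smoothing leaves them tiny there -> '+'
```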
Video 3: Naive Bayes: features, variations, pros and cons
Alternative feature values and feature sets
• Use only binary values for fi: did this word occur in d or not?
• Use only a subset of the vocabulary for F
  – Ignore stopwords (function words and others with little content)
  – Choose a small task-relevant set (e.g., using a sentiment lexicon)
• Use more complex features (bigrams, syntactic features, morphological features, …)

Task-specific features, cont.
• But: other words might be relevant for specific sentiment analysis tasks.
– E.g., quiet, memory for product reviews.
• And for other tasks, stopwords might be very useful features
  – E.g., people with schizophrenia use more 2nd-person pronouns (Watson et al., 2012); those with depression use more 1st-person pronouns (Rude et al., 2004).
• Probably better to use too many irrelevant features than not enough relevant ones.
Advantages of Naive Bayes
• Very easy to implement
• Very fast to train and test
• Doesn't require as much training data as some other methods
• Usually works reasonably well
Use as a simple baseline for any classification task.

Problems with Naive Bayes
• Naive Bayes assumption is naive!
• Consider categories Travel, Finance, Sport.
• Are the following features independent given the category?
    beach, sun, ski, snow, pitch, palm, football, relax, ocean
• No! They might be closer if we defined finer-grained categories (beach vacations vs. ski vacations), but we don't usually want to.

Non-independent features
• Features are not usually independent given the class.
• Adding multiple feature types (e.g., words and morphemes) often leads to even stronger correlations between features.
• Accuracy of the classifier can sometimes still be OK, but it will be highly overconfident in its decisions.
  – Ex: NB sees 5 features that all point to class 1 and treats them as five independent sources of evidence.
  – Like asking 5 friends for an opinion when some got theirs from each other.
Video 4: Evaluating classifiers

How to evaluate performance?
• Important question for any NLP task.
• Intrinsic evaluation: design a measure inherent to the task
  – Language modeling: perplexity
  – POS tagging: accuracy (% of tags correct)
  – Categorization: F-score (coming up next)
• Extrinsic evaluation: measure effects on a downstream task
  – Language modeling: does it improve my ASR/MT system?
  – POS tagging: does it improve my parser/IR system?
  – Categorization: does it reduce user search time in an IR setting?
Intrinsic evaluation for categorization
• Categorization as detection: is this document about sport or not?
• Classes may be very unbalanced.
• Can get 95% accuracy by always choosing "not", but this isn't useful.
• Need a better measure.

Two measures
• Assume we have a gold standard: correct labels for the test set.
• We also have a system for detecting the items of interest (docs about sport).
\[
\text{Precision} = \frac{\#\,\text{items detected and was right}}{\#\,\text{items system detected}}
\qquad
\text{Recall} = \frac{\#\,\text{items detected and was right}}{\#\,\text{items system should have detected}}
\]

Example of precision and recall
Doc about sports?
    Gold standard:  Y Y N N Y N N N Y N N
    System output:  N Y N Y N N N N Y N N
• # 'Y' we got right = 2
• # 'Y' we guessed = 3
• # 'Y' in gold standard = 4
• Precision = 2/3, Recall = 2/4
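A quick check of the worked example, computing precision and recall directly from the two label sequences:

```python
gold   = list("YYNNYNNNYNN")
system = list("NYNYNNNNYNN")

true_pos = sum(g == "Y" and s == "Y" for g, s in zip(gold, system))  # detected and right
sys_yes  = system.count("Y")   # items the system detected
gold_yes = gold.count("Y")     # items it should have detected

precision = true_pos / sys_yes    # 2/3
recall    = true_pos / gold_yes   # 2/4
print(precision, recall)          # 0.666... 0.5
```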

Precision-Recall curves
• If the system has a tunable parameter to vary the precision/recall tradeoff:
[Figure: example precision-recall curves, from http://ivrgwww.epfl.ch/supplementary_material/RK_CVPR09/]

Why use both measures?
• Systems often have (implicit or explicit) tuning thresholds on how many answers to return, e.g.:

      Doc   Sys prob   Gold
       23       0.99      Y
       12       0.98      Y
       45       0.93      Y
       01       0.93      Y
       37       0.89      N
       24       0.84      Y
       16       0.78      Y
       18       0.75      N
       20       0.72      Y
       …          …       …
       38       0.03      N
       19       0.03      N

• E.g., return as Y all docs where the system thinks P(C = sport) is greater than t.
• Raise t: higher precision, lower recall.
• Lower t: lower precision, higher recall.

F-measure
• Can also combine precision and recall into a single F-measure:
\[
F_\beta = \frac{(\beta^2 + 1)\,P R}{\beta^2 P + R}
\]
• Normally we just set β = 1 to get F1:
\[
F_1 = \frac{2 P R}{P + R}
\]
• F1 is the harmonic mean of P and R: similar to the arithmetic mean when P and R are close, but penalizes large differences between P and R.
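To illustrate the threshold tradeoff and F-measure together, here is a sketch that sweeps t over the scores shown in the table above (the rows elided by "…" are simply omitted) and reports precision, recall, and F1 at each setting:

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (beta^2 + 1) P R / (beta^2 P + R); beta = 1 gives the harmonic mean."""
    return (beta**2 + 1) * p * r / (beta**2 * p + r) if (p + r) else 0.0

# (system probability, gold label) pairs from the visible rows of the table.
docs = [(0.99, "Y"), (0.98, "Y"), (0.93, "Y"), (0.93, "Y"), (0.89, "N"),
        (0.84, "Y"), (0.78, "Y"), (0.75, "N"), (0.72, "Y"), (0.03, "N"), (0.03, "N")]

for t in (0.95, 0.80, 0.50):
    returned = [g for prob, g in docs if prob > t]    # say Y whenever P > t
    tp = returned.count("Y")
    p = tp / len(returned) if returned else 0.0
    r = tp / sum(g == "Y" for _, g in docs)
    print(f"t={t}: P={p:.2f} R={r:.2f} F1={f_beta(p, r):.2f}")
# Raising t trades recall for precision, exactly as the slide describes.
```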
References
Rude, S., Gortner, E.-M., and Pennebaker, J. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8):1121–1133.
Watson, A. R., Defterali, Ç., Bak, T. H., Sorace, A., McIntosh, A. M., Owens, D. G., Johnstone, E. C., and Lawrie, S. M. (2012). Use of second-person pronouns and schizophrenia. The British Journal of Psychiatry, 200(4):342–343.