
LECTURE 8

Sentiment Analysis

Arkaitz Zubiaga, 31st January 2018

2

Sentiment analysis: definition and applications.

Building a sentiment classifier.

Unsupervised sentiment classification.

Supervised sentiment classification.

Other challenges in sentiment classification.

LECTURE 8 CONTENTS

3

SENTIMENT ANALYSIS

 Many names to refer to sentiment analysis:
 Sentiment classification.
 Opinion mining.
 Opinion extraction.
 Sentiment mining.
 Subjectivity analysis.

4

SENTIMENT ANALYSIS

 Given a text (e.g. a review) as input, is the text:
 Positive?

Very comfortable hotel, reasonably priced, I’ll definitely be back.

 Neutral?
We stayed in the hotel for one night.

 Negative?
Terrible service, it was very noisy, we didn’t sleep. Never again.

5

SENTIMENT ANALYSIS

 Depending on our needs, we can use finer granularity, e.g. 1-5 stars.
 Terrible, don’t go there!
 Quite bad, but at least the food was good.
 Average hotel, could improve.
 I did like it, just a bit pricey!
 It was great, we’ll be back!

6

SENTIMENT ANALYSIS

 Classify sentiment for different parts/sentences of a text.

7

ASPECT-BASED SENTIMENT ANALYSIS

 Link those sentiments to different aspects.
 Sentiment classification + aspect classification.

8

TARGET-SPECIFIC SENTIMENT ANALYSIS

 What do people say about a particular product, brand, person?
 Sentiment classification + target detection.

I’ve used Android and iPhone phones, and I hate the latter!

Negative, but only towards iPhone

9

SCHERER TYPOLOGY OF AFFECTIVE STATES

 Emotion: angry, sad, joyful, fearful, ashamed, proud, elated
 Mood: cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: liking, loving, hating, valuing, desiring
Personality traits: nervous, anxious, reckless, morose, hostile, jealous

10

SENTIMENT ANALYSIS

 Simplest task:
 Is the attitude of text positive, neutral or negative?

 More complex:
 Rank attitude of text from 1 to 5 (diff. levels of intensity)

 Advanced:
 Detect the aspect, target, source, or complex attitude types


12

AGGREGATING SENTIMENT ANALYSIS

 It’s particularly interesting when we aggregate sentiment from
different users:

 Do people like this hotel?
 How many people say it’s bad?
 Which hotel is best liked?
 Which hotel has the comfiest bed?

13

APPLICATIONS OF SENTIMENT ANALYSIS

 http://www.sentiment140.com:
Look up sentiment of tweets towards products, brands,…

14

APPLICATIONS OF SENTIMENT ANALYSIS

 Google Products (discontinued)

15

APPLICATIONS OF SENTIMENT ANALYSIS

 Booking.com: aspect + positivity classification.

16

APPLICATIONS OF SENTIMENT ANALYSIS

 TOTEMSS: http://elections.iti.gr/uk2017/?normalised=true

17

APPLICATIONS OF SENTIMENT ANALYSIS

 Using sentiment of tweets for stock market prediction.
Ranco G, Aleksovski D, Caldarelli G, Grčar M, Mozetič I (2015) The Effects of Twitter Sentiment on Stock Price Returns. PLoS ONE 10(9): e0138441.

18

HOW TO BUILD A SENTIMENT CLASSIFIER

 Two main approaches:
 Unsupervised classifier using lexicons.
 Supervised classifier leveraging training data.

UNSUPERVISED CLASSIFIER
USING LEXICONS

20

CLASSIFIER USING LEXICONS

 Using lexicons of positive and negative words, e.g.:

 Count number of +/- words in a review:

 Majority wins (are there more + or – words?).
 Produce percentages (x% positive, y% negative).
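
A minimal Python sketch of this counting approach (the tiny lexicons below are illustrative; a real system would load a full lexicon such as Bing Liu's):

# Toy lexicon-based classifier: count positive and negative words, majority wins.
POSITIVE = {"good", "great", "comfortable", "awesome", "reasonable"}
NEGATIVE = {"bad", "terrible", "noisy", "dirty", "pricey"}

def classify(review):
    tokens = [t.strip(".,!?") for t in review.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(classify("Very comfortable hotel, awesome staff!"))   # positive
print(classify("Terrible service, it was very noisy."))     # negative

The percentage variant would return pos / (pos + neg) instead of a hard label.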

21

CLASSIFIER USING LEXICONS

 + No need for training data (reviews annotated as +/-), just lexicons.
 + Quick and easy to implement.
 – Can’t handle negations and other expressions, e.g.:

The restaurant is not bad at all, so we will be back!

I had heard that the service was slow, the food was terrible, and that it was
pricey… but hey, it’s actually awesome!

22

FREQUENCY OF WORDS IN REVIEWS

 1-10 rating system.

23

NEGATIONS

 How frequent are negations for +/- reviews?

 More frequent in negative reviews, but common in positive reviews.

24

WORKAROUND FOR NEGATIONS

 The restaurant is not bad at all, so we will be back!

 Workaround: add “not_” to words after negation, up to next punctuation.

 The restaurant is not not_bad not_at not_all, so we will be back!

 Expand lexicons to incorporate “not_*” in opposite list, e.g.:
not_happy, not_good as negative
not_bad, not_sad as positive
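
A rough sketch of the marking step (the negation-cue list and punctuation set below are assumptions; real systems tune both):

import re

NEGATION_CUES = {"not", "no", "never"}  # assumed cue list; real systems use a longer one

def mark_negation(text):
    """Prefix 'not_' to every token after a negation cue, up to the next punctuation."""
    tokens = re.findall(r"[\w']+|[.,!?;]", text.lower())
    out, negating = [], False
    for token in tokens:
        if token in {".", ",", "!", "?", ";"}:
            negating = False          # punctuation ends the negated span
            out.append(token)
        elif negating:
            out.append("not_" + token)
        else:
            out.append(token)
            if token in NEGATION_CUES or token.endswith("n't"):
                negating = True
    return " ".join(out)

print(mark_negation("The restaurant is not bad at all, so we will be back!"))
# the restaurant is not not_bad not_at not_all , so we will be back !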

25

WORKAROUND FOR NEGATIONS

 The restaurant is not not_bad not_at not_all, so we will be back!

 It has been shown to work well, but it has some problems too, e.g.:

I like the restaurant, not only the fantastic atmosphere but also the great service.

I like the restaurant, not not_only not_the not_fantastic not_atmosphere not_but not_also not_the not_great not_service.

 Negating words don’t always imply negation.


26

LEARNING SENTIMENT LEXICONS

 Hatzivassiloglou and McKeown, 1997:
 Build seed list manually (+: good, happy, -: bad, sad)
 Use corpora to look for “and” and “but” between adjectives:

 good and fun → good is positive, so fun must also be positive.
 happy but worried → happy is positive, so worried must be negative.

 Turney, 2002:
 Look for phrases co-occurring with seed-list words: a phrase with high co-occurrence with "good" is likely positive.
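
A toy sketch of the conjunction heuristic; the adjective pairs below are invented, whereas the 1997 paper mined them from a large corpus:

# Toy propagation of polarity across "and"/"but" conjunctions.
seeds = {"good": "+", "happy": "+", "bad": "-", "sad": "-"}
pairs = [("good", "and", "fun"), ("happy", "but", "worried")]

for left, conj, right in pairs:
    if left in seeds and right not in seeds:
        polarity = seeds[left]
        if conj == "but":  # "but" tends to link adjectives of opposite polarity
            polarity = "-" if polarity == "+" else "+"
        seeds[right] = polarity

print(seeds)  # fun inferred as +, worried inferred as -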

SUPERVISED CLASSIFIER
LEVERAGING TRAINING DATA

28

COLLECTION OF DATASETS

 Ways of obtaining sentiment annotated datasets:

1) Look for existing datasets.

2) Scrape web datasets.

3) Crowdsource dataset annotation.

4) Distant supervision.

29

1) LOOK FOR EXISTING DATASETS

 From researchers, and sometimes companies, e.g. Yelp dataset:
https://www.yelp.co.uk/dataset

 + Quick, easy to get, data already compiled.
 – Not always available for the type of review we’re looking for.

30

2) GENERATE WEB DATASETS

 Collect data from review sites through their API:
 Text of the review.
 The review's star rating can be used to determine whether it is positive or negative, and hence to label it for building the classifier.

 + Very quick to get large collections.
 – Reviews have high commercial value, NOT always free.
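
As a sketch, once reviews are downloaded, mapping star ratings to labels is straightforward; the thresholds here (4-5 positive, 1-2 negative) are a common convention, not a fixed rule:

def label_from_stars(stars):
    """Map a 1-5 star rating to a sentiment label (4-5/1-2 thresholds are a convention)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

# e.g. pairs of (review text, star rating) fetched from a review site's API
reviews = [("Great stay, we'll be back!", 5), ("Average hotel.", 3), ("Never again.", 1)]
dataset = [(text, label_from_stars(stars)) for text, stars in reviews]
print(dataset)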

31

3) CROWDSOURCE DATASET ANNOTATION

 Manual annotation/labelling of reviews can be achieved through
crowdsourcing:

 Define annotation work as a microtask (small, quick work).
 Define guidelines: how to decide the labels?
 Upload data to crowdsourcing platform and add money.
 Workers will find your microtasks and do annotations.

32

3) CROWDSOURCE DATASET ANNOTATION

 Crowdsourcing through e.g. crowdflower.com:

 + Quick way of collecting annotation.
 + High quality results.
 – Has a cost.

33

4) DISTANT SUPERVISION

 Use positive and negative keywords to collect data, e.g.:
#happy, :), #sad, :(,…

 Build dataset after removing those keywords:
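
A minimal sketch of the labelling-and-stripping step, assuming tweets have already been collected (marker lists taken from the examples above):

POS_MARKERS = ["#happy", ":)"]
NEG_MARKERS = ["#sad", ":("]

def distant_label(tweet):
    """Label a tweet from its markers, then strip them so the classifier cannot cheat."""
    if any(m in tweet for m in POS_MARKERS):
        label = "positive"
    elif any(m in tweet for m in NEG_MARKERS):
        label = "negative"
    else:
        return None  # no marker: tweet is not usable as training data
    for m in POS_MARKERS + NEG_MARKERS:
        tweet = tweet.replace(m, "")
    return tweet.strip(), label

print(distant_label("Got the job today! #happy :)"))
# ('Got the job today!', 'positive')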

34

4) DISTANT SUPERVISION

 + Easy and quick to get large collection of data.
 + Generally free, e.g. Twitter.
 – Data tends to be noisy, e.g. sarcasm.

35

BUILDING A SUPERVISED CLASSIFIER

 Choose features to represent reviews as vectors.
 Use classifier for training.
 Evaluation.

 Cross-validation.
 Error analysis.

See Lecture 7
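
Putting the steps above together, a compact scikit-learn sketch; the toy reviews and the choice of bag-of-words features with logistic regression are illustrative assumptions, not the lecture's prescribed setup:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["Very comfortable hotel, reasonably priced.",
         "Terrible service, very noisy, never again.",
         "Great food, lovely staff, we will be back.",
         "Dirty room, rude reception, awful stay."]
labels = ["positive", "negative", "positive", "negative"]

# Step 1: represent reviews as vectors; step 2: train a classifier on them.
model = make_pipeline(CountVectorizer(), LogisticRegression())

# Step 3: evaluate with cross-validation (2 folds only because the data is tiny).
print(cross_val_score(model, texts, labels, cv=2))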

37

CHOOSING THE FEATURES

 Need to be careful when we choose the features.
 Not necessarily the same features as for other classification tasks!

 Word embeddings are generally good at capturing semantics (see Lecture 6).

 Thinking of sentiment-specific features can further boost
performance.

38

CHOOSING THE FEATURES

 Features for sentiment analysis:
 Some punctuation may express sentiment and should be considered when extracting features, e.g. 🙂 or !!!

 Word lengthening and character repetition, e.g. noooooooo

 Capitalisation may express intensity, e.g. it was TERRIBLE

 Ellipses may be indicative of irony, e.g. seeing a rat in the restaurant was a
great experience, yeah, sure…

 …

OTHER CHALLENGES IN
SENTIMENT ANALYSIS

40

SENTIMENT RATING PREDICTION

 Predict 1-5 star rating of reviews, as opposed to +/-/neutral.
 Classification approaches haven’t performed very well here.
 1-5 ratings are not categorical, i.e.:

 4 and 5 are closer to each other than 1 and 5.
 A classifier doesn’t generally distinguish this.

 Regression algorithms have led to better performance.
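
A minimal regression sketch with scikit-learn, reusing the 1-5 examples from earlier; ridge regression over tf-idf features is one reasonable choice among many:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

texts = ["Terrible, don't go there!", "Quite bad, but at least the food was good.",
         "Average hotel, could improve.", "I did like it, just a bit pricey!",
         "It was great, we'll be back!"]
stars = [1, 2, 3, 4, 5]

# Regression respects the ordering of the labels: predicting 4 for a true 5
# is penalised less than predicting 1, which a plain classifier ignores.
model = make_pipeline(TfidfVectorizer(), Ridge()).fit(texts, stars)
print(model.predict(["Quite bad hotel, could improve."]).round(1))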

41

ASPECT-BASED SENTIMENT ANALYSIS

 The food is great, but the service leaves much to be desired.

(two aspects: food and service)

 Usually tackled as a 2-step classification task:
 Aspect detection: what aspect does the sentence (or segment) refer to?
 Sentiment classification: what is the sentiment associated with that aspect?

We can break down the sentiment by aspect,
rather than giving an overall score.

42

ASPECT-BASED SENTIMENT ANALYSIS

 Build list of keywords associated with aspects.

 Reviews containing the keywords will be about the aspect in question.

 + Easy to implement.
 – Aspect name may not be mentioned in the text.

Service: reception, waiter, waitress, ...

Location: city center, attractions, nearby, ...

Cleanliness: housekeeper, clean, dirty, ...

Food: food, wine, snack, ...
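
A minimal sketch of the keyword lookup using the table above (plain substring matching, which is as crude as the listed drawback suggests):

ASPECT_KEYWORDS = {
    "service": ["reception", "waiter", "waitress"],
    "location": ["city center", "attractions", "nearby"],
    "cleanliness": ["housekeeper", "clean", "dirty"],
    "food": ["food", "wine", "snack"],
}

def detect_aspects(review):
    """Return every aspect whose keywords appear in the review text."""
    text = review.lower()
    return [aspect for aspect, words in ASPECT_KEYWORDS.items()
            if any(w in text for w in words)]

print(detect_aspects("The reception was slow but the wine was lovely."))
# ['service', 'food']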

43

ASPECT-BASED SENTIMENT ANALYSIS

 Supervised classification:
 Manually annotate small corpus of reviews with aspect.
 Train classifier to assign aspect to new reviews.

 Or find existing dataset, e.g.:

http://alt.qcri.org/semeval2016/task5/

44

ASPECT-BASED SENTIMENT ANALYSIS

 Building the entire pipeline:

Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier + Aspect Extractor → Aggregator

45

CROSS-DOMAIN SENTIMENT ANALYSIS

 Sentiment analysis is highly sensitive to domain of training data.
 Difficult to use labelled restaurant reviews to classify the sentiment of hotel reviews.
 Vocabulary, adjectives and language constructs can differ.

 Cross-domain sentiment analysis: train on data from a source domain, then classify data from a different target domain.

46

CROSS-DOMAIN SENTIMENT ANALYSIS

 (Aue and Gamon, 2005) tested 4 approaches:

1. Train from reviews in a mix of domains, test on new domain.

2. Same as (1), but only using features occurring in new
domain.

3. Ensemble of classifiers for each domain in (1), test on new
domain.

4. Semi-supervised training from small amounts of data labelled
for the new domain.

 (4) worked best, showing the importance of having at least a tiny amount of data labelled for the target domain.

47

CROSS-DOMAIN SENTIMENT ANALYSIS

 (Blitzer, Dredze and Pereira, 2007) proposed Structural
Correspondence Learning (SCL) for domain adaptation:

 Use common words in both domains as pivots, e.g.:
“service” is used both for hotels and for restaurants → pivot
“bed” is only used in hotels → non-pivot

 Build matrix of pivot words co-occurring with non-pivot words
within a window (e.g. 3-4 words).

 Singular Value Decomposition (SVD) is used to simplify the
matrix, which produces the cross-domain vectors.
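
A toy numpy sketch of the co-occurrence-plus-SVD idea; note that the full SCL algorithm learns pivot predictors rather than factorising raw counts, and the counts below are invented:

import numpy as np

pivots = ["service", "good"]                      # frequent in both domains
non_pivots = ["bed", "dessert", "room", "menu"]   # domain-specific words

# Invented co-occurrence counts within a 3-4 word window
# (rows: non-pivot words, columns: pivot words).
counts = np.array([[4.0, 1.0],   # bed
                   [0.0, 5.0],   # dessert
                   [3.0, 2.0],   # room
                   [1.0, 4.0]])  # menu

# SVD compresses the matrix; the scaled left singular vectors act as
# low-dimensional cross-domain representations of the non-pivot words.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 1                            # number of dimensions to keep
vectors = U[:, :k] * S[:k]
for word, vec in zip(non_pivots, vectors):
    print(word, vec.round(2))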

48

CROSS-LINGUAL SENTIMENT ANALYSIS

 Hotel reviews are written by visitors who speak different languages.
 We want to get a general picture of reviews from all visitors.

 If we only have English reviews labelled, can we leverage them to classify reviews in other languages?

49

CROSS-LINGUAL SENTIMENT ANALYSIS

 Different techniques:
 Machine translation: translate all reviews into one language (e.g. English).
   +: extensible to many languages. -: translations can be noisy, inaccurate.
 Ensemble learning: train a classifier for each language, then combine them.
   +: easier to train each separate, monolingual classifier. -: training data needed for all languages.
 Transfer learning: train on the source language, then find similar patterns in the target language, e.g. pivots.
   +: no need for labelled data in the target language. -: patterns we find can be noisy.

50

OTHER CHALLENGES

 Negation (sentiment shifters):
It’s not very good, but not as bad as I had heard either.

 Sarcasm:

Oh, yeah, I loved it, the food was cold and we were given dirty cutlery…

 Target:

I like football, but not cricket. → which one do they say they like?

51

OTHER CHALLENGES

 Comparative opinions:
I like both Android and iPhone, but I prefer the former.
(Indicative of intensity, still both positive.)

 Co-reference resolution:
I have an Android and an iPhone, but I only like the former.

 Explicit (1) vs implicit (2) opinions:
(1): Android phones are great.
(2): The battery life of Android phones is longer than iPhones'.
(Linking (2) to Android is a challenge.)

52

RESOURCES: LEXICONS

 The General Inquirer: http://www.wjh.harvard.edu/~inquirer
Positive vs Negative, Strong vs Weak, Active vs Passive, Overstated vs Understated

 MPQA Lexicon: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
Positive vs Negative, Intensity of Sentiment

 Bing Liu’s Lexicon: http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
Positive vs Negative

 SentiWordNet: http://sentiwordnet.isti.cnr.it/
Degrees of positivity, negativity, and neutrality/objectiveness

53

RESOURCES: DATASETS

 Yelp dataset: https://www.yelp.co.uk/dataset

 Amazon reviews: https://www.kaggle.com/bittlingmayer/amazonreviews

 IMDB movie reviews: http://www.cs.cornell.edu/people/pabo/movie-review-data

 Twitter sentiment analysis: http://alt.qcri.org/semeval2017/task4/

54

REFERENCES

 Aue, Anthony and Michael Gamon. Customizing sentiment classifiers to new domains: a case study. In Proceedings of Recent Advances in Natural Language Processing (RANLP-2005). 2005.

 Blitzer, John, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2007). 2007.

55

ASSOCIATED READING

 Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition. Chapter 6.

 Bing Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool.
2012.
https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf