LECTURE 8
SENTIMENT ANALYSIS
Arkaitz Zubiaga, 31st January, 2018
LECTURE 8 CONTENTS
Sentiment Analysis: definition and applications.
Building a sentiment classifier:
Unsupervised sentiment classification.
Supervised sentiment classification.
Other challenges in sentiment classification.
SENTIMENT ANALYSIS
Sentiment analysis goes by many names:
Sentiment classification.
Opinion mining.
Opinion extraction.
Sentiment mining.
Subjectivity analysis.
SENTIMENT ANALYSIS
Given a text (e.g. a review) as input, is the text:
Positive?
Very comfortable hotel, reasonably priced, I’ll definitely be back.
Neutral?
We stayed in the hotel for one night.
Negative?
Terrible service, it was very noisy, we didn’t sleep. Never again.
SENTIMENT ANALYSIS
Depending on our needs, we can use finer granularity, e.g. 1-5 stars.
Terrible, don’t go there!
Quite bad, but at least the food was good.
Average hotel, could improve.
I did like it, just a bit pricey!
It was great, we’ll be back!
SENTIMENT ANALYSIS
Classify sentiment for different parts/sentences of a text.
ASPECT-BASED SENTIMENT ANALYSIS
Link those sentiments to different aspects.
Sentiment classification + aspect classification.
TARGET-SPECIFIC SENTIMENT ANALYSIS
What do people say about a particular product, brand, person?
Sentiment classification + target detection.
I’ve used Android and iPhone phones, and I hate the latter!
Negative, but only towards iPhone
SCHERER TYPOLOGY OF AFFECTIVE STATES
Emotion: angry, sad, joyful, fearful, ashamed, proud, elated
Mood: cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: friendly, flirtatious, distant, cold, warm,
supportive, contemptuous
Attitudes: liking, loving, hating, valuing, desiring
Personality traits: nervous, anxious, reckless, morose, hostile,
jealous
SENTIMENT ANALYSIS
Simplest task:
Is the attitude of text positive, neutral or negative?
More complex:
Rank attitude of text from 1 to 5 (diff. levels of intensity)
Advanced:
Detect the aspect, target, source, or complex attitude types
AGGREGATING SENTIMENT ANALYSIS
It’s particularly interesting when we aggregate sentiment from
different users:
Do people like this hotel?
How many people say it’s bad?
Which hotel is best liked?
Which hotel has the comfiest bed?
APPLICATIONS OF SENTIMENT ANALYSIS
http://www.sentiment140.com:
Look up sentiment of tweets towards products, brands,…
APPLICATIONS OF SENTIMENT ANALYSIS
Google Products (discontinued)
APPLICATIONS OF SENTIMENT ANALYSIS
Booking.com: aspect + positivity classification.
APPLICATIONS OF SENTIMENT ANALYSIS
TOTEMSS: http://elections.iti.gr/uk2017/?normalised=true
APPLICATIONS OF SENTIMENT ANALYSIS
Using sentiment of tweets for stock market prediction.
Ranco G, Aleksovski D, Caldarelli G, Grčar M, Mozetič I (2015) The Effects of Twitter Sentiment on Stock Price Returns. PLoS ONE 10(9): e0138441.
HOW TO BUILD A SENTIMENT CLASSIFIER
Two main approaches:
Unsupervised classifier using lexicons.
Supervised classifier leveraging training data.
UNSUPERVISED CLASSIFIER
USING LEXICONS
CLASSIFIER USING LEXICONS
Using lexicons of positive and negative words, e.g.:
Count number of +/- words in a review:
Majority wins (are there more + or – words?).
Produce percentages (x% positive, y% negative).
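The counting approach above can be sketched in a few lines of Python (the tiny word sets here are hypothetical stand-ins for a real lexicon):

```python
def lexicon_sentiment(text, pos_words, neg_words):
    """Count lexicon hits in the text; the majority polarity wins."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in pos_words for t in tokens)
    neg = sum(t in neg_words for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# Toy lexicons (illustrative only; real lexicons contain thousands of words):
POS = {"comfortable", "great", "good", "awesome"}
NEG = {"terrible", "noisy", "bad", "slow"}

print(lexicon_sentiment("Very comfortable hotel, great value!", POS, NEG))  # positive
```

Returning the raw counts instead of the majority label gives the percentage variant.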
CLASSIFIER USING LEXICONS
+ No need for training data (reviews annotated as +/-), just lexicons.
+ Quick and easy to implement.
– Can’t handle negations and other expressions, e.g.:
The restaurant is not bad at all, so we will be back!
I had heard that the service was slow, the food was terrible, and that it was
pricey… but hey, it’s actually awesome!
FREQUENCY OF WORDS IN REVIEWS
[Figure: word frequencies across reviews, 1-10 rating system.]
NEGATIONS
How frequent are negations for +/- reviews?
More frequent in negative reviews, but common in positive reviews.
WORKAROUND FOR NEGATIONS
The restaurant is not bad at all, so we will be back!
Workaround: add “not_” to words after negation, up to next punctuation.
The restaurant is not not_bad not_at not_all, so we will be back!
Expand lexicons to incorporate “not_*” in opposite list, e.g.:
not_happy, not_good as negative
not_bad, not_sad as positive
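The "not_" marking described above can be sketched as follows (the set of negation triggers is an illustrative choice; NLTK ships a similar utility):

```python
import re

NEGATORS = {"not", "no", "never"}  # illustrative negation triggers

def mark_negation(text):
    """Prefix 'not_' to every token after a negation word,
    up to the next punctuation mark."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    out, negating = [], False
    for tok in tokens:
        if tok in NEGATORS:
            out.append(tok)
            negating = True
        elif re.fullmatch(r"[^\w\s]", tok):  # punctuation ends the scope
            out.append(tok)
            negating = False
        else:
            out.append("not_" + tok if negating else tok)
    return " ".join(out)

print(mark_negation("The restaurant is not bad at all, so we will be back!"))
```

On the slide's example this yields "... not not_bad not_at not_all , so we will be back !", leaving the clause after the comma untouched.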
WORKAROUND FOR NEGATIONS
The restaurant is not not_bad not_at not_all, so we will be back!
This has been shown to work well, but it has some problems too, e.g.:
I like the restaurant, not only the fantastic atmosphere but also the great service.
I like the restaurant, not not_only not_the not_fantastic not_atmosphere not_but not_also not_the not_great not_service.
Negating words don’t always imply negation.
LEARNING SENTIMENT LEXICONS
Hatzivassiloglou and McKeown, 1997:
Build seed list manually (+: good, happy, -: bad, sad)
Use corpora to look for “and” and “but” between adjectives:
good and fun → good is positive, fun must be also positive.
happy but worried → happy is positive, worried must be
negative.
Turney, 2002:
Look for phrases co-occurring with seed list words: a phrase that
frequently appears in reviews containing "good" is likely positive.
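The Hatzivassiloglou and McKeown idea can be sketched as a simple propagation over "and"/"but" pairs (the seed and pair lists are toy examples, not their corpus):

```python
def propagate_polarity(adj_pairs, seed):
    """Propagate polarity over (adj1, conjunction, adj2) pairs:
    'and' links same-polarity adjectives, 'but' links opposite ones."""
    polarity = dict(seed)  # word -> +1 / -1
    changed = True
    while changed:
        changed = False
        for a, conj, b in adj_pairs:
            sign = 1 if conj == "and" else -1
            for x, y in ((a, b), (b, a)):
                if x in polarity and y not in polarity:
                    polarity[y] = polarity[x] * sign
                    changed = True
    return polarity

seed = {"good": 1, "bad": -1}
pairs = [("good", "and", "fun"), ("happy", "but", "worried"), ("fun", "and", "happy")]
print(propagate_polarity(pairs, seed))
```

Starting from "good", the polarity spreads to "fun", then "happy", and via the "but" pair to "worried" as negative.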
SUPERVISED CLASSIFIER
LEVERAGING TRAINING DATA
COLLECTION OF DATASETS
Ways of obtaining sentiment annotated datasets:
1) Look for existing datasets.
2) Scrape web datasets.
3) Crowdsource dataset annotation.
4) Distant supervision.
1) LOOK FOR EXISTING DATASETS
From researchers, and sometimes companies, e.g. Yelp dataset:
https://www.yelp.co.uk/dataset
+ Quick, easy to get, data already compiled.
– Not always available for the type of review we’re looking for.
2) GENERATE WEB DATASETS
Collect data from review sites through their API:
Text of the review.
The review’s star rating can be used to label it as positive or
negative, and hence to build the classifier.
+ Very quick to get large collections.
– Reviews have high commercial value, NOT always free.
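The star-to-label mapping can be as simple as the following sketch (the thresholds are an illustrative choice, not a standard):

```python
def star_to_label(stars):
    """Map a 1-5 star rating to a coarse sentiment label.
    Thresholds are a common but arbitrary convention."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"

print(star_to_label(5), star_to_label(3), star_to_label(1))
```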
3) CROWDSOURCE DATASET ANNOTATION
Manual annotation/labelling of reviews can be achieved through
crowdsourcing:
Define annotation work as a microtask (small, quick work).
Define guidelines: how to decide the labels?
Upload data to crowdsourcing platform and add money.
Workers will find your microtasks and do annotations.
3) CROWDSOURCE DATASET ANNOTATION
Crowdsourcing through e.g. crowdflower.com:
+ Quick way of collecting annotation.
+ High quality results.
– Has a cost.
4) DISTANT SUPERVISION
Use positive and negative keywords to collect data, e.g.:
#happy, :), #sad, :(,…
Build the dataset after removing those keywords.
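A minimal sketch of this distant labelling step (the marker lists are illustrative; real systems use larger sets of emoticons and hashtags):

```python
POS_MARKERS = ["#happy", ":)"]  # illustrative positive markers
NEG_MARKERS = ["#sad", ":("]    # illustrative negative markers

def distant_label(tweet):
    """Label a tweet by its markers, then strip them from the text.
    Returns None if no marker (or both polarities) is present."""
    label = None
    if any(m in tweet for m in POS_MARKERS):
        label = "positive"
    if any(m in tweet for m in NEG_MARKERS):
        label = None if label else "negative"  # discard mixed-polarity tweets
    if label is None:
        return None
    for m in POS_MARKERS + NEG_MARKERS:
        tweet = tweet.replace(m, "")
    return tweet.strip(), label

print(distant_label("Loved the concert last night :) #happy"))
```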
4) DISTANT SUPERVISION
+ Easy and quick to get large collection of data.
+ Generally free, e.g. Twitter.
– Data tends to be noisy, e.g. sarcasm.
BUILDING A SUPERVISED CLASSIFIER
Choose features to represent reviews as vectors.
Use classifier for training.
Evaluation.
Cross-validation.
Error analysis.
See Lecture 7 for details on these steps.
CHOOSING THE FEATURES
Need to be careful when we choose the features.
Not necessarily same features as for other classification
tasks!
Word embeddings are generally good to capture semantics.
(see lecture 6)
Thinking of sentiment-specific features can further boost
performance.
CHOOSING THE FEATURES
Features for sentiment analysis:
Some punctuation may express sentiment and should be considered to
extract features, e.g. 🙂 or !!!
Word lengthening and character repetition, e.g. noooooooo
Capitalisation may express intensity, e.g. it was TERRIBLE
Ellipses may be indicative of irony, e.g. seeing a rat in the restaurant was a
great experience, yeah, sure…
…
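Features like the ones listed above can be extracted with simple patterns (a sketch; the exact regexes and feature set are an illustrative choice):

```python
import re

def sentiment_features(text):
    """Extract a few sentiment-specific surface features from a review."""
    return {
        "exclamations": text.count("!"),
        # Word lengthening: any character repeated 3+ times, e.g. "noooooo"
        "has_elongation": bool(re.search(r"(\w)\1{2,}", text)),
        # Fully capitalised words of 2+ letters, e.g. "TERRIBLE"
        "all_caps_words": len(re.findall(r"\b[A-Z]{2,}\b", text)),
        # Ellipses, possibly indicative of irony
        "has_ellipsis": "..." in text or "…" in text,
    }

print(sentiment_features("It was TERRIBLE, nooooo way I'm going back..."))
```

These features would be concatenated with the word-based representation (e.g. embeddings) before training.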
OTHER CHALLENGES IN
SENTIMENT ANALYSIS
SENTIMENT RATING PREDICTION
Predict 1-5 star rating of reviews, as opposed to +/-/neutral.
Classification approaches haven’t performed very well here.
1-5 ratings are not categorical, i.e.:
4 and 5 are closer to each other than 1 and 5.
A classifier doesn’t generally distinguish this.
Regression algorithms have led to better performance.
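The contrast between the two losses makes the argument concrete: a 0/1 classification loss treats confusing 4 with 5 exactly like confusing 1 with 5, while a regression loss penalises distant errors more.

```python
def classification_error(pred, true):
    """0/1 loss: 4-vs-5 is penalised the same as 1-vs-5."""
    return 0 if pred == true else 1

def regression_error(pred, true):
    """Absolute error: 4-vs-5 costs 1, 1-vs-5 costs 4."""
    return abs(pred - true)

print(classification_error(4, 5), classification_error(1, 5))  # 1 1
print(regression_error(4, 5), regression_error(1, 5))          # 1 4
```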
ASPECT-BASED SENTIMENT ANALYSIS
The food is great [food], but the service leaves much to be desired [service].
Usually tackled as a 2-step classification task:
Aspect detection: what aspect does the sentence (or segment)
refer to?
Sentiment classification: what is the sentiment associated with
that aspect?
We can break down the sentiment by aspect,
rather than giving an overall score.
ASPECT-BASED SENTIMENT ANALYSIS
Build list of keywords associated with aspects.
Reviews containing the keywords will be about the aspect in question.
+ Easy to implement.
– Aspect name may not be mentioned in the text.
Service: reception, waiter, waitress, ...
Location: city center, attractions, nearby, ...
Cleanliness: housekeeper, clean, dirty, ...
Food: food, wine, snack, ...
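A keyword-based aspect detector along these lines is a few lines of Python (the keyword sets are toy examples; multi-word keywords would need phrase matching):

```python
# Illustrative aspect lexicon, as in the list above:
ASPECT_KEYWORDS = {
    "service": {"reception", "waiter", "waitress"},
    "location": {"center", "attractions", "nearby"},
    "cleanliness": {"housekeeper", "clean", "dirty"},
    "food": {"food", "wine", "snack"},
}

def detect_aspects(review):
    """Return the aspects whose keywords appear in the review."""
    tokens = set(review.lower().split())
    return {aspect for aspect, kws in ASPECT_KEYWORDS.items() if tokens & kws}

print(detect_aspects("the food was great but the waiter was rude"))
```

This illustrates the limitation on the slide: a sentence like "they took ages to bring the bill" is about service but matches no keyword.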
ASPECT-BASED SENTIMENT ANALYSIS
Supervised classification:
Manually annotate small corpus of reviews with aspect.
Train classifier to assign aspect to new reviews.
Or find existing dataset, e.g.:
http://alt.qcri.org/semeval2016/task5/
ASPECT-BASED SENTIMENT ANALYSIS
Building the entire pipeline:
Pipeline: Reviews → Text Extractor → Sentences & Phrases → Aspect Extractor + Sentiment Classifier → Aggregator.
CROSS-DOMAIN SENTIMENT ANALYSIS
Sentiment analysis is highly sensitive to domain of training data.
Difficult to use labelled restaurant reviews to classify sentiment of
hotel reviews.
Vocabulary, adjectives, language constructs can be different.
Cross-domain sentiment analysis: training from data pertaining to
source domain to then classify data from a different target domain.
CROSS-DOMAIN SENTIMENT ANALYSIS
(Aue and Gamon, 2005) tested 4 approaches:
1. Train from reviews in a mix of domains, test on new domain.
2. Same as (1), but only using features occurring in new
domain.
3. Ensemble of classifiers for each domain in (1), test on new
domain.
4. Semi-supervised training from small amounts of data labelled
for the new domain.
(4) worked best, showing the importance of having at least a small
amount of labelled data for the target domain.
CROSS-DOMAIN SENTIMENT ANALYSIS
(Blitzer, Dredze and Pereira, 2007) proposed Structural
Correspondence Learning (SCL) for domain adaptation:
Use common words in both domains as pivots, e.g.:
“service” is used both for hotels and for restaurants → pivot
“bed” is only used in hotels → non-pivot
Build matrix of pivot words co-occurring with non-pivot words
within a window (e.g. 3-4 words).
Singular Value Decomposition (SVD) is used to simplify the
matrix, which produces the cross-domain vectors.
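The pivot co-occurrence matrix that SCL starts from can be sketched with toy data (the sentences, pivot set and window size are made up, and the SVD step is omitted):

```python
from collections import defaultdict

# Toy one-sentence "corpora" from two domains (hypothetical):
hotel = "the service was great and the bed was comfortable".split()
restaurant = "the service was slow and the menu was short".split()

pivots = {"service", "great", "slow"}  # words frequent in both domains
window = 3

def pivot_cooccurrence(tokens, pivots, window):
    """Count how often each non-pivot word occurs within `window`
    tokens of a pivot word; SCL factorises this matrix with SVD."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, tok in enumerate(tokens):
        if tok in pivots:
            continue
        for p in tokens[max(0, i - window):i + window + 1]:
            if p in pivots:
                counts[tok][p] += 1
    return counts

m = pivot_cooccurrence(hotel + restaurant, pivots, window)
print(dict(m["bed"]), dict(m["menu"]))
```

Non-pivots like "bed" and "menu" never co-occur directly, but their shared pivot neighbours let the SVD place them in a common cross-domain space.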
CROSS-LINGUAL SENTIMENT ANALYSIS
Hotel reviews are written by visitors who speak different languages.
We want to get a general picture of reviews from all visitors.
If we only have labelled reviews in English, can we leverage them to
classify reviews in other languages?
CROSS-LINGUAL SENTIMENT ANALYSIS
Different techniques:
Machine translation: translate all reviews into one language (e.g. English).
+ Extensible to many languages. – Translations can be noisy or inaccurate.
Ensemble learning: train a classifier for each language, then combine them.
+ Easier to train each separate, monolingual classifier. – Training data needed for all languages.
Transfer learning: train on the source language, then find similar patterns (e.g. pivots) in the target language.
+ No need for labelled data in the target language. – The patterns we find can be noisy.
OTHER CHALLENGES
Negation (sentiment shifters):
It’s not very good, but not as bad as I had heard either.
Sarcasm:
Oh, yeah, I loved it, the food was cold and we were given dirty cutlery…
Target:
I like football, but not cricket. → which one do they say they like?
OTHER CHALLENGES
Comparative opinions (indicative of intensity, still both positive):
I like both Android and iPhone, but I prefer the former.
Co-reference resolution:
I have an Android and an iPhone, but I only like the former.
Explicit (1) vs implicit (2) opinions (linking (2) to Android is a challenge):
(1): Android phones are great.
(2): The battery life of Android phones is longer than iPhones.
RESOURCES: LEXICONS
The General Inquirer: http://www.wjh.harvard.edu/~inquirer
Positive vs Negative, Strong vs Weak, Active vs Passive, Overstated vs Understated
MPQA Lexicon: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
Positive vs Negative, Intensity of Sentiment
Bing Liu’s Lexicon: http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
Positive vs Negative
SentiWordNet: http://sentiwordnet.isti.cnr.it/
Degrees of positivity, negativity, and neutrality/objectiveness
RESOURCES: DATASETS
Yelp dataset: https://www.yelp.co.uk/dataset
Amazon reviews: https://www.kaggle.com/bittlingmayer/amazonreviews
IMDB movie reviews: http://www.cs.cornell.edu/people/pabo/movie-review-data
Twitter sentiment analysis: http://alt.qcri.org/semeval2017/task4/
REFERENCES
Aue, Anthony and Michael Gamon. Customizing sentiment classifiers
to new domains: a case study. In Proceedings of Recent Advances in
Natural Language Processing (RANLP-2005). 2005.
Blitzer, John, Mark Dredze, and Fernando Pereira. Biographies,
bollywood, boom-boxes and blenders: Domain adaptation for
sentiment classification. In Proceedings of Annual Meeting of the
Association for Computational Linguistics (ACL-2007). 2007.
ASSOCIATED READING
Jurafsky, Daniel, and James H. Martin. Speech and Language
Processing: An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition. 3rd edition. Chapter 6.
Bing Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool.
2012.
https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf