Biases in Pretrained Embeddings¶
In this notebook, we'll build a sentiment classifier, first using pretrained word embeddings (GloVe) and then using BERT (from Week 7's workshop), and see whether these pretrained embeddings/models inherently contain biases.
What are GloVe embeddings? They are word embeddings like Word2Vec, but computed differently. If you're interested in reading more about GloVe, you can find more information here. The crucial thing to note is that GloVe embeddings are trained on large collections of news and web text (the 6B vectors we use here come from Wikipedia and Gigaword newswire; other releases use Common Crawl web data), and so the embeddings themselves are likely to capture common stereotypes and biases in our culture.
First, let’s upload the GloVe embeddings (“13-glove.6B.50d.txt”) to your colab instance.
Refresher:
1. To upload files, click the folder icon on the left, and click the “upload” icon to choose files from your local drive (you can also drag and drop files to upload them). Once the files are uploaded, you should see them appearing in the file system.
2. Don't forget to enable GPU on the colab notebook. We can do this by going to "Runtime > Change runtime type" and selecting "GPU" as the hardware accelerator. Click save.
Now let’s get started. Let’s load some libraries that we’ll be using for building the first sentiment classifier using the GloVe embeddings.
In [1]:
import numpy as np
import pandas as pd
import matplotlib
import re
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Next, we'll define a function to load the embeddings from the text file. The format is pretty self-explanatory if you view the embeddings file: there is one line for each word, consisting of the word followed by its vector values.
Note: the loading process might take a couple of seconds.
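For illustration, each line is a word followed by 50 space-separated floats. A quick way to peek at the first line (a minimal sketch, assuming the file has been uploaded as above):

# Peek at the first line of the embeddings file (truncated for display).
# Illustrative only: the exact word and values depend on the file version.
with open('13-glove.6B.50d.txt', encoding='utf-8') as f:
    print(f.readline()[:60])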
In [2]:
def load_embeddings(filename):
    """
    Load a DataFrame from the generalized text format used by word2vec, GloVe,
    fastText, and ConceptNet Numberbatch. The main point where they differ is
    whether there is an initial line with the dimensions of the matrix.
    """
    labels = []
    rows = []
    with open(filename, encoding='utf-8') as infile:
        for i, line in enumerate(infile):
            items = line.rstrip().split(' ')
            if len(items) == 2:
                # This is a header row giving the shape of the matrix
                continue
            labels.append(items[0])
            values = np.array([float(x) for x in items[1:]], 'f')
            rows.append(values)
    arr = np.vstack(rows)
    return pd.DataFrame(arr, index=labels, dtype='f')

embeddings = load_embeddings('13-glove.6B.50d.txt')
print(embeddings.shape)
(400000, 50)
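As a quick sanity check, each row of the resulting DataFrame is one word's 50-dimensional vector, indexed by the word itself ('good' below is just an arbitrary in-vocabulary word):

vec = embeddings.loc['good']  # a pandas Series of 50 floats
print(vec.shape)              # (50,)
print(vec.values[:5])         # the first five components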
One way to build a sentiment classifier is to use a sentiment lexicon: a dictionary that contains positive and negative words. There are many sentiment lexicons you could use, but we'll be using the Opinion Lexicon. Download the files (13-positive-words.txt and 13-negative-words.txt from Canvas) and put them in the same directory as this notebook. Once that's done, we'll load the lexicon using the function defined below.
In [3]:
def load_lexicon(filename):
    """
    Load a file from Bing Liu's sentiment lexicon
    (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), containing
    English words in Latin-1 encoding.

    One file contains a list of positive words, and the other contains
    a list of negative words. The files contain comment lines starting
    with ';' and blank lines, which should be skipped.
    """
    lexicon = []
    with open(filename, encoding='latin-1') as infile:
        for line in infile:
            line = line.rstrip()
            if line and not line.startswith(';'):
                lexicon.append(line)
    return lexicon

pos_words = load_lexicon('13-positive-words.txt')
neg_words = load_lexicon('13-negative-words.txt')
print(pos_words[:30])
print(neg_words[:30])
['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation', 'accolade', 'accolades', 'accommodative', 'accomodative', 'accomplish', 'accomplished', 'accomplishment', 'accomplishments', 'accurate', 'accurately', 'achievable', 'achievement', 'achievements', 'achievible', 'acumen', 'adaptable', 'adaptive', 'adequate', 'adjustable', 'admirable']
['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted', 'aborts', 'abrade', 'abrasive', 'abrupt', 'abruptly', 'abscond', 'absence', 'absent-minded', 'absentee', 'absurd', 'absurdity', 'absurdly', 'absurdness', 'abuse', 'abused', 'abuses', 'abusive', 'abysmal', 'abysmally', 'abyss']
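As a quick check (the exact counts depend on the lexicon version), we can see how many words each list contains:

# Bing Liu's Opinion Lexicon has roughly 2,000 positive and 4,800 negative words.
print(len(pos_words), len(neg_words))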
So how do we build a sentiment classifier using word embeddings? We'll take a very simple approach: embed the positive and negative words using GloVe embeddings, and train a logistic regression model that predicts their sentiments based on their embeddings. Once trained, we can apply the model to any other word with a corresponding GloVe embedding to predict its sentiment, even if it isn't part of the lexicon.
Note: index.intersection() is used to keep only the words that are in the GloVe vocabulary.
In [4]:
pos_vectors = embeddings.loc[embeddings.index.intersection(pos_words)]
neg_vectors = embeddings.loc[embeddings.index.intersection(neg_words)]
pos_vectors.sort_index(inplace=True)
neg_vectors.sort_index(inplace=True)
pos_vectors
Out[4]:
                  0         1         2  ...        47        48        49
abound     0.680940  0.681610 -0.598430  ...  0.685340  0.223100 -0.926680
abounds    0.344170  0.401890 -0.801010  ...  1.021500  0.450880 -0.938780
abundance  0.135060  1.014300 -0.550050  ... -0.006417  0.094202 -0.321800
...             ...       ...       ...  ...       ...       ...       ...
zenith     0.119250  0.384110  0.808770  ... -0.098213 -0.839860 -0.163000
zest       0.167720 -0.183480 -0.832810  ...  0.090781 -0.087266 -0.076426
zippy      0.333870 -0.830000 -0.008908  ...  0.356900  0.528820  0.706110

[1893 rows × 50 columns]
Now we make arrays of the desired inputs and outputs. The inputs are the embeddings, and the outputs are 1 for positive words and -1 for negative words. We also make sure to keep track of the words they’re labeled with, so we can interpret the results.
In [5]:
vectors = pd.concat([pos_vectors, neg_vectors])
targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
labels = list(pos_vectors.index) + list(neg_vectors.index)
Next we split the input vectors, output values, and labels into training and test data, with 10% of the data used for testing.
In [6]:
train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
train_test_split(vectors, targets, labels, test_size=0.1, random_state=2)
Now we make our classifier and train it by running the training vectors through it for 100 iterations. We use log loss because we're building a logistic regression model. The resulting classifier should output the probability that a word is positive or negative.
In [7]:
# Note: newer versions of scikit-learn have renamed this loss to 'log_loss';
# use loss='log_loss' if loss='log' raises an error.
model = SGDClassifier(loss='log', random_state=0, max_iter=100)
model.fit(train_vectors, train_targets)
Out[7]:
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='log', max_iter=100,
              n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5,
              random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1,
              verbose=0, warm_start=False)
We evaluate the classifier on the test vectors. It predicts the correct sentiment for sentiment words outside of its training data with around 88% accuracy. Not bad.
In [8]:
accuracy_score(model.predict(test_vectors), test_targets)
Out[8]:
0.8830128205128205
Let’s define a function that we can use to see the sentiment that this classifier predicts for particular words, then use it to see some examples of its predictions on the test data.
In [9]:
def vecs_to_sentiment(vecs):
    # predict_log_proba gives the log probability for each class
    predictions = model.predict_log_proba(vecs)

    # To see an overall positive vs. negative classification in one number,
    # we take the log probability of positive sentiment minus the log
    # probability of negative sentiment.
    return predictions[:, 1] - predictions[:, 0]


def words_to_sentiment(words):
    vecs = embeddings.loc[embeddings.index.intersection(words)]
    log_odds = vecs_to_sentiment(vecs)
    return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)


# Show 20 examples from the test set
words_to_sentiment(test_labels).head(20)
Out[9]:
           sentiment
well        2.953970
killed     -8.719976
peace       3.983100
attacks    -7.998912
clear      -0.366808
fall       -2.693952
popular     1.839639
champion    3.299406
bomb       -6.058093
broke      -5.587792
protect    -1.050680
critics    -1.262191
lose       -0.347604
terror     -4.650521
approval    2.441816
offensive  -5.519903
plot       -3.319520
stability   2.283495
urgent     -1.894337
abuse      -8.000703
These examples, more than the raw accuracy number, convince us that the classifier is working: it has learned to generalize sentiment to words outside of its training data. Note that the sentiment returned here is logprob(positive_class) - logprob(negative_class), so a positive value indicates positive sentiment and a negative value indicates negative sentiment.
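For intuition, this difference of log probabilities is the log odds of the positive class, so applying the logistic (sigmoid) function maps a sentiment score back to a probability. A minimal sketch:

import numpy as np

def log_odds_to_prob(score):
    # P(positive) = 1 / (1 + exp(-score)). For example, the score of ~2.95
    # for "well" above corresponds to a ~0.95 probability of being positive.
    return 1.0 / (1.0 + np.exp(-score))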
Now that we’re reasonably happy with the classifier, we’ll extend it to classify sentiment for a sentence. We can do so by simply computing the sentiment for each word in the sentence, and then taking the mean sentiment over all words.
In [10]:
import re

TOKEN_RE = re.compile(r"\w.*?\b")

# The regex above finds tokens that start with a word-like character (\w), and
# continues matching characters (.*?) until the next word break (\b). It's a
# relatively simple expression that manages to extract something very much
# like words from text.

def text_to_sentiment(text):
    tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
    sentiments = words_to_sentiment(tokens)
    return sentiments['sentiment'].mean()
Now let's test it on some example sentences.
In [11]:
print(text_to_sentiment("this example is pretty cool"))
print(text_to_sentiment("this example is okay"))
print(text_to_sentiment("meh, this example sucks"))
1.0584039704186323
1.3265787691462831
-0.3028293733994394
The results look pretty reasonable. Let’s try more examples.
In [12]:
print(text_to_sentiment("Let's go get Italian food"))
print(text_to_sentiment("Let's go get Chinese food"))
print(text_to_sentiment("Let's go get Mexican food"))
0.511555225196519
0.45009422205976213
-0.17230140715207987
Interesting. Mexican food seems to be associated with negative sentiment. Let's try some names.
Note: there is some randomness involved, so do not be alarmed if you see slightly different results.
In [40]:
print(text_to_sentiment("My name is Emily"))
print(text_to_sentiment("My name is Heather"))
print(text_to_sentiment("My name is Yvette"))
print(text_to_sentiment("My name is Yasin"))
0.9727873384125275
1.01529061284594
1.0788585785392906
-0.544038159419399
Looks like the system assigns widely different sentiments to people's names. This is a little worrying. Did we just build a racist sentiment classifier?
Let’s measure this bias with a bit more rigour.
Below we have five lists of names that tend to reflect different ethnic backgrounds. The first two are lists of predominantly "white" and "black" names adapted from this paper. We've also added typically Hispanic names, Muslim names that come from Arabic or Urdu, and common Chinese given names.
In [46]:
NAMES_BY_ETHNICITY = {
    # The first two lists are from the Caliskan et al. appendix describing the
    # Word Embedding Association Test.
    'White': [
        'Adam', 'Chip', 'Harry', 'Josh', 'Roger', 'Alan', 'Frank', 'Ian', 'Justin',
        'Ryan', 'Andrew', 'Fred', 'Jack', 'Matthew', 'Stephen', 'Brad', 'Greg', 'Jed',
        'Paul', 'Todd', 'Brandon', 'Hank', 'Jonathan', 'Peter', 'Wilbur', 'Amanda',
        'Courtney', 'Heather', 'Melanie', 'Sara', 'Amber', 'Crystal', 'Katie',
        'Meredith', 'Shannon', 'Betsy', 'Donna', 'Kristin', 'Nancy', 'Stephanie',
        'Bobbie-Sue', 'Ellen', 'Lauren', 'Peggy', 'Sue-Ellen', 'Colleen', 'Emily',
        'Megan', 'Rachel', 'Wendy'
    ],

    'Black': [
        'Alonzo', 'Jamel', 'Lerone', 'Percell', 'Theo', 'Alphonse', 'Jerome',
        'Leroy', 'Rasaan', 'Torrance', 'Darnell', 'Lamar', 'Lionel', 'Rashaun',
        'Tyree', 'Deion', 'Lamont', 'Malik', 'Terrence', 'Tyrone', 'Everol',
        'Lavon', 'Marcellus', 'Terryl', 'Wardell', 'Aiesha', 'Lashelle', 'Nichelle',
        'Shereen', 'Temeka', 'Ebony', 'Latisha', 'Shaniqua', 'Tameisha', 'Teretha',
        'Jasmine', 'Latonya', 'Shanise', 'Tanisha', 'Tia', 'Lakisha', 'Latoya',
        'Sharise', 'Tashika', 'Yolanda', 'Lashandra', 'Malika', 'Shavonn',
        'Tawanda', 'Yvette'
    ],

    # This list comes from statistics about common Hispanic-origin names in the US.
    'Hispanic': [
        'Juan', 'José', 'Miguel', 'Luís', 'Jorge', 'Santiago', 'Matías', 'Sebastián',
        'Mateo', 'Nicolás', 'Alejandro', 'Samuel', 'Diego', 'Daniel', 'Tomás',
        'Juana', 'Ana', 'Luisa', 'María', 'Elena', 'Sofía', 'Isabella', 'Valentina',
        'Camila', 'Valeria', 'Ximena', 'Luciana', 'Mariana', 'Victoria', 'Martina'
    ],

    # This list is compiled from baby-name sites for common Muslim names,
    # as spelled in English. Note: this list potentially conflates religion
    # and ethnicity, so it isn't perfect.
    'Arab/Muslim': [
        'Mohammed', 'Omar', 'Ahmed', 'Ali', 'Youssef', 'Abdullah', 'Yasin', 'Hamza',
        'Ayaan', 'Syed', 'Rishaan', 'Samar', 'Ahmad', 'Zikri', 'Rayyan', 'Mariam',
        'Jana', 'Malak', 'Salma', 'Nour', 'Lian', 'Fatima', 'Ayesha', 'Zahra', 'Sana',
        'Zara', 'Alya', 'Shaista', 'Zoya', 'Yasmin'
    ],

    # This list uses some of the most common Chinese given names
    # (https://en.wikipedia.org/wiki/Chinese_given_name).
    'Chinese': [
        'Wei', 'Fang', 'Xiu Ying', 'Na', 'Min', 'Jing', 'Li', 'Qiang', 'Lei',
        'Yang', 'Jie', 'Jun', 'Yong', 'Yan', 'Chao', 'Tao', 'Juan', 'Han'
    ]
}
In [42]:
def name_sentiment_table():
    frames = []
    for group, name_list in sorted(NAMES_BY_ETHNICITY.items()):
        lower_names = [name.lower() for name in name_list]
        sentiments = words_to_sentiment(lower_names)
        sentiments['group'] = group
        frames.append(sentiments)

    # Put together the data we got from each ethnic group into one big table
    return pd.concat(frames)

name_sentiments = name_sentiment_table()
In [43]:
name_sentiments.iloc[::25]
Out[43]:
        sentiment        group
ali      0.900462  Arab/Muslim
ayaan   -3.833503  Arab/Muslim
malika  -2.147892        Black
chao    -2.863590      Chinese
luisa    0.128827     Hispanic
greg     0.413699        White
amber    0.351407        White
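Before plotting, a quick numeric summary of the table built above (a minimal sketch using pandas' groupby) already hints at the disparity between groups:

# Mean and standard deviation of the predicted sentiment for each group.
print(name_sentiments.groupby('group')['sentiment'].agg(['mean', 'std']))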
In [45]:
%matplotlib inline
import matplotlib.pyplot as plt

# Collect the sentiment scores for each group and compare them as boxplots.
groups = ["White", "Black", "Hispanic", "Arab/Muslim", "Chinese"]
group_sentiments = [name_sentiments.loc[name_sentiments["group"] == g, "sentiment"]
                    for g in groups]

plt.boxplot(group_sentiments, labels=groups)
plt.show()

[Boxplot: distributions of predicted name sentiment for each ethnic group]
Looking at the distribution of sentiment over these different ethnic groups, it's pretty clear that Black names are on average associated with negative sentiment, and so are Arab/Muslim and Chinese names (although not quite as negative).
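To go one step further (a sketch, assuming scipy is available on the Colab instance), we can check whether the gap between two of the groups is statistically significant using Welch's t-test:

from scipy import stats

white = name_sentiments.loc[name_sentiments['group'] == 'White', 'sentiment']
black = name_sentiments.loc[name_sentiments['group'] == 'Black', 'sentiment']

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(white, black, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")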
Why is this happening? Where does the bias come from? It's not from the sentiment lexicon, which doesn't include any names. The source of bias is the pretrained GloVe word embeddings, which are trained on news and web data. Because that data encodes biases and stereotypes that reflect our worldview, the sentiment classifier we built ultimately reflects them too. It is perhaps impossible to create perfectly neutral models or datasets, but the point here is awareness: as engineers we should at least be wary of the biases in the data and models we develop, and document them so that the users and companies relying on our systems know their limitations and weaknesses. Although we have yet to explore how to reduce bias, this exercise of building awareness constitutes the first step towards building ethical AI.
In [ ]: