Lab 05
Preprocessing
Text preprocessing is an important step for natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. It is important to understand what each preprocessing method does in order to help decide if it is appropriate for your particular task.
Text Wrangling
Text wrangling is the process of converting, gathering, and extracting formatted text from raw data.
For example, HTML does not include only text. Even when you extract only the text from HTML, it is not all meaningful (e.g. advertisements).
Have a look at the news article below. We might only be interested in getting the headline and body of the article.
The following code removes some irrelevant tags (e.g. script, style, link) and displays the remaining tags. We will mainly utilize two packages:
• urllib: is a package that collects several modules for working with URLs. We will use urllib.request for opening and reading URLs (See details at urllib.request).
• BeautifulSoup: Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree (See details at BeautifulSoup).
In [ ]:
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.smh.com.au/national/nsw/macquarie-uni-suspends-teaching-for-10-days-to-move-learning-online-20200317-p54avs.html"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)

# remove irrelevant tags (script, style, link, etc.)
for script in soup(["script", "style", "link", "head", "noscript"]):
    script.extract()  # rip it out, i.e. remove the tag from the tree

# get_text() returns all the human-readable text beneath the tag as a string
text = soup.get_text()
#print(text)  # you can uncomment this to have a look at the returned text

# The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string
print(soup.prettify())
We’re sorry, this service is currently unavailable. Please try again later.
Advertisement
This was published
1
year
ago
University of Sydney to move fully online while Macquarie cancels classes
By
Natassia Chrysanthos
and
Anna Patty
Updated
first published
at
The University of Sydney will suspend all face-to-face teaching from Monday and move fully online while Macquarie University has cancelled classes altogether in order to make the digital transition, revealing one of its students tested positive for COVID-19.
The University of Sydney’s 10,000 staff members have been encouraged to work remotely to slow the spread of coronavirus, but the campus Wi-Fi network and facilities will remain open with enhanced cleaning protocols and social distancing measures.
Courses with labs and practical components will be adapted for online or suspended until later in the semester while clinical placements for health students will go ahead under strict guidelines, Vice-Chancellor Michael Spence wrote to staff on Tuesday afternoon.
“We are anticipating this will be for the whole of semester and we’re planning on that,” Dr Spence told the
Herald.
He said some business school courses had been designed from scratch, while teachers would adapt other courses throughout semester based on student feedback.
“We’ve put a lot of effort and thought into how to do it. I think this is a tremendous opportunity. This could be an interesting pedagogical experiment,” he said.
The university
has already projected $200 million losses due to coronavirus
. Dr Spence said the expense of adapting courses, bolstering IT systems and student support would “cost us more overall” than regular teaching.
“At the moment we have to spend money to put education online to make sure our students’ experience is uninterrupted as possible. There has been an overwhelming response from students that [is what they want].”
Universities Australia deputy chief executive officer Anne-Marie Lansdown said 39 universities were providing online learning where possible amid the coronavirus pandemic. She said the challenge was increasing the amount of content that can be put online in a very short period of time.
“For some courses it will be much harder – especially those with significant practical or technical requirements,” Ms Lansdown said. “In the case of practical-based learning, where online may not be possible, universities are offering maximum flexibility, including delaying or deferring those components.”
Advertisement
Loading
Macquarie Vice-Chancellor Professor Bruce Dowton said in an email to staff on Tuesday morning that face-to-face and online teaching will be suspended for 12 days from Wednesday while the university transitions to online delivery of lectures and seminars.
“It will also allow us to redesign campus-based delivery of our units to modes that support social distancing and remote support,” he said.
Hours later, Macquarie confirmed a student had tested positive for COVID-19 the day before, and that several locations on campus had been cleaned overnight.
“The current advice is that the rest of campus can continue to operate as normal after the completion of intensive cleaning operations and in line with … moving to increase online delivery of educational programs,” a spokesperson said.
But students were concerned they had not been told where the infected student had been on campus and whether some needed to self-isolate.
“Many of the students are shocked that we could be left in the dark and that the university has not been able to make decisions for weeks about how to handle the virus situation,” one student, who requested anonymity, said.
The surrounding area of Macquarie Park was
the first hotspot for coronavirus community transmission in Sydney
. Macquarie’s mid-semester Easter break, due to take place between April 13-26, will now be a normal teaching period. Staff have been encouraged to work from home and non-essential events will been cancelled.
The University of NSW on Tuesday confirmed a third student tested positive for COVID-19 and had exhibited mild symptoms while in a three-hour evening science class last week.
UNSW is
making a quick transition to online learning
and its law school, which does not routinely record its lectures, will cease face-to-face lessons from Wednesday to move all classes online.
“We are currently working to finalise details around classes and, down the track, assessment… There will be some changes to the way that courses are delivered, including to involve more online activities as a replacement for classroom face-to-face discussions. Those are likely to evolve over the term,” acting head of the law school Melanie Schwartz wrote to students on Monday.
Unis try to avoid infection in share accommodation
Universities around the country are also seeking advice from public health experts on how to minimise the spread of COVID-19 among students living in campus accommodation and other share housing.
The Australian National University said many students congregated in residential halls “so across our student residences we’re implementing considered social distancing measures for our dining halls, kitchens and self-catering”.
“We’ve also reformatted social and academic support events and activities so that they have smaller numbers (25 people or less) and in some cases, these will be offered as online connections – we’re being creative to ensure pastoral care and community wellbeing are maintained,” a spokesperson said.
Loading
University of Sydney-owned student accommodation and residential colleges have also started additional cleaning and sanitation of common areas. Students are being provided with hand sanitisers, tissues and face masks and advised to keep a physical distance from each other. Housing
is being provided
to students that live in university-owned accommodation who need to self isolate.
Dining times have also been extended to allow students to stagger their meals and restrict access to communal utensils.
The University of NSW has asked students who need to self-isolate to avoid communal areas and avoid sharing utensils and tea towels. They have been advised to use separate bathroom and kitchen facilities and to regularly clean shared facilities. They should also wear a surgical mask while in the same room with any other people.
Natassia is the education reporter for The Sydney Morning Herald.
Anna Patty is a Senior Writer for The Sydney Morning Herald with a focus on higher education. She is a former Workplace Editor, Education Editor, State Political Reporter and Health Reporter.
Most Viewed in National
Loading
Try the <p> tag
Using the <p> tag is a common way to extract the main content of online news articles. BUT, do not expect this to always give you what you want.
In [ ]:
# The findAll() method returns all the specified tags, it is the same as find_all()
# Set text=True will return only the specified tags with the text inside, you can try to set text=False to compare the difference
p_tags = soup.findAll('p', text=True)
for i, p_tag in enumerate(p_tags):
print(str(i) + str(p_tag))
0
We’re sorry, this service is currently unavailable. Please try again later.
1
The University of Sydney will suspend all face-to-face teaching from Monday and move fully online while Macquarie University has cancelled classes altogether in order to make the digital transition, revealing one of its students tested positive for COVID-19.
2
The University of Sydney’s 10,000 staff members have been encouraged to work remotely to slow the spread of coronavirus, but the campus Wi-Fi network and facilities will remain open with enhanced cleaning protocols and social distancing measures.
3
Courses with labs and practical components will be adapted for online or suspended until later in the semester while clinical placements for health students will go ahead under strict guidelines, Vice-Chancellor Michael Spence wrote to staff on Tuesday afternoon.
4
“We’ve put a lot of effort and thought into how to do it. I think this is a tremendous opportunity. This could be an interesting pedagogical experiment,” he said.
5
“At the moment we have to spend money to put education online to make sure our students’ experience is uninterrupted as possible. There has been an overwhelming response from students that [is what they want].”
6
Universities Australia deputy chief executive officer Anne-Marie Lansdown said 39 universities were providing online learning where possible amid the coronavirus pandemic. She said the challenge was increasing the amount of content that can be put online in a very short period of time.
7
“For some courses it will be much harder – especially those with significant practical or technical requirements,” Ms Lansdown said. “In the case of practical-based learning, where online may not be possible, universities are offering maximum flexibility, including delaying or deferring those components.”
8
Macquarie Vice-Chancellor Professor Bruce Dowton said in an email to staff on Tuesday morning that face-to-face and online teaching will be suspended for 12 days from Wednesday while the university transitions to online delivery of lectures and seminars.
9
“It will also allow us to redesign campus-based delivery of our units to modes that support social distancing and remote support,” he said.
10
Hours later, Macquarie confirmed a student had tested positive for COVID-19 the day before, and that several locations on campus had been cleaned overnight.
11
“The current advice is that the rest of campus can continue to operate as normal after the completion of intensive cleaning operations and in line with … moving to increase online delivery of educational programs,” a spokesperson said.
12
But students were concerned they had not been told where the infected student had been on campus and whether some needed to self-isolate.
13
“Many of the students are shocked that we could be left in the dark and that the university has not been able to make decisions for weeks about how to handle the virus situation,” one student, who requested anonymity, said.
14
The University of NSW on Tuesday confirmed a third student tested positive for COVID-19 and had exhibited mild symptoms while in a three-hour evening science class last week.
15
Universities around the country are also seeking advice from public health experts on how to minimise the spread of COVID-19 among students living in campus accommodation and other share housing.
16
The Australian National University said many students congregated in residential halls “so across our student residences we’re implementing considered social distancing measures for our dining halls, kitchens and self-catering”.
17
“We’ve also reformatted social and academic support events and activities so that they have smaller numbers (25 people or less) and in some cases, these will be offered as online connections – we’re being creative to ensure pastoral care and community wellbeing are maintained,” a spokesperson said.
18
Dining times have also been extended to allow students to stagger their meals and restrict access to communal utensils.
19
The University of NSW has asked students who need to self-isolate to avoid communal areas and avoid sharing utensils and tea towels. They have been advised to use separate bathroom and kitchen facilities and to regularly clean shared facilities. They should also wear a surgical mask while in the same room with any other people.
20
Natassia is the education reporter for The Sydney Morning Herald.
21
Anna Patty is a Senior Writer for The Sydney Morning Herald with a focus on higher education. She is a former Workplace Editor, Education Editor, State Political Reporter and Health Reporter.
Punctuation removal
First, let’s try to remove punctuation by using an exhaustive list of symbols!
In [ ]:
puncts = [',', '.', '"', ':', ')', '(', '-', '!', '?', '|', ';', "'", '$', '&', '/', '[', ']', '>', '%', '=', '#', '*', '+', '\\', '•', '~', '@', '£',
'·', '_', '{', '}', '©', '^', '®', '`', '<', '→', '°', '€', '™', '›', '♥', '←', '×', '§', '″', '′', 'Â', '█', '½', 'à', '…',
'“', '★', '”', '–', '●', 'â', '►', '−', '¢', '²', '¬', '░', '¶', '↑', '±', '¿', '▾', '═', '¦', '║', '―', '¥', '▓', '—', '‹', '─',
'▒', ':', '¼', '⊕', '▼', '▪', '†', '■', '’', '▀', '¨', '▄', '♫', '☆', 'é', '¯', '♦', '¤', '▲', 'è', '¸', '¾', 'Ã', '⋅', '‘', '∞',
'∙', ')', '↓', '、', '│', '(', '»', ',', '♪', '╩', '╚', '³', '・', '╦', '╣', '╔', '╗', '▬', '❤', 'ï', 'Ø', '¹', '≤', '‡', '√', ]
def remove_punctuation(x):
    x = str(x)
    for punct in puncts:
        if punct in x:
            x = x.replace(punct, '')
    return x

text = "It's a nice day[]"
print(remove_punctuation(text))
Its a nice day
Alternatively, what about using regular expressions (re package)?
In [ ]:
import re
def remove_punctuation_re(x):
    x = re.sub(r'[^\w\s]', '', x)
    return x
text = "It's a nice day[]"
print(remove_punctuation_re(text))
Its a nice day
OK. Then what about emoticons such as :) or :D or :( ? Some tasks may require you to keep emoticons, e.g. sentiment analysis on tweets.
In [ ]:
#you can find the solution from the TweetTokenizer https://www.nltk.org/_modules/nltk/tokenize/casual.html#TweetTokenizer (search "EMOTICONS" in the page)
EMOTICONS = r"""
(?:
[<>]?
[:;=8] # eyes
[\-o\*\']? # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
|
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
[\-o\*\']? # optional nose
[:;=8] # eyes
[<>]?
|
<3 # heart
)"""
Have a look at some contractions of words! Contractions include punctuation - how would you handle them?
In [ ]:
# These are just common English contractions. There are many edge cases, e.g. "University's working on it."
contraction_dict = {"ain't": "is not", "aren't": "are not","can't": "cannot", "'cause": "because", "could've": "could have",
"couldn't": "could not", "didn't": "did not", "doesn't": "does not", "don't": "do not", "hadn't": "had not",
"hasn't": "has not", "haven't": "have not", "he'd": "he would","he'll": "he will", "he's": "he is", "how'd": "how did",
"how'd'y": "how do you", "how'll": "how will", "how's": "how is", "I'd": "I would", "I'd've": "I would have",
"I'll": "I will", "I'll've": "I will have","I'm": "I am", "I've": "I have", "i'd": "i would", "i'd've": "i would have",
"i'll": "i will", "i'll've": "i will have","i'm": "i am", "i've": "i have", "isn't": "is not", "it'd": "it would",
"it'd've": "it would have", "it'll": "it will", "it'll've": "it will have","it's": "it is", "let's": "let us",
"ma'am": "madam", "mayn't": "may not", "might've": "might have","mightn't": "might not","mightn't've": "might not have",
"must've": "must have", "mustn't": "must not", "mustn't've": "must not have", "needn't": "need not", "needn't've": "need not have",
"o'clock": "of the clock", "oughtn't": "ought not", "oughtn't've": "ought not have", "shan't": "shall not", "sha'n't": "shall not",
"shan't've": "shall not have", "she'd": "she would", "she'd've": "she would have", "she'll": "she will", "she'll've": "she will have",
"she's": "she is", "should've": "should have", "shouldn't": "should not", "shouldn't've": "should not have", "so've": "so have",
"so's": "so as", "this's": "this is","that'd": "that would", "that'd've": "that would have", "that's": "that is", "there'd": "there would",
"there'd've": "there would have", "there's": "there is", "here's": "here is","they'd": "they would", "they'd've": "they would have",
"they'll": "they will", "they'll've": "they will have", "they're": "they are", "they've": "they have", "to've": "to have", "wasn't": "was not",
"we'd": "we would", "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're": "we are", "we've": "we have",
"weren't": "were not", "what'll": "what will", "what'll've": "what will have", "what're": "what are", "what's": "what is", "what've": "what have",
"when's": "when is", "when've": "when have", "where'd": "where did", "where's": "where is", "where've": "where have", "who'll": "who will",
"who'll've": "who will have", "who's": "who is", "who've": "who have", "why's": "why is", "why've": "why have", "will've": "will have",
"won't": "will not", "won't've": "will not have", "would've": "would have", "wouldn't": "would not", "wouldn't've": "would not have",
"y'all": "you all", "y'all'd": "you all would","y'all'd've": "you all would have","y'all're": "you all are","y'all've": "you all have",
"you'd": "you would", "you'd've": "you would have", "you'll": "you will", "you'll've": "you will have", "you're": "you are", "you've": "you have"}
Stopwords removal
Stopwords are the most common words in any natural language. For the purpose of analyzing text data and building NLP models, these stopwords might not add much value to the meaning of the document. Generally, the most common words used in a text are “the”, “is”, “in”, “for”, “where”, “when”, “to”, “at” etc.
In [ ]:
# You must be familiar with it already since we've tried this in Lab 1
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
from nltk.tokenize import word_tokenize
my_sent = "Natural Language Processing is fun but challenging."
tokens = word_tokenize(my_sent)
stop_words = sw.words()
filtered_sentence = [w for w in tokens if not w in stop_words]
print(filtered_sentence)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
['Natural', 'Language', 'Processing', 'fun', 'challenging', '.']
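A small variant worth knowing about (assuming your text is English): sw.words() with no argument concatenates the stopword lists for every language NLTK ships, which is slower and may remove words you want to keep, so you can pass the language explicitly.
In [ ]:
# Same filtering as above, but restricted to the English stopword list
stop_words_en = set(sw.words('english'))
filtered_sentence_en = [w for w in tokens if w.lower() not in stop_words_en]
print(filtered_sentence_en)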
Case-folding
How would you handle case? A common strategy is case-folding: reducing all letters to lower case.
In [ ]:
text = "Hello there!"
#Returns the result of converting all characters in text to lowercase.
print(text.lower())
#do we need to reduce all letters to lower case?
text2 = "I love University of Sydney :D"
print(text2.lower())
hello there!
i love university of sydney :d
Stemming
Stemming is a process of removing and replacing word suffixes to arrive at a common root form of the word.
• Try various types of NLTK stemmer in demo
• A comparative study of stemming algorithm: Paper Link
In [ ]:
#let's try to test with porter algorithm
from nltk.stem.porter import *
stemmer = PorterStemmer()
plurals = ['caresses', 'flies', 'dies', 'mules', 'denied',
'died', 'agreed', 'owned', 'humbled', 'sized',
'meeting', 'stating', 'siezing', 'itemization',
'sensational', 'traditional', 'reference', 'colonizer',
'plotted']
singles = [stemmer.stem(plural) for plural in plurals]
print(singles)
['caress', 'fli', 'die', 'mule', 'deni', 'die', 'agre', 'own', 'humbl', 'size', 'meet', 'state', 'siez', 'item', 'sensat', 'tradit', 'refer', 'colon', 'plot']
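To see how much the choice of algorithm matters, the short sketch below (our own comparison, using a subset of the words above) runs a few of NLTK's stemmers side by side.
In [ ]:
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
# Each stemmer uses different suffix-stripping rules, so the outputs differ
words = ['flies', 'denied', 'itemization', 'sensational', 'reference', 'colonizer']
stemmers = {'porter': PorterStemmer(), 'lancaster': LancasterStemmer(), 'snowball': SnowballStemmer('english')}
for name, st in stemmers.items():
    print(name, [st.stem(w) for w in words])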
Lemmatisation
Lemmatisation is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.
In [ ]:
#by NLTK Wordnet
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run",'v'))
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Unzipping corpora/wordnet.zip.
cat
cactus
goose
rock
python
good
best
run
run
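Notice that without a pos argument the lemmatizer treats every word as a noun. One way to supply the tag automatically (a sketch, not the only way; the penn_to_wordnet helper and the averaged_perceptron_tagger download are our additions) is to run nltk.pos_tag first and map the Penn Treebank tags onto WordNet's categories.
In [ ]:
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')
def penn_to_wordnet(tag):
    # Map Penn Treebank tags (from pos_tag) onto the four WordNet POS categories
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN
lemmatizer = WordNetLemmatizer()
tagged = pos_tag(word_tokenize("The striped bats are hanging on their feet"))
print([lemmatizer.lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged])
# e.g. ['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot']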
Tokenisation
Given a character sequence and a defined document unit (word, sentence etc.), tokenisation is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
Try various types of NLTK Tokenizer in demo.
NLTK Tokeniser API Doc
TweetTokenizer: Twitter-aware tokeniser
In [ ]:
from nltk.tokenize import TweetTokenizer
tknzr = TweetTokenizer()
s0 = "I am so happy :) ;)"
print(tknzr.tokenize(s0))
s0 = "I am so sad :("
print(tknzr.tokenize(s0))
['I', 'am', 'so', 'happy', ':)', ';)']
['I', 'am', 'so', 'sad', ':(']
TreebankWordTokenizer
The Treebank tokenizer uses regular expressions to tokenize text as in Penn Treebank.
In [ ]:
from nltk.tokenize import TreebankWordTokenizer
tknzr = TreebankWordTokenizer()
s0 = "I am so happy :) ;)"
print(tknzr.tokenize(s0))
s0 = "I am so sad :("
print(tknzr.tokenize(s0))
['I', 'am', 'so', 'happy', ':', ')', ';', ')']
['I', 'am', 'so', 'sad', ':', '(']
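Tokenisation is not only word-level. If the document unit is a sentence, NLTK's sent_tokenize can be applied first; a small sketch (the example text is made up):
In [ ]:
from nltk.tokenize import sent_tokenize, word_tokenize
text = "Macquarie Uni suspends teaching for 10 days. Learning will move online!"
sentences = sent_tokenize(text)
print(sentences)
print([word_tokenize(s) for s in sentences])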
Word Cloud
• Word Cloud
• Wikipedia Python
In [ ]:
!pip install wikipedia
Collecting wikipedia
Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.7/dist-packages (from wikipedia) (4.6.3)
Requirement already satisfied: requests<3.0.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from wikipedia) (2.23.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.0.0->wikipedia) (1.24.3)
Building wheels for collected packages: wikipedia
Building wheel for wikipedia (setup.py) … done
Created wheel for wikipedia: filename=wikipedia-1.4.0-cp37-none-any.whl size=11686 sha256=5f4a9b115f1e9b3cb35d1cc16dcb9369699cbf061156531aa26081c73a11592c
Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
In [ ]:
from wordcloud import WordCloud
import wikipedia
# Getting wikipedia contents of “COVID-19_pandemic”
text = wikipedia.page(“COVID-19_pandemic”).content
# Generate a word cloud image
wordcloud = WordCloud().generate(text)
# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation=’bilinear’)
plt.axis(“off”)
plt.show()

Try more word cloud examples: Link
Saving and Loading Models
Saving model
In [ ]:
# Let's train a model first
import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, input):
        output = self.linear(input)
        return output

no_of_epochs = 500
display_interval = 20
learning_rate = 0.01

# training data
x_training = np.asarray([[1],[2],[5],[8],[9],[12],[14],[16],[18],[20]])
y_training = np.asarray([100,200,501,780,901,1201,1399,1598,1800,2000])
x_data_torch = torch.from_numpy(x_training).float()
y_data_torch = torch.from_numpy(y_training).float()

model = TheModelClass()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(no_of_epochs):
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward + optimize
    outputs = model(x_data_torch)
    loss = torch.sum(torch.pow(outputs - y_data_torch.view(-1,1), 2)) / (2 * x_training.shape[0])
    loss.backward()
    optimizer.step()
    if epoch % display_interval == display_interval - 1:
        print('Epoch: %d, loss: %.3f' % (epoch + 1, loss.item()))
Epoch: 20, loss: 29.082
Epoch: 40, loss: 28.009
Epoch: 60, loss: 27.043
Epoch: 80, loss: 26.173
Epoch: 100, loss: 25.390
Epoch: 120, loss: 24.684
Epoch: 140, loss: 24.048
Epoch: 160, loss: 23.476
Epoch: 180, loss: 22.960
Epoch: 200, loss: 22.496
Epoch: 220, loss: 22.077
Epoch: 240, loss: 21.700
Epoch: 260, loss: 21.361
Epoch: 280, loss: 21.056
Epoch: 300, loss: 20.780
Epoch: 320, loss: 20.532
Epoch: 340, loss: 20.309
Epoch: 360, loss: 20.108
Epoch: 380, loss: 19.927
Epoch: 400, loss: 19.764
Epoch: 420, loss: 19.617
Epoch: 440, loss: 19.485
Epoch: 460, loss: 19.366
Epoch: 480, loss: 19.258
Epoch: 500, loss: 19.162
In [ ]:
# Now we save the trained model to the file named 'filename.pt'
torch.save(model, 'filename.pt')
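torch.save(model, ...) above pickles the whole model object. An alternative you will often see (a sketch, not required for this lab; the file name filename_state.pt is just an example) is to save only the parameters via the state_dict and rebuild the architecture before loading:
In [ ]:
# Save only the learned parameters (state_dict) instead of the whole pickled object
torch.save(model.state_dict(), 'filename_state.pt')
# To load, re-create the architecture first, then copy the parameters in
model_from_state = TheModelClass()
model_from_state.load_state_dict(torch.load('filename_state.pt'))
model_from_state.eval()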
Loading model
In [ ]:
# Please note that you have to run the code defining TheModelClass in the section above before you can load the weights from the saved model file
the_saved_model = torch.load('filename.pt')
the_saved_model.eval()
Out[ ]:
TheModelClass(
(linear): Linear(in_features=1, out_features=1, bias=True)
)
In [ ]:
prediction = the_saved_model(x_data_torch).detach().numpy()
for i in range(len(y_training)):
    print('X: %d, Y_true: %d, Y_predict: %.3f' % (x_training[i][0], y_training[i], prediction[i][0]))
X: 1, Y_true: 100, Y_predict: 99.670
X: 2, Y_true: 200, Y_predict: 199.565
X: 5, Y_true: 501, Y_predict: 499.250
X: 8, Y_true: 780, Y_predict: 798.935
X: 9, Y_true: 901, Y_predict: 898.831
X: 12, Y_true: 1201, Y_predict: 1198.516
X: 14, Y_true: 1399, Y_predict: 1398.306
X: 16, Y_true: 1598, Y_predict: 1598.097
X: 18, Y_true: 1800, Y_predict: 1797.887
X: 20, Y_true: 2000, Y_predict: 1997.677
How to Save (Upload) the model to your Google Drive
There are various ways to upload files to Google Drive.
This tutorial will guide you through saving files to your Google Drive; a minimal Colab sketch for option 1 is given after the list below.
1. Mounting Google Drive locally
2. Create a new Drive file
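For option 1, a minimal sketch in Colab might look like the following (the target folder and file name are just examples):
In [ ]:
# Mount your Google Drive inside the Colab VM (you will be asked to authorise access)
from google.colab import drive
drive.mount('/content/drive')
# Once mounted, Drive behaves like a normal folder, so the model can be saved there
torch.save(model, '/content/drive/My Drive/filename.pt')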
Bi-LSTM with Hidden State Extraction
The following image represents a Bi-LSTM for an N-to-1 task. In an N-to-1 task, it is usually required to extract the last hidden states of the forward and backward LSTMs and combine (concatenate) them. (Please check the Lecture 5 recording!) A small shape-check sketch follows the diagram below.
Bi-LSTM: Bidirectional LSTM, which means the signal propagates backward as well as forward in time.

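As a quick shape check before the full model below, here is a tiny sketch (the sizes are toy values we picked) of pulling the last forward and backward hidden states out of a bidirectional nn.LSTM and concatenating them:
In [ ]:
import torch
import torch.nn as nn
# One-layer bidirectional LSTM over a toy batch
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)                         # (batch, seq_len, emb_dim)
output, (h_n, c_n) = lstm(x)                      # h_n: (num_layers * 2, batch, hidden)
last_hidden = torch.cat((h_n[0], h_n[1]), dim=1)  # (batch, 2 * hidden)
print(output.shape, h_n.shape, last_hidden.shape)
# torch.Size([4, 10, 32]) torch.Size([2, 4, 16]) torch.Size([4, 32])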
nn.Embedding
In lab4 E2, we provide the embeddings for each token in each sentence. These are constructed by the pre-trained word embedding model. For example, if the sequence length of the corpus is 8 (think about why we want a uniform sequence length for the whole dataset), the embedding for the sentence “i am crazy in love” should be $[W_{i}, W_{am}, W_{crazy},W_{in}, W_{love}, W_{[PAD]}, W_{[PAD]}, W_{[PAD]}]$(if you choose post-padding) or $[W_{[PAD]}, W_{[PAD]},W_{[PAD]}, W_{i}, W_{am}, W_{crazy},W_{in}, W_{love}]$(if you choose pre-padding).
Therefore, after getting the embedding of each sentence, you will have a tensor of shape (train_size, seq_length, emb_dimension). However, if these three values are large, you might run into an Out-Of-Memory (OOM) problem due to limited CPU/GPU memory.
One solution is using nn.Embedding as a lookup table to get the embedding for each token/word during the training process instead of generating them all beforehand. (You should have already seen it before in the lab4 E2 sample solution).
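Before the full example, a minimal sketch (toy sizes of our own choosing) of nn.Embedding acting purely as a lookup table from token indices to vectors:
In [ ]:
import torch
import torch.nn as nn
# A vocabulary of 10 tokens, each mapped to a 4-dimensional vector
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([[4, 6, 9, 0]])  # (batch=1, seq_len=4) of word indices
vectors = emb(token_ids)                  # shape (1, 4, 4): one vector per token id
print(vectors.shape)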
In [ ]:
# Toy Data for sentiment analysis
sentences = [[‘i’,’like’,’that’],
[‘i’,’love’,’it’],
[‘i’,’hate’,’that’],
[‘i’,’do’,’not’,’like’,’it’]]
labels = [“Positive”,”Positive”,”Negative”,”Negative”]
In [ ]:
# Set is a hashtable in python
word_set = set()
for sent in sentences:
    for word in sent:
        word_set.add(word)

# Sometimes you can use the same token to represent PAD and UNKNOWN if you just want to set them both to all zeros
word_set.add('[PAD]')
word_set.add('[UNKNOWN]')

word_list = list(word_set)
# Although in some Python versions converting a set to a list returns an ordered result,
# it is still highly recommended to sort this list to ensure the reproducibility of your code
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
['[PAD]', '[UNKNOWN]', 'do', 'hate', 'i', 'it', 'like', 'love', 'not', 'that']
{'[PAD]': 0, '[UNKNOWN]': 1, 'do': 2, 'hate': 3, 'i': 4, 'it': 5, 'like': 6, 'love': 7, 'not': 8, 'that': 9}
In [ ]:
# Convert the sentences to the word index
len_list = [len(s) for s in sentences]
seq_length = max(len_list)

def encode_and_add_padding(sentences, seq_length, word_index):
    sent_encoded = []
    for sent in sentences:
        temp_encoded = [word_index[word] for word in sent]
        if len(temp_encoded) < seq_length:
            temp_encoded += [word_index['[PAD]']] * (seq_length - len(temp_encoded))
        sent_encoded.append(temp_encoded)
    return sent_encoded

sent_encoded = encode_and_add_padding(sentences, seq_length, word_index)
print(sent_encoded)
[[4, 6, 9, 0, 0], [4, 7, 5, 0, 0], [4, 3, 9, 0, 0], [4, 2, 8, 6, 5]]
In [ ]:
# Download Pre-trained Embedding
import gensim.downloader as api
word_emb_model = api.load("glove-twitter-25")
[==================================================] 100.0% 104.8/104.8MB downloaded
In [ ]:
# Create the Embedding lookup table
import numpy as np
emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        emb_table.append(word_emb_model[word])
    else:
        emb_table.append([0]*emb_dim)
emb_table = np.array(emb_table)
# print(emb_table)
In [ ]:
# LabelEncoder can help us encode target labels with value between 0 and n_classes-1.
# Details can be found from: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
from sklearn.preprocessing import LabelEncoder
lEnc = LabelEncoder()
lEnc.fit(labels)
label_encoded= lEnc.transform(labels)
print(label_encoded)
[1 1 0 0]
In [ ]:
vocab_size = len(word_list)
unique_labels = np.unique(labels)
n_class = len(unique_labels)
n_hidden = 32
learning_rate = 0.01
total_epoch = 10
In [ ]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Bi_LSTM_Emb(nn.Module):
    def __init__(self):
        super(Bi_LSTM_Emb, self).__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Initialize the Embedding layer with the lookup table we created
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        # Optional: set requires_grad = False to make this lookup table untrainable
        self.emb.weight.requires_grad = False
        self.lstm = nn.LSTM(emb_dim, n_hidden, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(n_hidden*2, n_class)

    def forward(self, x):
        # Get the embedded tensor
        x = self.emb(x)
        # we will use the returned h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
        # details of the outputs from nn.LSTM can be found at: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
        lstm_out, (h_n, c_n) = self.lstm(x)
        # concat the last hidden states from the two directions
        hidden_out = torch.cat((h_n[0,:,:], h_n[1,:,:]), 1)
        z = self.linear(hidden_out)
        return z

# Move the model to GPU
model = Bi_LSTM_Emb().to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Preparing input
input_torch = torch.from_numpy(np.array(sent_encoded)).to(device)
targe_torch = torch.from_numpy(np.array(label_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    # Set the flag to training
    model.train()
    # forward + backward + optimize
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, targe_torch)
    loss.backward()
    optimizer.step()

    predicted = torch.argmax(outputs, -1)
    acc = accuracy_score(predicted.cpu().numpy(), targe_torch.cpu().numpy())
    print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))

print('Finished Training')
Epoch: 1, loss: 0.70972, train_acc: 0.50
Epoch: 2, loss: 0.66151, train_acc: 0.50
Epoch: 3, loss: 0.62344, train_acc: 0.50
Epoch: 4, loss: 0.57242, train_acc: 1.00
Epoch: 5, loss: 0.51628, train_acc: 0.75
Epoch: 6, loss: 0.45956, train_acc: 0.75
Epoch: 7, loss: 0.39498, train_acc: 0.75
Epoch: 8, loss: 0.33047, train_acc: 1.00
Epoch: 9, loss: 0.27055, train_acc: 1.00
Epoch: 10, loss: 0.19957, train_acc: 1.00
Finished Training
In [ ]:
# You can check whether model.emb.weight changed
# You can also try to comment self.emb.weight.requires_grad = False and then train the model and check again
# print(model.emb.weight)
Exercise
E1. Briefly describe the difference between Stemming and Lemmatisation.
Please write down your answer below in your own words with examples
Your answer:
E2. Preprocessing and Model Saving
In this exercise, you are to preprocess the train and test data, and apply different pre-trained embeddings.
Note: We won't mark your exercise based on the test set performance; we will only check whether the preprocessing part and embedding part are correct.
In [ ]:
import torch
#You can enable GPU here (cuda); or just CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Download Dataset
In [ ]:
# Code to download file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '1gNfBqguzBu8cHKMPc8C44GbvD443dNC5'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('twitter.csv')
import pandas as pd
df = pd.read_csv("twitter.csv")
df_pick = df.sample(400,random_state=24)
raw_text = df_pick["Text"].tolist()
raw_label = df_pick["Label"].tolist()
from sklearn.model_selection import train_test_split
text_train,text_test,label_train,label_test = train_test_split(raw_text,raw_label,test_size=0.25,random_state=42)
Preprocessing [Complete this section]
Case Folding
In [ ]:
text_train = [s.lower() for s in text_train]
text_test = [s.lower() for s in text_test]
Remove punctuations [Please complete this section]
In [ ]:
import re
def remove_punctuation_re(x):
# Please complete this
return x
text_train = [remove_punctuation_re(s) for s in text_train]
text_test = [remove_punctuation_re(s) for s in text_test]
Tokenization [Please complete this section]
In [ ]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
#Please complete this
text_train =
text_test =
Remove stopwords [Please complete this section]
In [ ]:
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
stop_words = sw.words()
text_train_ns=[]
for tokens in text_train:
filtered_sentence = [w for w in tokens if not w in stop_words]
text_train_ns.append(filtered_sentence)
text_test_ns=[]
for tokens in text_test:
#Please complete this
Lemmatisation [Please complete this section]
In [ ]:
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
text_train_le = []
for tokens in text_train_ns:
lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens ]
text_train_le.append(lemma_sentence)
text_test_le = []
for tokens in text_test_ns:
#Please complete this
Label Encoding [Please complete this section]
In [ ]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
unique_labels = np.unique(label_train)
lEnc = LabelEncoder()
# Please encode the labels (Do NOT add new lines of code in this section)
# Hint: Try to understand the difference between fit_transform and transform
label_train_encoded =
label_test_encoded =
n_class = len(unique_labels)
print(unique_labels)
print(lEnc.transform(unique_labels))
Embeddings [Complete this section]
Get Word List
In [ ]:
word_set = set()
for sent in text_train_le:
    for word in sent:
        word_set.add(word)

word_set.add('[PAD]')
word_set.add('[UNKNOWN]')

word_list = list(word_set)
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
padding and encoding [Please complete this section]
In [ ]:
# The sequence length is pre-defined, you can't change this value for this exercise
seq_length = 16
# Please Complete this function
# Hint: You should pay attention to: (1) if the sentence length > seq_length (2) if the word not in word_index dictionary
def encode_and_add_padding(sentences, seq_length, word_index):
sent_encoded = []
return sent_encoded
train_pad_encoded = encode_and_add_padding(text_train_le, seq_length, word_index )
test_pad_encoded = encode_and_add_padding(text_test_le, seq_length, word_index )
Download Embeddings [Please complete this section]
You can find the details from https://github.com/RaRe-Technologies/gensim-data
In [ ]:
import gensim.downloader as api
word_emb_model = api.load("xxx") # Download an embedding other than glove-twitter-25
Get embeddings
In [ ]:
# Get the Embedding lookup table
import numpy as np
emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        emb_table.append(word_emb_model[word])
    else:
        emb_table.append([0]*emb_dim)
emb_table = np.array(emb_table)
Model
In [ ]:
vocab_size = len(word_list)
n_hidden = 50
total_epoch = 100
learning_rate = 0.01
In [ ]:
import torch
# You can enable GPU here (cuda); or just CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        self.emb.weight.requires_grad = False
        self.lstm = nn.LSTM(emb_dim, n_hidden, num_layers=2, batch_first=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden, n_class)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x)
        x = self.linear(x[:,-1,:])
        return x

model = Model().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

input_torch = torch.from_numpy(np.array(train_pad_encoded)).to(device)
target_torch = torch.from_numpy(np.array(label_train_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    model.train()
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, target_torch)
    loss.backward()
    optimizer.step()

    if epoch % 10 == 9:
        predicted = torch.argmax(outputs, -1)
        acc = accuracy_score(predicted.cpu().numpy(), target_torch.cpu().numpy())
        print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))

print('Finished Training')
Save and Load the model [Complete this section]
Save the model [Complete this part]
In [ ]:
Load the model
In [ ]:
model2 = torch.load('lab5.pt')
model2.eval()
Testing
In [ ]:
input_torch = torch.from_numpy(np.array(test_pad_encoded)).to(device)
outputs = model2(input_torch)
predicted = torch.argmax(outputs, -1)
from sklearn.metrics import classification_report
print(classification_report(label_test_encoded,predicted.cpu().numpy()))
Sample Solution for E2
Download Dataset
In [ ]:
# Code to download file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '1gNfBqguzBu8cHKMPc8C44GbvD443dNC5'
downloaded = drive.CreateFile({'id':id})
downloaded.GetContentFile('twitter.csv')
import pandas as pd
df = pd.read_csv("twitter.csv")
df_pick = df.sample(400,random_state=24)
raw_text = df_pick["Text"].tolist()
raw_label = df_pick["Label"].tolist()
from sklearn.model_selection import train_test_split
text_train,text_test,label_train,label_test = train_test_split(raw_text,raw_label,test_size=0.25,random_state=42)
Preprocessing
Case-Folding
In [ ]:
text_train = [s.lower() for s in text_train]
text_test = [s.lower() for s in text_test]
Remove punctuations
In [ ]:
import re
def remove_punctuation_re(x):
    x = re.sub(r'[^\w\s]', '', x)
    return x
text_train = [remove_punctuation_re(s) for s in text_train]
text_test = [remove_punctuation_re(s) for s in text_test]
Tokenization
In [ ]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
text_train = [word_tokenize(s) for s in text_train]
text_test = [word_tokenize(s) for s in text_test]
[nltk_data] Downloading package punkt to /root/nltk_data…
[nltk_data] Package punkt is already up-to-date!
Remove stopwords
In [ ]:
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
stop_words = sw.words()

text_train_ns = []
for tokens in text_train:
    filtered_sentence = [w for w in tokens if not w in stop_words]
    text_train_ns.append(filtered_sentence)

text_test_ns = []
for tokens in text_test:
    filtered_sentence = [w for w in tokens if not w in stop_words]
    text_test_ns.append(filtered_sentence)
[nltk_data] Downloading package stopwords to /root/nltk_data…
[nltk_data] Package stopwords is already up-to-date!
Lemmatisation
In [ ]:
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

text_train_le = []
for tokens in text_train_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    text_train_le.append(lemma_sentence)

text_test_le = []
for tokens in text_test_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    text_test_le.append(lemma_sentence)
[nltk_data] Downloading package wordnet to /root/nltk_data…
[nltk_data] Package wordnet is already up-to-date!
Label Encoding
In [ ]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
unique_labels = np.unique(label_train)
lEnc = LabelEncoder()
# Please encode the labels (Do NOT add new lines of code in this section)
label_train_encoded = lEnc.fit_transform(label_train)
label_test_encoded = lEnc.transform(label_test)
n_class = len(unique_labels)
print(unique_labels)
print(lEnc.transform(unique_labels))
['none' 'racism' 'sexism']
[0 1 2]
Embeddings
Get Word list
In [ ]:
# Set is a hashtable in python
word_set = set()
for sent in text_train_le:
    for word in sent:
        word_set.add(word)

# Sometimes you can use the same token to represent PAD and UNKNOWN if you just want to set them both to all zeros
word_set.add('[PAD]')
word_set.add('[UNKOWN]')

word_list = list(word_set)
# Although in some Python versions converting a set to a list returns an ordered result,
# it is still highly recommended to sort this list to ensure the reproducibility of your code
word_list.sort()
print(word_list)

word_index = {}
ind = 0
for word in word_list:
    word_index[word] = ind
    ind += 1
print(word_index)
[‘0′, ’06jank’, ‘0xjared’, ‘1’, ’11’, ’12’, ’14’, ‘1400’, ’15’, ’17’, ‘1shadeofritch’, ‘2’, ‘2027279099’, ‘22000’, ‘2ndbestidiot’, ‘3’, ‘3outof10’, ‘4’, ’44’, ’47’, ‘4x’, ‘5’, ‘6’, ‘7’, ’80’, ‘800’, ’90’, ‘911’, ’98halima’, ’99’, ‘[PAD]’, ‘[UNKOWN]’, ‘__chris33__’, ‘_marisajane’, ‘abducted’, ‘abdul_a95’, ‘aberration’, ‘ability’, ‘ablahad’, ‘able’, ‘absolutely’, ‘abuse’, ‘ac360’, ‘accept’, ‘acceptable’, ‘accepted’, ‘accessorizing’, ‘account’, ‘achieve’, ‘across’, ‘actoractress’, ‘actually’, ‘adjective’, ‘admits’, ‘adult’, ‘afar’, ‘afraid’, ‘ago’, ‘agree’, ‘ahahahaha’, ‘air’, ‘airstrikes’, ‘aisle’, ‘ajwatamr’, ‘akheemv’, ‘aledthomas22’, ‘alihadi68’, ‘alihashem_tv’, ‘alive’, ‘all_hailcaesar’, ‘allegedly’, ‘allstatejackie’, ‘ally’, ‘along’, ‘already’, ‘alternet’, ‘amaze’, ‘amazing’, ‘amazingly’, ‘amberhasalamb’, ‘ameliagreenhall’, ‘american’, ‘amohedin’, ‘amp’, ‘amymek’, ‘anasmechch’, ‘andcamping’, ‘andre’, ‘angelemichelle’, ‘anitaingle’, ‘annoying’, ‘another’, ‘answer’, ‘anti’, ‘antiharassment’, ‘antizholim’, ‘anyone’, ‘anything’, ‘apartheid’, ‘arab_fury’, ‘arabic’, ‘arabthomness’, ‘archangel_dux’, ‘arena’, ‘argh’, ‘argonblue’, ‘argument’, ‘armedyoure’, ‘armenian’, ‘armpit’, ‘army’, ‘arquette’, ‘article’, ‘asad’, ‘asem_1994’, ‘ask’, ‘asked’, ‘askgoog’, ‘askhermore’, ‘asshole’, ‘astounding’, ‘athlete’, ‘attack’, ‘attacked’, ‘attacking’, ‘attracted’, ‘attractive’, ‘attributing’, ‘attrocities’, ‘auntysoapbox’, ‘australia’, ‘away’, ‘awful’, ‘awkward’, ‘awww’, ‘b’, ‘babbling’, ‘back’, ‘backing’, ‘backwards’, ‘bad’, ‘bag’, ‘baghdad’, ‘bahai144’, ‘bamboozled’, ‘banter’, ‘barackobama’, ‘basically’, ‘bastendorfgames’, ‘batchelorshow’, ‘bcz’, ‘beadsland’, ‘beat’, ‘beautiful’, ‘beautifula’, ‘beavis’, ‘beckles’, ‘begun’, ‘behaving’, ‘behead’, ‘behind’, ‘belief’, ‘believe’, ‘benkuchera’, ‘best’, ‘better’, ‘beyond’, ‘bgs’, ‘bhamdailynews’, ‘biebervalue’, ‘big’, ‘bigot’, ‘bigotry’, ‘bigtime’, ‘bilalighumman’, ‘bill’, ‘bimbo’, ‘bimbolines’, ‘bit’, ‘bitch’, ‘bixs’, ‘blabber’, ‘blackopal80’, ‘block’, ‘blocked’, ‘bloke’, ‘blonde’, ‘blondemoment’, ‘blow’, ‘blumenthal’, ‘body’, ‘book’, ‘booted’, ‘boring’, ‘bos’, ‘bottle’, ‘bought’, ’bout’, ‘bowl’, ‘boy’, ‘bq281473’, ‘breakfast’, ‘brekky’, ‘bringing’, ‘bristolben’, ‘british’, ‘broke’, ‘bronny25’, ‘bruciebabe’, ‘bruh’, ‘brushyblues’, ‘bsilverstrim77’, ‘budlightbro’, ‘built’, ‘bullied’, ‘burcucekmece’, ‘burn’, ‘burning’, ‘burqua’, ‘bus’, ‘business’, ‘buttercupashby’, ‘butthead’, ‘button’, ‘buying’, ‘c2e2’, ‘call’, ‘cambrian_man’, ‘campagnebds’, ‘canned’, ‘cant’, ‘captive’, ‘car’, ‘card’, ‘case’, ‘cashing’, ‘castrating’, ‘catwalk’, ’cause’, ‘caved’, ‘ccot’, ‘cdnkhadija’, ‘celebrating’, ‘celine’, ‘cemcfarland’, ‘century’, ‘challenge’, ‘chance’, ‘change’, ‘changed’, ‘changing’, ‘channel’, ‘channel7’, ‘chaos’, ‘character’, ‘cheeseplus’, ‘chef’, ‘cheney’, ‘chicken’, ‘child’, ‘chloeandkelly’, ‘choice’, ‘christ’, ‘christian’, ‘christiansyazidis’, ‘christmas’, ‘christophheer52’, ‘chriswarcraft’, ‘chuck’, ‘chuckle’, ‘chuckpfarrer’, ‘churner’, ‘cia’, ‘cityofmandurah’, ‘civilian’, ‘cjsajulga’, ‘claim’, ‘claiming’, ‘clarify’, ‘clearly’, ‘close’, ‘colin’, ‘colins’, ‘colonelkickhead’, ‘come’, ‘comedian’, ‘coming’, ‘comment’, ‘committed’, ‘compensation’, ‘competition’, ‘complains’, ‘completely’, ‘compromise’, ‘concerned’, ‘conclusion’, ‘conducting’, ‘configuration’, ‘confuse’, ‘conserv_miss’, ‘consider’, ‘conspiracy’, ‘constant’, ‘constantly’, ‘constructed’, ‘content’, ‘contradicted’, ‘contributing’, ‘cook’, ‘cooked’, ‘cooking’, ‘cordial’, ‘cornflakes’, ‘could’, ‘country’, 
‘couple’, ‘court’, ‘coworkers’, ‘coz’, ‘crabfest15’, ‘crap’, ‘crash’, ‘created’, ‘creates’, ‘creating’, ‘credibility’, ‘cretin’, ‘cringing’, ‘critiquing’, ‘culture’, ‘cunt’, ‘cup’, ‘curd’, ‘curious’, ‘customer’, ‘cut’, ‘cuz’, ‘cytheria’, ‘d20’, ‘daesh’, ‘damn’, ‘damnitscloudy’, ‘danhickey2199’, ‘danis’, ‘dankmtl’, ‘darchmare’, ‘darrenkopp’, ‘data’, ‘dave’, ‘davidjo52951945’, ‘davidsgallant’, ‘dc’, ‘dead’, ‘deal’, ‘death’, ‘declared’, ‘deconstruct’, ‘deconstructing’, ‘deduction’, ‘defend’, ‘definitely’, ‘delicate’, ‘demanded’, ‘dentist’, ‘describes’, ‘desertfox899’, ‘desire’, ‘desk’, ‘desperately’, ‘dessert’, ‘destined’, ‘destroys’, ‘deusexjuice’, ‘devops’, ‘dianh4’, ‘dick’, ‘dictatorship’, ‘didazahra’, ‘didnt’, ‘difference’, ‘different’, ‘direct’, ‘directhex’, ‘direction’, ‘directly’, ‘discerningmumin’, ‘discrimination’, ‘disgrace’, ‘disgusted’, ‘disgusting’, ‘dish’, ‘disheartened’, ‘disturbing’, ‘divisiveness’, ‘dkim’, ‘doammuslims’, ‘doctor’, ‘doesnt’, ‘dogging’, ‘doh’, ‘dolly’, ‘domestic’, ‘done’, ‘dont’, ‘double’, ‘draskos’, ‘drdisco_’, ‘dream’, ‘dreamer’, ‘dreaminpng’, ‘drive’, ‘drivemaneuveroperate’, ‘driver’, ‘driving’, ‘dropped’, ‘dry’, ‘dubhe80’, ‘duckiemcphee’, ‘dude’, ‘due’, ‘dumb’, ‘dumbest’, ‘dummy’, ‘dye’, ‘dying’, ‘ear’, ‘earth’, ‘easier’, ‘easy’, ‘eat’, ‘ebola’, ‘ebooks’, ‘economics’, ‘econtried’, ‘edgeofthesandbx’, ‘edible’, ‘efficient’, ‘effort’, ‘egypt’, ‘egyptian’, ‘either’, ‘eloisepeace’, ‘else’, ’email’, ’emilie’, ’empty’, ‘endless’, ‘enemy’, ‘engaging’, ‘enough’, ‘enslaved’, ‘entertaining’, ‘entree’, ‘ep’, ‘episode’, ‘equal’, ‘equalpay’, ‘espn’, ‘eternity’, ‘etsho127’, ‘evacuated’, ‘evans’, ‘even’, ‘eventually’, ‘ever’, ‘every’, ‘everyone’, ‘everything’, ‘evidence’, ‘evilsunbro’, ‘ew’, ‘except’, ‘excited’, ‘excuse’, ‘executed’, ‘expect’, ‘expedition’, ‘expertise’, ‘explain’, ‘explained’, ‘exposefalsehood’, ‘expression’, ‘exterminate’, ‘extreme’, ‘eye’, ‘eyesmkr’, ‘ezidi’, ‘ezidipress’, ‘ezidis’, ‘f3ew’, ‘face’, ‘faced’, ‘fact’, ‘faded’, ‘fail’, ‘fair’, ‘fan’, ‘fanatic’, ‘farbenstau’, ‘fascist’, ‘fat’, ‘feardept’, ‘feel’, ‘feeling’, ‘female’, ‘femfreefriday’, ‘feminism’, ‘feminist’, ‘fetch’, ‘fetish’, ‘fewer’, ‘fight’, ‘fighting’, ‘filthy’, ‘finalbroadcast’, ‘finally’, ‘find’, ‘fine’, ‘finger’, ‘first’, ‘flip’, ‘floss’, ‘flying’, ‘folk’, ‘followed’, ‘food’, ‘foodie_ben’, ‘foot’, ‘forbidden’, ‘force’, ‘forced’, ‘ford’, ‘form’, ‘formula’, ‘fought’, ‘foxnews’, ‘foxnewspolitics’, ‘free’, ‘freebsd’, ‘freebsdgirl’, ‘freebsdglri’, ‘freedom’, ‘freezer’, ‘fried’, ‘friend’, ‘fuck’, ‘fucking’, ‘fucktards’, ‘fun’, ‘funding’, ‘fyoudbag’, ‘gailsimone’, ‘gal’, ‘game’, ‘gamergate’, ‘gangraped’, ‘gap’, ‘garbage’, ‘garydlum’, ‘gaters’, ‘gatery’, ‘gator’, ‘gbabeuf’, ‘gbazov’, ‘gel’, ‘genius’, ‘genocide’, ‘get’, ‘getting’, ‘gg’, ‘ggautoblocker’, ‘ggreenwald’, ‘gilmore’, ‘girl’, ‘girlziplocked’, ‘give’, ‘given’, ‘giving’, ‘glad’, ‘glennf’, ‘glhf’, ‘glove’, ‘go’, ‘god’, ‘going’, ‘gon’, ‘gone’, ‘good’, ‘goodluck’, ‘goosenetworkusa’, ‘gosh’, ‘got’, ‘govt’, ‘grafana’, ‘grahamdavida’, ‘great’, ‘greater’, ‘greenlinerzjm’, ‘grin’, ‘gross’, ‘ground’, ‘group’, ‘grow’, ‘grown’, ‘guardian’, ‘gueensland’, ‘guess’, ‘guilt’, ‘gulf’, ‘gumboots’, ‘guy’, ‘hadith’, ‘haha’, ‘hair’, ‘halalflaws’, ‘hamas’, ‘hand’, ‘handle’, ‘happen’, ‘happened’, ‘happens’, ‘happy’, ‘happycampers’, ‘harassed’, ‘harassment’, ‘haroonstyles’, ‘harshly’, ‘hate’, ‘hated’, ‘hatefilled’, ‘hating’, ‘hatred’, ‘havent’, ‘hawaiinshirts’, ‘hayles_comet’, ‘hdmovieus’, ‘he’, ‘head’, ‘heart’, ‘hell’, ‘hello’, ‘help’, ‘helping’, ‘here’, 
‘hide’, ‘hilarious’, ‘himat’, ‘hit’, ‘hitler’, ‘hockey’, ‘holder’, ‘home’, ‘homophobe’, ‘honestly’, ‘hope’, ‘hoping’, ‘hostage’, ‘however’, ‘hows’, ‘howtogetawaywithmurder’, ‘ht’, ‘http’, ‘httpstco6kgw1lejfr’, ‘httpstcoum5svjgazu’, ‘httpstcoxufwsigxfk’, ‘httpt’, ‘httptco’, ‘httptco1pl9gqrdp7’, ‘httptco4u’, ‘httptco5vsf5jroi6’, ‘httptco8xldnwbvzx’, ‘httptcobwr6ap0ooo’, ‘httptcocaxxus108l’, ‘httptcocbcr9u4fc9’, ‘httptcodajgdn1wy3’, ‘httptcoddecobanzx’, ‘httptcofdylhlkdcv’, ‘httptcoganrh4k87a’, ‘httptcogbvojnmbcv’, ‘httptcoglncgkuukp’, ‘httptcoh8f7n04q5o’, ‘httptcoltoxypkwww’, ‘httptcom4jcka5ir0’, ‘httptcom5j2tpksm5’, ‘httptcomdb4iu9whd’, ‘httptcomxuw3hz4tb’, ‘httptconleyqfnkyp’, ‘httptcontojwo4lnt’, ‘httptcoocsy7crghf’, ‘httptcopfs5zlkt07’, ‘httptcopnhzjrhhqr’, ‘httptcoq95ei17sua’, ‘httptcoqaa6bwi4pm’, ‘httptcoqmdsdtfvya’, ‘httptcoqv’, ‘httptcorbthvmh9jj’, ‘httptcospmvzcjj6o’, ‘httptcot65iytpvdk’, ‘httptcotfsodcowbx’, ‘httptcoumkitlb5h9’, ‘httptcouq19q6pnaq’, ‘httptcovanp6y7clr’, ‘httptcowhy3a8o33z’, ‘httptcoyspbfitztb’, ‘httptcozjbwagvnrg’, ‘hugged’, ‘human’, ‘humanistfury’, ‘humanity’, ‘hungrycampers’, ‘hw’, ‘hypatiadotca’, ‘hypocrite’, ‘ice’, ‘id’, ‘idea’, ‘ideaology’, ‘ideaor’, ‘ideology’, ‘idiot’, ‘idiotim’, ‘idontneedfeminism’, ‘ied’, ‘ignorance’, ‘ignorant’, ‘ignoring’, ‘ihatethiskid’, ‘ilivundrurbed’, ‘illegal’, ‘ilovebreakfast’, ‘iloveobama’, ‘imagine’, ‘imperialism’, ‘implies’, ‘important’, ‘info’, ‘information’, ‘input’, ‘instant’, ‘instead’, ‘insufferable’, ‘insulted’, ‘insulting’, ‘integration’, ‘intel’, ‘interact’, ‘interest’, ‘interesting’, ‘international’, ‘internet’, ‘intersection’, ‘intolerance’, ‘invented’, ‘iraq’, ‘iron’, ‘isi’, ‘isisutterly’, ‘islam’, ‘islamdefense’, ‘islamic’, ‘islamist’, ‘isnt’, ‘israel’, ‘israeliregime’, ‘issue’, ‘itll’, ‘itsbariecool’, ‘itsfact’, ‘ive’, ‘ivyexec’, ‘izrinhariri’, ‘jac’, ‘jealous’, ‘jeffreygoldberg’, ‘jennykutner’, ‘jeremiahfelt’, ‘jew’, ‘jhamby’, ‘jihadi_11’, ‘jihadis’, ‘jimcramer’, ‘jncatron’, ‘job’, ‘johncantile’, ‘johnnygjokaj’, ‘johnnyrejection’, ‘joke’, ‘journalist’, ‘judge’, ‘juliet777777’, ‘justdavidvideos’, ‘justhonest’, ‘justkelly_ok’, ‘kaitlynburnell’, ‘kamaluf’, ‘kardashian’, ‘kat’, ‘katampandre’, ‘katandandre’, ‘katie’, ‘katieandnikki’, ‘keep’, ‘keyboard’, ‘keynote’, ‘kid’, ‘kidding’, ‘kill’, ‘killa’, ‘killed’, ‘killerblondes’, ‘killing’, ‘kind’, ‘kirkuk’, ‘kmactane’, ‘knew’, ‘know’, ‘knowingly’, ‘kobane’, ‘kuffir’, ‘kurd’, ‘kurdish’, ‘lactualaloupe’, ‘lad’, ‘lady’, ‘lajouetreine’, ‘large’, ‘last’, ‘latest’, ‘laugh’, ‘launcher’, ‘law’, ‘lazy’, ‘lb’, ‘ldstarr18’, ‘le’, ‘lead’, ‘leaning’, ‘led’, ‘left’, ‘legitimately’, ‘lemon’, ‘lesson’, ‘letting’, ‘level’, ‘liar’, ‘liberate’, ‘libya’, ‘license’, ‘licking’, ‘lie’, ‘lied’, ‘life’, ‘light’, ‘like’, ‘likely’, ‘lilbeastunleash’, ‘lime’, ‘line’, ‘link’, ‘lipstick’, ‘lisamromano’, ‘lissasauras’, ‘listen’, ‘lithobolos’, ‘little’, ‘live’, ‘liver’, ‘lol’, ‘long’, ‘loo’, ‘look’, ‘looked’, ‘looking’, ‘lose’, ‘loser’, ‘lost’, ‘lot’, ‘love’, ‘low’, ‘lt3’, ‘lucaswj’, ‘luck’, ‘lunatic’, ‘lynnemcgranger’, ‘m_m_myers’, ‘maajidnawaz’, ‘mad’, ‘madasahatter_17’, ‘maddr11’, ‘made’, ‘magazine’, ‘magnus919’, ‘maja_stina’, ‘major’, ‘majority’, ‘make’, ‘making’, ‘mami_mermelada’, ‘man’, ‘manbabies’, ‘manu’, ‘manure’, ‘many’, ‘map’, ‘marc_leibowitz’, ‘market’, ‘markimbriaco’, ‘maroon’, ‘marriage’, ‘married’, ‘masontillidie’, ‘math’, ‘matter’, ‘mattstratton’, ‘mattybboi83’, ‘maxblumenthal’, ‘maxcaras’, ‘may’, ‘maybe’, ‘mccheesy904’, ‘mean’, ‘meaning’, ‘meatball’, ‘meatgirls’, ‘mechasauce’, ‘medium’, 
‘meh’, ‘mehdirhasan’, ‘mellym09’, ‘melting’, ‘mention’, ‘menu’, ‘messed’, ‘microbrain’, ‘middle’, ‘mikeage’, ‘military’, ‘militia’, ‘minasmith64’, ‘minister’, ‘minority’, ‘misfitinchains’, ‘miskelayla’, ‘miss’, ‘missed’, ‘missing’, ‘mistertodd’, ‘mkr’, ‘mkr2015’, ‘mkrkat’, ‘mmmm’, ‘model’, ‘moderate’, ‘modern’, ‘mohammed’, ‘monday’, ‘month’, ‘moron’, ‘mosul’, ‘mouth’, ‘moving’, ‘much’, ‘mugnezee’, ‘multiple’, ‘murde’, ‘murder’, ‘murdered’, ‘murtaza’, ‘muslim’, ‘muslimtwo’, ‘mutilated’, ‘myersnfl’, ‘mykitchenrules’, ‘mystrongstate’, ‘nader_haq’, ‘naga’, ‘nainfidels’, ‘naminglisting’, ‘narîn’, ‘nasty’, ‘naturally’, ‘nazi’, ‘near’, ‘necessarily’, ‘need’, ‘needarethinkinformat’, ‘negated’, ‘negotiate’, ‘neilasaurus’, ‘never’, ‘new’, ‘new_babylonia’, ‘newscoverup’, ‘next’, ‘nice’, ‘nigelbigmeech’, ‘night’, ‘nikki’, ‘nobody’, ‘noise’, ‘nomcookiesnom’, ‘none’, ‘nooo’, ‘nope’, ‘notchrissmith’, ‘note’, ‘nothing’, ‘notsexist’, ‘novorossiyan’, ‘nscottg’, ‘number’, ‘number10gov’, ‘nytimes’, ‘obamacare’, ‘obamas’, ‘obsurfer84’, ‘obviously’, ‘occasion’, ‘offense’, ‘offensive’, ‘offering’, ‘oh’, ‘oil’, ‘oktar’, ‘old’, ‘oldgfatherclock’, ‘one’, ‘open’, ‘opener’, ‘opinion’, ‘opponent’, ‘opposed’, ‘optional’, ‘oreilly’, ‘org’, ‘origin’, ‘others’, ‘outside’, ‘overweight’, ‘owais00’, ‘p’, ‘p8952_’, ‘page’, ‘painful’, ‘pakistan’, ‘paknsave’, ‘palestine’, ‘pancake’, ‘paraketa’, ‘pardusxy’, ‘parent’, ‘paris’, ‘participate’, ‘passport’, ‘past’, ‘pastor’, ‘patrickosgood’, ‘pawarnhoff’, ‘pay’, ‘paying’, ‘pc’, ‘peace’, ‘peacenothate_’, ‘pedophile’, ‘pedophilia’, ‘peerworker’, ‘penalty’, ‘people’, ‘peopleschoice’, ‘perfect’, ‘period’, ‘perk’, ‘perl’, ‘personality’, ‘pervious’, ‘peymaneh123’, ‘phxken’, ‘pie’, ‘piece’, ‘pile’, ‘pilgars’, ‘pilot’, ‘pissing’, ‘pjnet’, ‘plane’, ‘planning’, ‘playing’, ‘playstations’, ‘please’, ‘pleasing’, ‘pnibbler’, ‘point’, ‘pole’, ‘police’, ‘political’, ‘politicalant’, ‘politics_pr’, ‘poor’, ‘poorly’, ‘population’, ‘portland’, ‘possible’, ‘possibly’, ‘post’, ‘posting’, ‘power’, ‘present’, ‘pressure’, ‘pretend’, ‘pretty’, ‘previous’, ‘price’, ‘prime’, ‘prisonersofwar’, ‘pro’, ‘probably’, ‘problem’, ‘problematic’, ‘producer’, ‘production’, ‘profile’, ‘project’, ‘promise’, ‘promo’, ‘promogirls’, ‘promoted’, ‘proof’, ‘propaganda’, ‘prophet’, ‘prospect’, ‘protecting’, ‘proudpatriot101’, ‘prove’, ‘provide’, ‘provision’, ‘psog’, ‘psogeco’, ‘psychbarakat’, ‘public’, ‘punch’, ‘purse’, ‘put’, ‘question’, ‘questionsformen’, ‘quietly’, ‘quit’, ‘quite’, ‘quran’, ‘r’, ‘race’, ‘racist’, ‘raised’, ‘random’, ‘randomhero30’, ‘raniakhalek’, ‘ransom’, ‘rape’, ‘raped’, ‘rapper’, ‘rapperguydmv’, ‘raqqa’, ‘raqqa_sl’, ‘rate’, ‘rather’, ‘ratio’, ‘ravenhuwolf’, ‘raw’, ‘rayyoosheh’, ‘react’, ‘read’, ‘readable’, ‘real’, ‘really’, ‘realryansipple’, ‘realtalk’, ‘reason’, ‘reasonably’, ‘rebel’, ‘recall’, ‘reckless’, ‘recognize’, ‘recommends’, ‘record’, ‘recruit’, ‘recuperate’, ‘redux’, ‘reevaluate’, ‘reference’, ‘referring’, ‘refine’, ‘regarding’, ‘regulation’, ‘rejected’, ‘relationship’, ‘release’, ‘religion’, ‘religious’, ‘relisha’, ‘reload’, ‘remember’, ‘reminded’, ‘rennie93’, ‘repeatedly’, ‘repetition’, ‘replacement’, ‘report’, ‘reputation’, ‘request’, ‘resorting’, ‘respond’, ‘restaurant’, ‘retreat’, ‘revolting’, ‘reza_rahman’, ‘rigged’, ‘right’, ‘rinehart33’, ‘rip’, ‘rjennromao’, ‘rkhayer’, ‘rkinglive2dance’, ‘rob’, ‘robbed’, ‘robert’, ‘robinriedstra’, ‘roll’, ‘roof’, ‘room’, ‘rooshv’, ‘rose’, ‘rotherham’, ‘rougek68′, ’round’, ‘routinely’, ‘rt’, ‘rts’, ‘rudawenglish’, ‘rudd’, ‘rude’, ‘rudoren’, ‘ruin’, ‘run’, 
‘running’, ‘russian’, ‘said’, ‘saifullah666’, ‘sajid_fairooz’, ‘sake’, ‘salmon’, ‘salon’, ‘saltnburnem’, ‘salty’, ‘samkitsengupta’, ‘santa’, ‘sarah_jane666’, ‘sas’, ‘satire’, ‘saudi’, ‘sausage’, ‘save’, ‘saw’, ‘say’, ‘scared’, ‘schmeezi’, ‘school’, ‘score’, ‘scratch’, ‘screencaps’, ‘script’, ‘scripted’, ‘scroll’, ‘scum’, ‘season’, ‘see’, ‘seen’, ‘segment’, ‘self’, ‘selfies’, ‘selling’, ‘sellout’, ‘semite’, ‘semzyxx’, ‘sensitive’, ‘sent’, ‘serious’, ‘seriously’, ‘serlasco’, ‘serve’, ‘served’, ‘service’, ‘serving’, ‘set’, ‘setup’, ‘sevilzadeh’, ‘sex’, ‘sexhonest’, ‘sexism’, ‘sexist’, ‘sexually’, ‘shami_is_back’, ‘shaz’, ‘shell’, ‘shermertron’, ‘sherri’, ‘shia’, ‘shingal’, ‘shirt’, ‘shit’, ‘shoe0nhead’, ‘short’, ‘shovel’, ‘show’, ‘shower’, ‘shred’, ‘shut’, ‘sick’, ‘side’, ‘sighhhh’, ‘simpson’, ‘since’, ‘singer’, ‘sinjar’, ‘sirgoldenrod’, ‘six’, ‘skank’, ‘slagkick’, ‘slap’, ‘slave’, ‘slaved’, ‘sleep’, ‘sleeping’, ‘slide’, ‘slightly’, ‘sloshedtrain2’, ‘slow’, ‘smack’, ‘smackem’, ‘small’, ‘smarter’, ‘smash’, ‘sold’, ‘soldier’, ‘someone’, ‘something’, ‘sometimes’, ‘somewhat’, ‘soon’, ‘sorbent’, ‘sorbet’, ‘sorry’, ‘sorrynotsorry’, ‘sound’, ‘source’, ‘space’, ‘spacekatgal’, ‘spacequeentbh’, ‘spam’, ‘spatchcock’, ‘speak’, ‘speaking’, ‘speech’, ‘spiritual’, ‘spoiled’, ‘sport’, ‘sports2inflatio’, ‘sputnik’, ‘srhbutts’, ‘stalin’, ‘stand’, ‘standard’, ‘standing’, ‘starius’, ‘started’, ‘starting’, ‘state’, ‘statistic’, ‘stats’, ‘stay’, ‘stayed’, ‘staying’, ‘step’, ‘steve’, ‘stiff’, ‘still’, ‘stood’, ‘stop’, ‘stopping’, ‘stopwadhwa2015’, ‘story’, ‘strategically’, ‘streaming’, ‘stretch’, ‘strike’, ‘strong’, ‘struggle’, ‘student’, ‘stuff’, ‘stupid’, ‘stylist’, ‘subject’, ‘subtle’, ‘success’, ‘suck’, ‘sudixitca’, ‘suicide’, ‘sumersloan’, ‘super’, ‘superior’, ‘support’, ‘supported’, ‘sure’, ‘surgery’, ‘swallow’, ‘swiftonsecurity’, ‘switching’, ‘syazlicious’, ‘syria’, ‘systemic’, ‘tacky’, ‘taken’, ‘taking’, ‘tal’, ‘taliban’, ‘talk’, ‘talladega’, ‘taqiyya’, ‘tarah’, ‘tart’, ‘tasteless’, ‘tatibresolin’, ‘tbh’, ‘tbielawa’, ‘tcot’, ‘teach’, ‘teaching’, ‘team’, ‘tell’, ‘telling’, ‘tempting’, ‘terrible’, ‘terror’, ‘terrorism’, ‘terrorist’, ‘testicle’, ‘texasarlington’, ‘thanks’, ‘thatll’, ‘thats’, ‘theckman’, ‘thedoubleclicks’, ‘thegeek_chick’, ‘thegoodguysau’, ‘thelindsayellis’, ‘thelmasleaze’, ‘themirai’, ‘themselvespffft’, ‘themuslimguy’, ‘thequinnspiracy’, ‘there’, ‘theyre’, ‘thing’, ‘think’, ‘thinking’, ‘third’, ‘thought’, ‘threw’, ‘throw’, ‘tied’, ‘tim’, ‘time’, ‘timespan’, ‘tiny’, ‘tip’, ‘tnr’, ‘tobyrobertbull’, ‘today’, ‘todayreal’, ‘told’, ‘tolerate’, ‘tomato’, ‘tonight’, ‘toodles’, ‘tool’, ‘top’, ‘total’, ‘train’, ‘transic_nyc’, ‘translator’, ‘treating’, ‘trend’, ‘tried’, ‘tripple’, ‘trolley’, ‘troop’, ‘trophy’, ‘truaemusic’, ‘truly’, ‘try’, ‘trying’, ‘turf’, ‘turk’, ‘tv’, ‘tw’, ‘tweet’, ‘twist’, ‘twista202’, ‘twitter’, ‘two’, ‘typed’, ‘typically’, ‘typo’, ‘u’, ‘ugly’, ‘ukraine’, ‘ukrainian’, ‘ultrafundamentalist’, ‘unacceptable’, ‘unapologetic’, ‘unashamed’, ‘uncalled’, ‘understand’, ‘understands’, ‘unfair’, ‘uninvolved’, ‘university’, ‘update’, ‘uplay’, ‘upon’, ‘use’, ‘user’, ‘username’, ‘usually’, ‘valenti’, ‘value’, ‘vandaliser’, ‘vc’, ‘vcs’, ‘venereveritas13’, ‘venomous9’, ‘versa’, ‘verse’, ‘vex0rian’, ‘via’, ‘vice’, ‘victim’, ‘victorymonk’, ‘video’, ‘videobeautiful’, ‘violence’, ‘voice’, ‘vonta624’, ‘vote’, ‘voted’, ‘w’, ‘wadhwa’, ‘wait’, ‘waiting’, ‘wakeuplibsgtjoenbc’, ‘walk’, ‘wan’, ‘wanted’, ‘warriorsialkot’, ‘washed’, ‘washingtonpost’, ‘wasnt’, ‘watan71969’, ‘watch’, ‘watched’, ‘watching’, 
‘way’, ‘week’, ‘weekly’, ‘well’, ‘went’, ‘werent’, ‘west’, ‘western’, ‘wetsprocket’, ‘wheat’, ‘wheel’, ‘whereisyourdignity’, ‘whether’, ‘whiny’, ‘white’, ‘whiteblack’, ‘whitening’, ‘whole’, ‘wi’, ‘wife’, ‘win’, ‘wing’, ‘wish’, ‘witch_sniffer’, ‘without’, ‘witty’, ‘wizardryofozil’, ‘wks’, ‘wnba’, ‘wnyc’, ‘wocracial’, ‘woman’, ‘womenagainstfeminism’, ‘womeninterpret’, ‘word’, ‘work’, ‘worker’, ‘working’, ‘worse’, ‘worst’, ‘would’, ‘wouldnt’, ‘wouldve’, ‘wow’, ‘wrecking’, ‘write’, ‘writer’, ‘writing’, ‘wrong’, ‘xmjee’, ‘yall’, ‘yawn’, ‘yeah’, ‘year’, ‘yes’, ‘yesallwomen’, ‘yesyouresexist’, ‘yet’, ‘yield’, ‘youd’, ‘youll’, ‘young’, ‘youre’, ‘yousufpoosuf’, ‘youtube’, ‘ypg’, ‘yum’, ‘zaibatsunews’, ‘zene55’, ‘zero’, ‘zython86’]
{‘0′: 0, ’06jank’: 1, ‘0xjared’: 2, ‘1’: 3, ’11’: 4, ’12’: 5, ’14’: 6, ‘1400’: 7, ’15’: 8, ’17’: 9, ‘1shadeofritch’: 10, ‘2’: 11, ‘2027279099’: 12, ‘22000’: 13, ‘2ndbestidiot’: 14, ‘3’: 15, ‘3outof10’: 16, ‘4’: 17, ’44’: 18, ’47’: 19, ‘4x’: 20, ‘5’: 21, ‘6’: 22, ‘7’: 23, ’80’: 24, ‘800’: 25, ’90’: 26, ‘911’: 27, ’98halima’: 28, ’99’: 29, ‘[PAD]’: 30, ‘[UNKOWN]’: 31, ‘__chris33__’: 32, ‘_marisajane’: 33, ‘abducted’: 34, ‘abdul_a95’: 35, ‘aberration’: 36, ‘ability’: 37, ‘ablahad’: 38, ‘able’: 39, ‘absolutely’: 40, ‘abuse’: 41, ‘ac360’: 42, ‘accept’: 43, ‘acceptable’: 44, ‘accepted’: 45, ‘accessorizing’: 46, ‘account’: 47, ‘achieve’: 48, ‘across’: 49, ‘actoractress’: 50, ‘actually’: 51, ‘adjective’: 52, ‘admits’: 53, ‘adult’: 54, ‘afar’: 55, ‘afraid’: 56, ‘ago’: 57, ‘agree’: 58, ‘ahahahaha’: 59, ‘air’: 60, ‘airstrikes’: 61, ‘aisle’: 62, ‘ajwatamr’: 63, ‘akheemv’: 64, ‘aledthomas22’: 65, ‘alihadi68’: 66, ‘alihashem_tv’: 67, ‘alive’: 68, ‘all_hailcaesar’: 69, ‘allegedly’: 70, ‘allstatejackie’: 71, ‘ally’: 72, ‘along’: 73, ‘already’: 74, ‘alternet’: 75, ‘amaze’: 76, ‘amazing’: 77, ‘amazingly’: 78, ‘amberhasalamb’: 79, ‘ameliagreenhall’: 80, ‘american’: 81, ‘amohedin’: 82, ‘amp’: 83, ‘amymek’: 84, ‘anasmechch’: 85, ‘andcamping’: 86, ‘andre’: 87, ‘angelemichelle’: 88, ‘anitaingle’: 89, ‘annoying’: 90, ‘another’: 91, ‘answer’: 92, ‘anti’: 93, ‘antiharassment’: 94, ‘antizholim’: 95, ‘anyone’: 96, ‘anything’: 97, ‘apartheid’: 98, ‘arab_fury’: 99, ‘arabic’: 100, ‘arabthomness’: 101, ‘archangel_dux’: 102, ‘arena’: 103, ‘argh’: 104, ‘argonblue’: 105, ‘argument’: 106, ‘armedyoure’: 107, ‘armenian’: 108, ‘armpit’: 109, ‘army’: 110, ‘arquette’: 111, ‘article’: 112, ‘asad’: 113, ‘asem_1994’: 114, ‘ask’: 115, ‘asked’: 116, ‘askgoog’: 117, ‘askhermore’: 118, ‘asshole’: 119, ‘astounding’: 120, ‘athlete’: 121, ‘attack’: 122, ‘attacked’: 123, ‘attacking’: 124, ‘attracted’: 125, ‘attractive’: 126, ‘attributing’: 127, ‘attrocities’: 128, ‘auntysoapbox’: 129, ‘australia’: 130, ‘away’: 131, ‘awful’: 132, ‘awkward’: 133, ‘awww’: 134, ‘b’: 135, ‘babbling’: 136, ‘back’: 137, ‘backing’: 138, ‘backwards’: 139, ‘bad’: 140, ‘bag’: 141, ‘baghdad’: 142, ‘bahai144’: 143, ‘bamboozled’: 144, ‘banter’: 145, ‘barackobama’: 146, ‘basically’: 147, ‘bastendorfgames’: 148, ‘batchelorshow’: 149, ‘bcz’: 150, ‘beadsland’: 151, ‘beat’: 152, ‘beautiful’: 153, ‘beautifula’: 154, ‘beavis’: 155, ‘beckles’: 156, ‘begun’: 157, ‘behaving’: 158, ‘behead’: 159, ‘behind’: 160, ‘belief’: 161, ‘believe’: 162, ‘benkuchera’: 163, ‘best’: 164, ‘better’: 165, ‘beyond’: 166, ‘bgs’: 167, ‘bhamdailynews’: 168, ‘biebervalue’: 169, ‘big’: 170, ‘bigot’: 171, ‘bigotry’: 172, ‘bigtime’: 173, ‘bilalighumman’: 174, ‘bill’: 175, ‘bimbo’: 176, ‘bimbolines’: 177, ‘bit’: 178, ‘bitch’: 179, ‘bixs’: 180, ‘blabber’: 181, ‘blackopal80’: 182, ‘block’: 183, ‘blocked’: 184, ‘bloke’: 185, ‘blonde’: 186, ‘blondemoment’: 187, ‘blow’: 188, ‘blumenthal’: 189, ‘body’: 190, ‘book’: 191, ‘booted’: 192, ‘boring’: 193, ‘bos’: 194, ‘bottle’: 195, ‘bought’: 196, ’bout’: 197, ‘bowl’: 198, ‘boy’: 199, ‘bq281473’: 200, ‘breakfast’: 201, ‘brekky’: 202, ‘bringing’: 203, ‘bristolben’: 204, ‘british’: 205, ‘broke’: 206, ‘bronny25’: 207, ‘bruciebabe’: 208, ‘bruh’: 209, ‘brushyblues’: 210, ‘bsilverstrim77’: 211, ‘budlightbro’: 212, ‘built’: 213, ‘bullied’: 214, ‘burcucekmece’: 215, ‘burn’: 216, ‘burning’: 217, ‘burqua’: 218, ‘bus’: 219, ‘business’: 220, ‘buttercupashby’: 221, ‘butthead’: 222, ‘button’: 223, ‘buying’: 224, ‘c2e2’: 225, ‘call’: 226, ‘cambrian_man’: 227, ‘campagnebds’: 228, 
‘canned’: 229, ‘cant’: 230, ‘captive’: 231, ‘car’: 232, ‘card’: 233, ‘case’: 234, ‘cashing’: 235, ‘castrating’: 236, ‘catwalk’: 237, ’cause’: 238, ‘caved’: 239, ‘ccot’: 240, ‘cdnkhadija’: 241, ‘celebrating’: 242, ‘celine’: 243, ‘cemcfarland’: 244, ‘century’: 245, ‘challenge’: 246, ‘chance’: 247, ‘change’: 248, ‘changed’: 249, ‘changing’: 250, ‘channel’: 251, ‘channel7’: 252, ‘chaos’: 253, ‘character’: 254, ‘cheeseplus’: 255, ‘chef’: 256, ‘cheney’: 257, ‘chicken’: 258, ‘child’: 259, ‘chloeandkelly’: 260, ‘choice’: 261, ‘christ’: 262, ‘christian’: 263, ‘christiansyazidis’: 264, ‘christmas’: 265, ‘christophheer52’: 266, ‘chriswarcraft’: 267, ‘chuck’: 268, ‘chuckle’: 269, ‘chuckpfarrer’: 270, ‘churner’: 271, ‘cia’: 272, ‘cityofmandurah’: 273, ‘civilian’: 274, ‘cjsajulga’: 275, ‘claim’: 276, ‘claiming’: 277, ‘clarify’: 278, ‘clearly’: 279, ‘close’: 280, ‘colin’: 281, ‘colins’: 282, ‘colonelkickhead’: 283, ‘come’: 284, ‘comedian’: 285, ‘coming’: 286, ‘comment’: 287, ‘committed’: 288, ‘compensation’: 289, ‘competition’: 290, ‘complains’: 291, ‘completely’: 292, ‘compromise’: 293, ‘concerned’: 294, ‘conclusion’: 295, ‘conducting’: 296, ‘configuration’: 297, ‘confuse’: 298, ‘conserv_miss’: 299, ‘consider’: 300, ‘conspiracy’: 301, ‘constant’: 302, ‘constantly’: 303, ‘constructed’: 304, ‘content’: 305, ‘contradicted’: 306, ‘contributing’: 307, ‘cook’: 308, ‘cooked’: 309, ‘cooking’: 310, ‘cordial’: 311, ‘cornflakes’: 312, ‘could’: 313, ‘country’: 314, ‘couple’: 315, ‘court’: 316, ‘coworkers’: 317, ‘coz’: 318, ‘crabfest15’: 319, ‘crap’: 320, ‘crash’: 321, ‘created’: 322, ‘creates’: 323, ‘creating’: 324, ‘credibility’: 325, ‘cretin’: 326, ‘cringing’: 327, ‘critiquing’: 328, ‘culture’: 329, ‘cunt’: 330, ‘cup’: 331, ‘curd’: 332, ‘curious’: 333, ‘customer’: 334, ‘cut’: 335, ‘cuz’: 336, ‘cytheria’: 337, ‘d20’: 338, ‘daesh’: 339, ‘damn’: 340, ‘damnitscloudy’: 341, ‘danhickey2199’: 342, ‘danis’: 343, ‘dankmtl’: 344, ‘darchmare’: 345, ‘darrenkopp’: 346, ‘data’: 347, ‘dave’: 348, ‘davidjo52951945’: 349, ‘davidsgallant’: 350, ‘dc’: 351, ‘dead’: 352, ‘deal’: 353, ‘death’: 354, ‘declared’: 355, ‘deconstruct’: 356, ‘deconstructing’: 357, ‘deduction’: 358, ‘defend’: 359, ‘definitely’: 360, ‘delicate’: 361, ‘demanded’: 362, ‘dentist’: 363, ‘describes’: 364, ‘desertfox899’: 365, ‘desire’: 366, ‘desk’: 367, ‘desperately’: 368, ‘dessert’: 369, ‘destined’: 370, ‘destroys’: 371, ‘deusexjuice’: 372, ‘devops’: 373, ‘dianh4’: 374, ‘dick’: 375, ‘dictatorship’: 376, ‘didazahra’: 377, ‘didnt’: 378, ‘difference’: 379, ‘different’: 380, ‘direct’: 381, ‘directhex’: 382, ‘direction’: 383, ‘directly’: 384, ‘discerningmumin’: 385, ‘discrimination’: 386, ‘disgrace’: 387, ‘disgusted’: 388, ‘disgusting’: 389, ‘dish’: 390, ‘disheartened’: 391, ‘disturbing’: 392, ‘divisiveness’: 393, ‘dkim’: 394, ‘doammuslims’: 395, ‘doctor’: 396, ‘doesnt’: 397, ‘dogging’: 398, ‘doh’: 399, ‘dolly’: 400, ‘domestic’: 401, ‘done’: 402, ‘dont’: 403, ‘double’: 404, ‘draskos’: 405, ‘drdisco_’: 406, ‘dream’: 407, ‘dreamer’: 408, ‘dreaminpng’: 409, ‘drive’: 410, ‘drivemaneuveroperate’: 411, ‘driver’: 412, ‘driving’: 413, ‘dropped’: 414, ‘dry’: 415, ‘dubhe80’: 416, ‘duckiemcphee’: 417, ‘dude’: 418, ‘due’: 419, ‘dumb’: 420, ‘dumbest’: 421, ‘dummy’: 422, ‘dye’: 423, ‘dying’: 424, ‘ear’: 425, ‘earth’: 426, ‘easier’: 427, ‘easy’: 428, ‘eat’: 429, ‘ebola’: 430, ‘ebooks’: 431, ‘economics’: 432, ‘econtried’: 433, ‘edgeofthesandbx’: 434, ‘edible’: 435, ‘efficient’: 436, ‘effort’: 437, ‘egypt’: 438, ‘egyptian’: 439, ‘either’: 440, ‘eloisepeace’: 441, ‘else’: 442, ’email’: 
443, ’emilie’: 444, ’empty’: 445, ‘endless’: 446, ‘enemy’: 447, ‘engaging’: 448, ‘enough’: 449, ‘enslaved’: 450, ‘entertaining’: 451, ‘entree’: 452, ‘ep’: 453, ‘episode’: 454, ‘equal’: 455, ‘equalpay’: 456, ‘espn’: 457, ‘eternity’: 458, ‘etsho127’: 459, ‘evacuated’: 460, ‘evans’: 461, ‘even’: 462, ‘eventually’: 463, ‘ever’: 464, ‘every’: 465, ‘everyone’: 466, ‘everything’: 467, ‘evidence’: 468, ‘evilsunbro’: 469, ‘ew’: 470, ‘except’: 471, ‘excited’: 472, ‘excuse’: 473, ‘executed’: 474, ‘expect’: 475, ‘expedition’: 476, ‘expertise’: 477, ‘explain’: 478, ‘explained’: 479, ‘exposefalsehood’: 480, ‘expression’: 481, ‘exterminate’: 482, ‘extreme’: 483, ‘eye’: 484, ‘eyesmkr’: 485, ‘ezidi’: 486, ‘ezidipress’: 487, ‘ezidis’: 488, ‘f3ew’: 489, ‘face’: 490, ‘faced’: 491, ‘fact’: 492, ‘faded’: 493, ‘fail’: 494, ‘fair’: 495, ‘fan’: 496, ‘fanatic’: 497, ‘farbenstau’: 498, ‘fascist’: 499, ‘fat’: 500, ‘feardept’: 501, ‘feel’: 502, ‘feeling’: 503, ‘female’: 504, ‘femfreefriday’: 505, ‘feminism’: 506, ‘feminist’: 507, ‘fetch’: 508, ‘fetish’: 509, ‘fewer’: 510, ‘fight’: 511, ‘fighting’: 512, ‘filthy’: 513, ‘finalbroadcast’: 514, ‘finally’: 515, ‘find’: 516, ‘fine’: 517, ‘finger’: 518, ‘first’: 519, ‘flip’: 520, ‘floss’: 521, ‘flying’: 522, ‘folk’: 523, ‘followed’: 524, ‘food’: 525, ‘foodie_ben’: 526, ‘foot’: 527, ‘forbidden’: 528, ‘force’: 529, ‘forced’: 530, ‘ford’: 531, ‘form’: 532, ‘formula’: 533, ‘fought’: 534, ‘foxnews’: 535, ‘foxnewspolitics’: 536, ‘free’: 537, ‘freebsd’: 538, ‘freebsdgirl’: 539, ‘freebsdglri’: 540, ‘freedom’: 541, ‘freezer’: 542, ‘fried’: 543, ‘friend’: 544, ‘fuck’: 545, ‘fucking’: 546, ‘fucktards’: 547, ‘fun’: 548, ‘funding’: 549, ‘fyoudbag’: 550, ‘gailsimone’: 551, ‘gal’: 552, ‘game’: 553, ‘gamergate’: 554, ‘gangraped’: 555, ‘gap’: 556, ‘garbage’: 557, ‘garydlum’: 558, ‘gaters’: 559, ‘gatery’: 560, ‘gator’: 561, ‘gbabeuf’: 562, ‘gbazov’: 563, ‘gel’: 564, ‘genius’: 565, ‘genocide’: 566, ‘get’: 567, ‘getting’: 568, ‘gg’: 569, ‘ggautoblocker’: 570, ‘ggreenwald’: 571, ‘gilmore’: 572, ‘girl’: 573, ‘girlziplocked’: 574, ‘give’: 575, ‘given’: 576, ‘giving’: 577, ‘glad’: 578, ‘glennf’: 579, ‘glhf’: 580, ‘glove’: 581, ‘go’: 582, ‘god’: 583, ‘going’: 584, ‘gon’: 585, ‘gone’: 586, ‘good’: 587, ‘goodluck’: 588, ‘goosenetworkusa’: 589, ‘gosh’: 590, ‘got’: 591, ‘govt’: 592, ‘grafana’: 593, ‘grahamdavida’: 594, ‘great’: 595, ‘greater’: 596, ‘greenlinerzjm’: 597, ‘grin’: 598, ‘gross’: 599, ‘ground’: 600, ‘group’: 601, ‘grow’: 602, ‘grown’: 603, ‘guardian’: 604, ‘gueensland’: 605, ‘guess’: 606, ‘guilt’: 607, ‘gulf’: 608, ‘gumboots’: 609, ‘guy’: 610, ‘hadith’: 611, ‘haha’: 612, ‘hair’: 613, ‘halalflaws’: 614, ‘hamas’: 615, ‘hand’: 616, ‘handle’: 617, ‘happen’: 618, ‘happened’: 619, ‘happens’: 620, ‘happy’: 621, ‘happycampers’: 622, ‘harassed’: 623, ‘harassment’: 624, ‘haroonstyles’: 625, ‘harshly’: 626, ‘hate’: 627, ‘hated’: 628, ‘hatefilled’: 629, ‘hating’: 630, ‘hatred’: 631, ‘havent’: 632, ‘hawaiinshirts’: 633, ‘hayles_comet’: 634, ‘hdmovieus’: 635, ‘he’: 636, ‘head’: 637, ‘heart’: 638, ‘hell’: 639, ‘hello’: 640, ‘help’: 641, ‘helping’: 642, ‘here’: 643, ‘hide’: 644, ‘hilarious’: 645, ‘himat’: 646, ‘hit’: 647, ‘hitler’: 648, ‘hockey’: 649, ‘holder’: 650, ‘home’: 651, ‘homophobe’: 652, ‘honestly’: 653, ‘hope’: 654, ‘hoping’: 655, ‘hostage’: 656, ‘however’: 657, ‘hows’: 658, ‘howtogetawaywithmurder’: 659, ‘ht’: 660, ‘http’: 661, ‘httpstco6kgw1lejfr’: 662, ‘httpstcoum5svjgazu’: 663, ‘httpstcoxufwsigxfk’: 664, ‘httpt’: 665, ‘httptco’: 666, ‘httptco1pl9gqrdp7’: 667, ‘httptco4u’: 668, 
‘httptco5vsf5jroi6’: 669, ‘httptco8xldnwbvzx’: 670, ‘httptcobwr6ap0ooo’: 671, ‘httptcocaxxus108l’: 672, ‘httptcocbcr9u4fc9’: 673, ‘httptcodajgdn1wy3’: 674, ‘httptcoddecobanzx’: 675, ‘httptcofdylhlkdcv’: 676, ‘httptcoganrh4k87a’: 677, ‘httptcogbvojnmbcv’: 678, ‘httptcoglncgkuukp’: 679, ‘httptcoh8f7n04q5o’: 680, ‘httptcoltoxypkwww’: 681, ‘httptcom4jcka5ir0’: 682, ‘httptcom5j2tpksm5’: 683, ‘httptcomdb4iu9whd’: 684, ‘httptcomxuw3hz4tb’: 685, ‘httptconleyqfnkyp’: 686, ‘httptcontojwo4lnt’: 687, ‘httptcoocsy7crghf’: 688, ‘httptcopfs5zlkt07’: 689, ‘httptcopnhzjrhhqr’: 690, ‘httptcoq95ei17sua’: 691, ‘httptcoqaa6bwi4pm’: 692, ‘httptcoqmdsdtfvya’: 693, ‘httptcoqv’: 694, ‘httptcorbthvmh9jj’: 695, ‘httptcospmvzcjj6o’: 696, ‘httptcot65iytpvdk’: 697, ‘httptcotfsodcowbx’: 698, ‘httptcoumkitlb5h9’: 699, ‘httptcouq19q6pnaq’: 700, ‘httptcovanp6y7clr’: 701, ‘httptcowhy3a8o33z’: 702, ‘httptcoyspbfitztb’: 703, ‘httptcozjbwagvnrg’: 704, ‘hugged’: 705, ‘human’: 706, ‘humanistfury’: 707, ‘humanity’: 708, ‘hungrycampers’: 709, ‘hw’: 710, ‘hypatiadotca’: 711, ‘hypocrite’: 712, ‘ice’: 713, ‘id’: 714, ‘idea’: 715, ‘ideaology’: 716, ‘ideaor’: 717, ‘ideology’: 718, ‘idiot’: 719, ‘idiotim’: 720, ‘idontneedfeminism’: 721, ‘ied’: 722, ‘ignorance’: 723, ‘ignorant’: 724, ‘ignoring’: 725, ‘ihatethiskid’: 726, ‘ilivundrurbed’: 727, ‘illegal’: 728, ‘ilovebreakfast’: 729, ‘iloveobama’: 730, ‘imagine’: 731, ‘imperialism’: 732, ‘implies’: 733, ‘important’: 734, ‘info’: 735, ‘information’: 736, ‘input’: 737, ‘instant’: 738, ‘instead’: 739, ‘insufferable’: 740, ‘insulted’: 741, ‘insulting’: 742, ‘integration’: 743, ‘intel’: 744, ‘interact’: 745, ‘interest’: 746, ‘interesting’: 747, ‘international’: 748, ‘internet’: 749, ‘intersection’: 750, ‘intolerance’: 751, ‘invented’: 752, ‘iraq’: 753, ‘iron’: 754, ‘isi’: 755, ‘isisutterly’: 756, ‘islam’: 757, ‘islamdefense’: 758, ‘islamic’: 759, ‘islamist’: 760, ‘isnt’: 761, ‘israel’: 762, ‘israeliregime’: 763, ‘issue’: 764, ‘itll’: 765, ‘itsbariecool’: 766, ‘itsfact’: 767, ‘ive’: 768, ‘ivyexec’: 769, ‘izrinhariri’: 770, ‘jac’: 771, ‘jealous’: 772, ‘jeffreygoldberg’: 773, ‘jennykutner’: 774, ‘jeremiahfelt’: 775, ‘jew’: 776, ‘jhamby’: 777, ‘jihadi_11’: 778, ‘jihadis’: 779, ‘jimcramer’: 780, ‘jncatron’: 781, ‘job’: 782, ‘johncantile’: 783, ‘johnnygjokaj’: 784, ‘johnnyrejection’: 785, ‘joke’: 786, ‘journalist’: 787, ‘judge’: 788, ‘juliet777777’: 789, ‘justdavidvideos’: 790, ‘justhonest’: 791, ‘justkelly_ok’: 792, ‘kaitlynburnell’: 793, ‘kamaluf’: 794, ‘kardashian’: 795, ‘kat’: 796, ‘katampandre’: 797, ‘katandandre’: 798, ‘katie’: 799, ‘katieandnikki’: 800, ‘keep’: 801, ‘keyboard’: 802, ‘keynote’: 803, ‘kid’: 804, ‘kidding’: 805, ‘kill’: 806, ‘killa’: 807, ‘killed’: 808, ‘killerblondes’: 809, ‘killing’: 810, ‘kind’: 811, ‘kirkuk’: 812, ‘kmactane’: 813, ‘knew’: 814, ‘know’: 815, ‘knowingly’: 816, ‘kobane’: 817, ‘kuffir’: 818, ‘kurd’: 819, ‘kurdish’: 820, ‘lactualaloupe’: 821, ‘lad’: 822, ‘lady’: 823, ‘lajouetreine’: 824, ‘large’: 825, ‘last’: 826, ‘latest’: 827, ‘laugh’: 828, ‘launcher’: 829, ‘law’: 830, ‘lazy’: 831, ‘lb’: 832, ‘ldstarr18’: 833, ‘le’: 834, ‘lead’: 835, ‘leaning’: 836, ‘led’: 837, ‘left’: 838, ‘legitimately’: 839, ‘lemon’: 840, ‘lesson’: 841, ‘letting’: 842, ‘level’: 843, ‘liar’: 844, ‘liberate’: 845, ‘libya’: 846, ‘license’: 847, ‘licking’: 848, ‘lie’: 849, ‘lied’: 850, ‘life’: 851, ‘light’: 852, ‘like’: 853, ‘likely’: 854, ‘lilbeastunleash’: 855, ‘lime’: 856, ‘line’: 857, ‘link’: 858, ‘lipstick’: 859, ‘lisamromano’: 860, ‘lissasauras’: 861, ‘listen’: 862, ‘lithobolos’: 863, 
‘little’: 864, ‘live’: 865, ‘liver’: 866, ‘lol’: 867, ‘long’: 868, ‘loo’: 869, ‘look’: 870, ‘looked’: 871, ‘looking’: 872, ‘lose’: 873, ‘loser’: 874, ‘lost’: 875, ‘lot’: 876, ‘love’: 877, ‘low’: 878, ‘lt3’: 879, ‘lucaswj’: 880, ‘luck’: 881, ‘lunatic’: 882, ‘lynnemcgranger’: 883, ‘m_m_myers’: 884, ‘maajidnawaz’: 885, ‘mad’: 886, ‘madasahatter_17’: 887, ‘maddr11’: 888, ‘made’: 889, ‘magazine’: 890, ‘magnus919’: 891, ‘maja_stina’: 892, ‘major’: 893, ‘majority’: 894, ‘make’: 895, ‘making’: 896, ‘mami_mermelada’: 897, ‘man’: 898, ‘manbabies’: 899, ‘manu’: 900, ‘manure’: 901, ‘many’: 902, ‘map’: 903, ‘marc_leibowitz’: 904, ‘market’: 905, ‘markimbriaco’: 906, ‘maroon’: 907, ‘marriage’: 908, ‘married’: 909, ‘masontillidie’: 910, ‘math’: 911, ‘matter’: 912, ‘mattstratton’: 913, ‘mattybboi83’: 914, ‘maxblumenthal’: 915, ‘maxcaras’: 916, ‘may’: 917, ‘maybe’: 918, ‘mccheesy904’: 919, ‘mean’: 920, ‘meaning’: 921, ‘meatball’: 922, ‘meatgirls’: 923, ‘mechasauce’: 924, ‘medium’: 925, ‘meh’: 926, ‘mehdirhasan’: 927, ‘mellym09’: 928, ‘melting’: 929, ‘mention’: 930, ‘menu’: 931, ‘messed’: 932, ‘microbrain’: 933, ‘middle’: 934, ‘mikeage’: 935, ‘military’: 936, ‘militia’: 937, ‘minasmith64’: 938, ‘minister’: 939, ‘minority’: 940, ‘misfitinchains’: 941, ‘miskelayla’: 942, ‘miss’: 943, ‘missed’: 944, ‘missing’: 945, ‘mistertodd’: 946, ‘mkr’: 947, ‘mkr2015’: 948, ‘mkrkat’: 949, ‘mmmm’: 950, ‘model’: 951, ‘moderate’: 952, ‘modern’: 953, ‘mohammed’: 954, ‘monday’: 955, ‘month’: 956, ‘moron’: 957, ‘mosul’: 958, ‘mouth’: 959, ‘moving’: 960, ‘much’: 961, ‘mugnezee’: 962, ‘multiple’: 963, ‘murde’: 964, ‘murder’: 965, ‘murdered’: 966, ‘murtaza’: 967, ‘muslim’: 968, ‘muslimtwo’: 969, ‘mutilated’: 970, ‘myersnfl’: 971, ‘mykitchenrules’: 972, ‘mystrongstate’: 973, ‘nader_haq’: 974, ‘naga’: 975, ‘nainfidels’: 976, ‘naminglisting’: 977, ‘narîn’: 978, ‘nasty’: 979, ‘naturally’: 980, ‘nazi’: 981, ‘near’: 982, ‘necessarily’: 983, ‘need’: 984, ‘needarethinkinformat’: 985, ‘negated’: 986, ‘negotiate’: 987, ‘neilasaurus’: 988, ‘never’: 989, ‘new’: 990, ‘new_babylonia’: 991, ‘newscoverup’: 992, ‘next’: 993, ‘nice’: 994, ‘nigelbigmeech’: 995, ‘night’: 996, ‘nikki’: 997, ‘nobody’: 998, ‘noise’: 999, ‘nomcookiesnom’: 1000, ‘none’: 1001, ‘nooo’: 1002, ‘nope’: 1003, ‘notchrissmith’: 1004, ‘note’: 1005, ‘nothing’: 1006, ‘notsexist’: 1007, ‘novorossiyan’: 1008, ‘nscottg’: 1009, ‘number’: 1010, ‘number10gov’: 1011, ‘nytimes’: 1012, ‘obamacare’: 1013, ‘obamas’: 1014, ‘obsurfer84’: 1015, ‘obviously’: 1016, ‘occasion’: 1017, ‘offense’: 1018, ‘offensive’: 1019, ‘offering’: 1020, ‘oh’: 1021, ‘oil’: 1022, ‘oktar’: 1023, ‘old’: 1024, ‘oldgfatherclock’: 1025, ‘one’: 1026, ‘open’: 1027, ‘opener’: 1028, ‘opinion’: 1029, ‘opponent’: 1030, ‘opposed’: 1031, ‘optional’: 1032, ‘oreilly’: 1033, ‘org’: 1034, ‘origin’: 1035, ‘others’: 1036, ‘outside’: 1037, ‘overweight’: 1038, ‘owais00’: 1039, ‘p’: 1040, ‘p8952_’: 1041, ‘page’: 1042, ‘painful’: 1043, ‘pakistan’: 1044, ‘paknsave’: 1045, ‘palestine’: 1046, ‘pancake’: 1047, ‘paraketa’: 1048, ‘pardusxy’: 1049, ‘parent’: 1050, ‘paris’: 1051, ‘participate’: 1052, ‘passport’: 1053, ‘past’: 1054, ‘pastor’: 1055, ‘patrickosgood’: 1056, ‘pawarnhoff’: 1057, ‘pay’: 1058, ‘paying’: 1059, ‘pc’: 1060, ‘peace’: 1061, ‘peacenothate_’: 1062, ‘pedophile’: 1063, ‘pedophilia’: 1064, ‘peerworker’: 1065, ‘penalty’: 1066, ‘people’: 1067, ‘peopleschoice’: 1068, ‘perfect’: 1069, ‘period’: 1070, ‘perk’: 1071, ‘perl’: 1072, ‘personality’: 1073, ‘pervious’: 1074, ‘peymaneh123’: 1075, ‘phxken’: 1076, ‘pie’: 1077, ‘piece’: 1078, ‘pile’: 
1079, ‘pilgars’: 1080, ‘pilot’: 1081, ‘pissing’: 1082, ‘pjnet’: 1083, ‘plane’: 1084, ‘planning’: 1085, ‘playing’: 1086, ‘playstations’: 1087, ‘please’: 1088, ‘pleasing’: 1089, ‘pnibbler’: 1090, ‘point’: 1091, ‘pole’: 1092, ‘police’: 1093, ‘political’: 1094, ‘politicalant’: 1095, ‘politics_pr’: 1096, ‘poor’: 1097, ‘poorly’: 1098, ‘population’: 1099, ‘portland’: 1100, ‘possible’: 1101, ‘possibly’: 1102, ‘post’: 1103, ‘posting’: 1104, ‘power’: 1105, ‘present’: 1106, ‘pressure’: 1107, ‘pretend’: 1108, ‘pretty’: 1109, ‘previous’: 1110, ‘price’: 1111, ‘prime’: 1112, ‘prisonersofwar’: 1113, ‘pro’: 1114, ‘probably’: 1115, ‘problem’: 1116, ‘problematic’: 1117, ‘producer’: 1118, ‘production’: 1119, ‘profile’: 1120, ‘project’: 1121, ‘promise’: 1122, ‘promo’: 1123, ‘promogirls’: 1124, ‘promoted’: 1125, ‘proof’: 1126, ‘propaganda’: 1127, ‘prophet’: 1128, ‘prospect’: 1129, ‘protecting’: 1130, ‘proudpatriot101’: 1131, ‘prove’: 1132, ‘provide’: 1133, ‘provision’: 1134, ‘psog’: 1135, ‘psogeco’: 1136, ‘psychbarakat’: 1137, ‘public’: 1138, ‘punch’: 1139, ‘purse’: 1140, ‘put’: 1141, ‘question’: 1142, ‘questionsformen’: 1143, ‘quietly’: 1144, ‘quit’: 1145, ‘quite’: 1146, ‘quran’: 1147, ‘r’: 1148, ‘race’: 1149, ‘racist’: 1150, ‘raised’: 1151, ‘random’: 1152, ‘randomhero30’: 1153, ‘raniakhalek’: 1154, ‘ransom’: 1155, ‘rape’: 1156, ‘raped’: 1157, ‘rapper’: 1158, ‘rapperguydmv’: 1159, ‘raqqa’: 1160, ‘raqqa_sl’: 1161, ‘rate’: 1162, ‘rather’: 1163, ‘ratio’: 1164, ‘ravenhuwolf’: 1165, ‘raw’: 1166, ‘rayyoosheh’: 1167, ‘react’: 1168, ‘read’: 1169, ‘readable’: 1170, ‘real’: 1171, ‘really’: 1172, ‘realryansipple’: 1173, ‘realtalk’: 1174, ‘reason’: 1175, ‘reasonably’: 1176, ‘rebel’: 1177, ‘recall’: 1178, ‘reckless’: 1179, ‘recognize’: 1180, ‘recommends’: 1181, ‘record’: 1182, ‘recruit’: 1183, ‘recuperate’: 1184, ‘redux’: 1185, ‘reevaluate’: 1186, ‘reference’: 1187, ‘referring’: 1188, ‘refine’: 1189, ‘regarding’: 1190, ‘regulation’: 1191, ‘rejected’: 1192, ‘relationship’: 1193, ‘release’: 1194, ‘religion’: 1195, ‘religious’: 1196, ‘relisha’: 1197, ‘reload’: 1198, ‘remember’: 1199, ‘reminded’: 1200, ‘rennie93’: 1201, ‘repeatedly’: 1202, ‘repetition’: 1203, ‘replacement’: 1204, ‘report’: 1205, ‘reputation’: 1206, ‘request’: 1207, ‘resorting’: 1208, ‘respond’: 1209, ‘restaurant’: 1210, ‘retreat’: 1211, ‘revolting’: 1212, ‘reza_rahman’: 1213, ‘rigged’: 1214, ‘right’: 1215, ‘rinehart33’: 1216, ‘rip’: 1217, ‘rjennromao’: 1218, ‘rkhayer’: 1219, ‘rkinglive2dance’: 1220, ‘rob’: 1221, ‘robbed’: 1222, ‘robert’: 1223, ‘robinriedstra’: 1224, ‘roll’: 1225, ‘roof’: 1226, ‘room’: 1227, ‘rooshv’: 1228, ‘rose’: 1229, ‘rotherham’: 1230, ‘rougek68′: 1231, ’round’: 1232, ‘routinely’: 1233, ‘rt’: 1234, ‘rts’: 1235, ‘rudawenglish’: 1236, ‘rudd’: 1237, ‘rude’: 1238, ‘rudoren’: 1239, ‘ruin’: 1240, ‘run’: 1241, ‘running’: 1242, ‘russian’: 1243, ‘said’: 1244, ‘saifullah666’: 1245, ‘sajid_fairooz’: 1246, ‘sake’: 1247, ‘salmon’: 1248, ‘salon’: 1249, ‘saltnburnem’: 1250, ‘salty’: 1251, ‘samkitsengupta’: 1252, ‘santa’: 1253, ‘sarah_jane666’: 1254, ‘sas’: 1255, ‘satire’: 1256, ‘saudi’: 1257, ‘sausage’: 1258, ‘save’: 1259, ‘saw’: 1260, ‘say’: 1261, ‘scared’: 1262, ‘schmeezi’: 1263, ‘school’: 1264, ‘score’: 1265, ‘scratch’: 1266, ‘screencaps’: 1267, ‘script’: 1268, ‘scripted’: 1269, ‘scroll’: 1270, ‘scum’: 1271, ‘season’: 1272, ‘see’: 1273, ‘seen’: 1274, ‘segment’: 1275, ‘self’: 1276, ‘selfies’: 1277, ‘selling’: 1278, ‘sellout’: 1279, ‘semite’: 1280, ‘semzyxx’: 1281, ‘sensitive’: 1282, ‘sent’: 1283, ‘serious’: 1284, ‘seriously’: 1285, ‘serlasco’: 1286, 
‘serve’: 1287, ‘served’: 1288, ‘service’: 1289, ‘serving’: 1290, ‘set’: 1291, ‘setup’: 1292, ‘sevilzadeh’: 1293, ‘sex’: 1294, ‘sexhonest’: 1295, ‘sexism’: 1296, ‘sexist’: 1297, ‘sexually’: 1298, ‘shami_is_back’: 1299, ‘shaz’: 1300, ‘shell’: 1301, ‘shermertron’: 1302, ‘sherri’: 1303, ‘shia’: 1304, ‘shingal’: 1305, ‘shirt’: 1306, ‘shit’: 1307, ‘shoe0nhead’: 1308, ‘short’: 1309, ‘shovel’: 1310, ‘show’: 1311, ‘shower’: 1312, ‘shred’: 1313, ‘shut’: 1314, ‘sick’: 1315, ‘side’: 1316, ‘sighhhh’: 1317, ‘simpson’: 1318, ‘since’: 1319, ‘singer’: 1320, ‘sinjar’: 1321, ‘sirgoldenrod’: 1322, ‘six’: 1323, ‘skank’: 1324, ‘slagkick’: 1325, ‘slap’: 1326, ‘slave’: 1327, ‘slaved’: 1328, ‘sleep’: 1329, ‘sleeping’: 1330, ‘slide’: 1331, ‘slightly’: 1332, ‘sloshedtrain2’: 1333, ‘slow’: 1334, ‘smack’: 1335, ‘smackem’: 1336, ‘small’: 1337, ‘smarter’: 1338, ‘smash’: 1339, ‘sold’: 1340, ‘soldier’: 1341, ‘someone’: 1342, ‘something’: 1343, ‘sometimes’: 1344, ‘somewhat’: 1345, ‘soon’: 1346, ‘sorbent’: 1347, ‘sorbet’: 1348, ‘sorry’: 1349, ‘sorrynotsorry’: 1350, ‘sound’: 1351, ‘source’: 1352, ‘space’: 1353, ‘spacekatgal’: 1354, ‘spacequeentbh’: 1355, ‘spam’: 1356, ‘spatchcock’: 1357, ‘speak’: 1358, ‘speaking’: 1359, ‘speech’: 1360, ‘spiritual’: 1361, ‘spoiled’: 1362, ‘sport’: 1363, ‘sports2inflatio’: 1364, ‘sputnik’: 1365, ‘srhbutts’: 1366, ‘stalin’: 1367, ‘stand’: 1368, ‘standard’: 1369, ‘standing’: 1370, ‘starius’: 1371, ‘started’: 1372, ‘starting’: 1373, ‘state’: 1374, ‘statistic’: 1375, ‘stats’: 1376, ‘stay’: 1377, ‘stayed’: 1378, ‘staying’: 1379, ‘step’: 1380, ‘steve’: 1381, ‘stiff’: 1382, ‘still’: 1383, ‘stood’: 1384, ‘stop’: 1385, ‘stopping’: 1386, ‘stopwadhwa2015’: 1387, ‘story’: 1388, ‘strategically’: 1389, ‘streaming’: 1390, ‘stretch’: 1391, ‘strike’: 1392, ‘strong’: 1393, ‘struggle’: 1394, ‘student’: 1395, ‘stuff’: 1396, ‘stupid’: 1397, ‘stylist’: 1398, ‘subject’: 1399, ‘subtle’: 1400, ‘success’: 1401, ‘suck’: 1402, ‘sudixitca’: 1403, ‘suicide’: 1404, ‘sumersloan’: 1405, ‘super’: 1406, ‘superior’: 1407, ‘support’: 1408, ‘supported’: 1409, ‘sure’: 1410, ‘surgery’: 1411, ‘swallow’: 1412, ‘swiftonsecurity’: 1413, ‘switching’: 1414, ‘syazlicious’: 1415, ‘syria’: 1416, ‘systemic’: 1417, ‘tacky’: 1418, ‘taken’: 1419, ‘taking’: 1420, ‘tal’: 1421, ‘taliban’: 1422, ‘talk’: 1423, ‘talladega’: 1424, ‘taqiyya’: 1425, ‘tarah’: 1426, ‘tart’: 1427, ‘tasteless’: 1428, ‘tatibresolin’: 1429, ‘tbh’: 1430, ‘tbielawa’: 1431, ‘tcot’: 1432, ‘teach’: 1433, ‘teaching’: 1434, ‘team’: 1435, ‘tell’: 1436, ‘telling’: 1437, ‘tempting’: 1438, ‘terrible’: 1439, ‘terror’: 1440, ‘terrorism’: 1441, ‘terrorist’: 1442, ‘testicle’: 1443, ‘texasarlington’: 1444, ‘thanks’: 1445, ‘thatll’: 1446, ‘thats’: 1447, ‘theckman’: 1448, ‘thedoubleclicks’: 1449, ‘thegeek_chick’: 1450, ‘thegoodguysau’: 1451, ‘thelindsayellis’: 1452, ‘thelmasleaze’: 1453, ‘themirai’: 1454, ‘themselvespffft’: 1455, ‘themuslimguy’: 1456, ‘thequinnspiracy’: 1457, ‘there’: 1458, ‘theyre’: 1459, ‘thing’: 1460, ‘think’: 1461, ‘thinking’: 1462, ‘third’: 1463, ‘thought’: 1464, ‘threw’: 1465, ‘throw’: 1466, ‘tied’: 1467, ‘tim’: 1468, ‘time’: 1469, ‘timespan’: 1470, ‘tiny’: 1471, ‘tip’: 1472, ‘tnr’: 1473, ‘tobyrobertbull’: 1474, ‘today’: 1475, ‘todayreal’: 1476, ‘told’: 1477, ‘tolerate’: 1478, ‘tomato’: 1479, ‘tonight’: 1480, ‘toodles’: 1481, ‘tool’: 1482, ‘top’: 1483, ‘total’: 1484, ‘train’: 1485, ‘transic_nyc’: 1486, ‘translator’: 1487, ‘treating’: 1488, ‘trend’: 1489, ‘tried’: 1490, ‘tripple’: 1491, ‘trolley’: 1492, ‘troop’: 1493, ‘trophy’: 1494, ‘truaemusic’: 1495, ‘truly’: 1496, 
‘try’: 1497, ‘trying’: 1498, ‘turf’: 1499, ‘turk’: 1500, ‘tv’: 1501, ‘tw’: 1502, ‘tweet’: 1503, ‘twist’: 1504, ‘twista202’: 1505, ‘twitter’: 1506, ‘two’: 1507, ‘typed’: 1508, ‘typically’: 1509, ‘typo’: 1510, ‘u’: 1511, ‘ugly’: 1512, ‘ukraine’: 1513, ‘ukrainian’: 1514, ‘ultrafundamentalist’: 1515, ‘unacceptable’: 1516, ‘unapologetic’: 1517, ‘unashamed’: 1518, ‘uncalled’: 1519, ‘understand’: 1520, ‘understands’: 1521, ‘unfair’: 1522, ‘uninvolved’: 1523, ‘university’: 1524, ‘update’: 1525, ‘uplay’: 1526, ‘upon’: 1527, ‘use’: 1528, ‘user’: 1529, ‘username’: 1530, ‘usually’: 1531, ‘valenti’: 1532, ‘value’: 1533, ‘vandaliser’: 1534, ‘vc’: 1535, ‘vcs’: 1536, ‘venereveritas13’: 1537, ‘venomous9’: 1538, ‘versa’: 1539, ‘verse’: 1540, ‘vex0rian’: 1541, ‘via’: 1542, ‘vice’: 1543, ‘victim’: 1544, ‘victorymonk’: 1545, ‘video’: 1546, ‘videobeautiful’: 1547, ‘violence’: 1548, ‘voice’: 1549, ‘vonta624’: 1550, ‘vote’: 1551, ‘voted’: 1552, ‘w’: 1553, ‘wadhwa’: 1554, ‘wait’: 1555, ‘waiting’: 1556, ‘wakeuplibsgtjoenbc’: 1557, ‘walk’: 1558, ‘wan’: 1559, ‘wanted’: 1560, ‘warriorsialkot’: 1561, ‘washed’: 1562, ‘washingtonpost’: 1563, ‘wasnt’: 1564, ‘watan71969’: 1565, ‘watch’: 1566, ‘watched’: 1567, ‘watching’: 1568, ‘way’: 1569, ‘week’: 1570, ‘weekly’: 1571, ‘well’: 1572, ‘went’: 1573, ‘werent’: 1574, ‘west’: 1575, ‘western’: 1576, ‘wetsprocket’: 1577, ‘wheat’: 1578, ‘wheel’: 1579, ‘whereisyourdignity’: 1580, ‘whether’: 1581, ‘whiny’: 1582, ‘white’: 1583, ‘whiteblack’: 1584, ‘whitening’: 1585, ‘whole’: 1586, ‘wi’: 1587, ‘wife’: 1588, ‘win’: 1589, ‘wing’: 1590, ‘wish’: 1591, ‘witch_sniffer’: 1592, ‘without’: 1593, ‘witty’: 1594, ‘wizardryofozil’: 1595, ‘wks’: 1596, ‘wnba’: 1597, ‘wnyc’: 1598, ‘wocracial’: 1599, ‘woman’: 1600, ‘womenagainstfeminism’: 1601, ‘womeninterpret’: 1602, ‘word’: 1603, ‘work’: 1604, ‘worker’: 1605, ‘working’: 1606, ‘worse’: 1607, ‘worst’: 1608, ‘would’: 1609, ‘wouldnt’: 1610, ‘wouldve’: 1611, ‘wow’: 1612, ‘wrecking’: 1613, ‘write’: 1614, ‘writer’: 1615, ‘writing’: 1616, ‘wrong’: 1617, ‘xmjee’: 1618, ‘yall’: 1619, ‘yawn’: 1620, ‘yeah’: 1621, ‘year’: 1622, ‘yes’: 1623, ‘yesallwomen’: 1624, ‘yesyouresexist’: 1625, ‘yet’: 1626, ‘yield’: 1627, ‘youd’: 1628, ‘youll’: 1629, ‘young’: 1630, ‘youre’: 1631, ‘yousufpoosuf’: 1632, ‘youtube’: 1633, ‘ypg’: 1634, ‘yum’: 1635, ‘zaibatsunews’: 1636, ‘zene55’: 1637, ‘zero’: 1638, ‘zython86’: 1639}
Padding
In [ ]:
# The sequence length is pre-defined; you can't change this value for this exercise
seq_length = 16

# Please complete this function
def encode_and_add_padding(sentences, seq_length, word_index):
    sent_encoded = []
    for sent in sentences:
        # Map each word to its index; unknown words fall back to the '[UNKOWN]' token
        # (the key really is spelled '[UNKOWN]' in this lab's vocabulary)
        temp_encoded = [word_index[word] if word in word_index else word_index['[UNKOWN]'] for word in sent]
        if len(temp_encoded) < seq_length:
            # Pad short sentences up to seq_length with the '[PAD]' index
            temp_encoded += [word_index['[PAD]']] * (seq_length - len(temp_encoded))
        else:
            # Truncate long sentences to seq_length
            temp_encoded = temp_encoded[:seq_length]
        sent_encoded.append(temp_encoded)
    return sent_encoded

train_pad_encoded = encode_and_add_padding(text_train_le, seq_length, word_index)
test_pad_encoded = encode_and_add_padding(text_test_le, seq_length, word_index)
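As a quick illustration (the tiny vocabulary and sentence below are invented for this example and are not part of the lab data), out-of-vocabulary words should map to the '[UNKOWN]' index and short sentences should be padded with '[PAD]':
In [ ]:
toy_index = {'[PAD]': 0, '[UNKOWN]': 1, 'feminist': 2, 'mkr': 3}
toy_sentences = [['feminist', 'mkr', 'wordnotinvocab']]
print(encode_and_add_padding(toy_sentences, 5, toy_index))
# expected: [[2, 3, 1, 0, 0]]  -- known words, then [UNKOWN], then [PAD] up to length 5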
Embedding lookup table
In [ ]:
import gensim.downloader as api
word_emb_model = api.load("glove-twitter-25")
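Before building the lookup table, it can be helpful to poke at the pre-trained model a little. The check below is purely illustrative (the query word is arbitrary): glove-twitter-25 provides 25-dimensional vectors, and gensim's KeyedVectors interface supports item lookup and nearest-neighbour queries.
In [ ]:
print(word_emb_model.vector_size)                        # 25
print(word_emb_model['twitter'][:5])                     # first few components of one word vector
print(word_emb_model.most_similar('twitter', topn=3))    # nearest neighbours in embedding space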
In [ ]:
# Get the embedding lookup table
import numpy as np

emb_dim = word_emb_model.vector_size
emb_table = []
for i, word in enumerate(word_list):
    if word in word_emb_model:
        # Copy the pre-trained GloVe vector for words it covers
        emb_table.append(word_emb_model[word])
    else:
        # Words without a pre-trained vector (e.g. [PAD], [UNKOWN], user handles) get a zero vector
        emb_table.append([0] * emb_dim)
emb_table = np.array(emb_table)
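A quick optional sanity check: the lookup table should have one row per vocabulary entry and emb_dim columns, i.e. (1640, 25) here, which matches the embedding layer printed in the model summary further below.
In [ ]:
print(emb_table.shape)   # expected: (len(word_list), emb_dim) = (1640, 25)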
Model
In [ ]:
vocab_size = len(word_list)
n_hidden = 50
total_epoch = 100
learning_rate = 0.01
In [ ]:
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import accuracy_score

# You can enable GPU here (cuda) or just use the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Embedding layer initialised from the pre-trained GloVe lookup table and frozen
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.emb.weight.data.copy_(torch.from_numpy(emb_table))
        self.emb.weight.requires_grad = False
        # Two-layer LSTM followed by a linear classifier over n_class labels
        self.lstm = nn.LSTM(emb_dim, n_hidden, num_layers=2, batch_first=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden, n_class)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x)
        # Use the hidden state at the last time step for classification
        x = self.linear(x[:, -1, :])
        return x

model = Model().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

input_torch = torch.from_numpy(np.array(train_pad_encoded)).to(device)
target_torch = torch.from_numpy(np.array(label_train_encoded)).view(-1).to(device)

for epoch in range(total_epoch):
    model.train()
    optimizer.zero_grad()
    outputs = model(input_torch)
    loss = criterion(outputs, target_torch)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 9:
        predicted = torch.argmax(outputs, -1)
        acc = accuracy_score(target_torch.cpu().numpy(), predicted.cpu().numpy())
        print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))
print('Finished Training')
Epoch: 10, loss: 0.76369, train_acc: 0.63
Epoch: 20, loss: 0.62751, train_acc: 0.70
Epoch: 30, loss: 0.58569, train_acc: 0.66
Epoch: 40, loss: 0.41178, train_acc: 0.84
Epoch: 50, loss: 0.33358, train_acc: 0.88
Epoch: 60, loss: 0.23153, train_acc: 0.91
Epoch: 70, loss: 0.14260, train_acc: 0.93
Epoch: 80, loss: 0.26337, train_acc: 0.89
Epoch: 90, loss: 0.15540, train_acc: 0.95
Epoch: 100, loss: 0.06726, train_acc: 0.97
Finished Training
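The loop above feeds the entire training set to the model as a single batch each epoch, which is workable for a dataset this small. If you wanted to train with mini-batches instead, a sketch along the following lines (using PyTorch's TensorDataset and DataLoader; the batch size is an arbitrary choice) would do it. This is an optional variant, not part of the lab exercise, and running it would continue training the already-fitted model.
In [ ]:
from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(input_torch, target_torch)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

for epoch in range(total_epoch):
    model.train()
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()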
Save and Load the model
Save the model
In [ ]:
torch.save(model,'lab5.pt')
Load the model
In [ ]:
model2 = torch.load('lab5.pt')
model2.eval()
Out[ ]:
Model(
  (emb): Embedding(1640, 25)
  (lstm): LSTM(25, 50, num_layers=2, batch_first=True, dropout=0.2)
  (linear): Linear(in_features=50, out_features=3, bias=True)
)
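torch.save(model, ...) pickles the whole model object, so loading it later requires the Model class to be importable under the same name. A commonly recommended alternative, sketched below (the filename is arbitrary), is to save and restore only the state_dict:
In [ ]:
# Save only the learned parameters
torch.save(model.state_dict(), 'lab5_state.pt')

# Restore them into a freshly constructed model with the same architecture
model3 = Model().to(device)
model3.load_state_dict(torch.load('lab5_state.pt'))
model3.eval()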
Testing
In [ ]:
input_torch = torch.from_numpy(np.array(test_pad_encoded)).to(device)
outputs = model2(input_torch)
predicted = torch.argmax(outputs, -1)
from sklearn.metrics import classification_report
print(classification_report(label_test_encoded,predicted.cpu().numpy()))
              precision    recall  f1-score   support

           0       0.77      0.77      0.77        65
           1       0.83      0.50      0.62        10
           2       0.48      0.56      0.52        25

    accuracy                           0.69       100
   macro avg       0.70      0.61      0.64       100
weighted avg       0.70      0.69      0.69       100
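The report shows that class 2 is the weakest (precision 0.48). To see where the misclassifications go, a confusion matrix is a natural companion to the report; this is an optional extra step, not required by the lab.
In [ ]:
from sklearn.metrics import confusion_matrix

# rows = true labels, columns = predicted labels
print(confusion_matrix(label_test_encoded, predicted.cpu().numpy()))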
In [ ]: