12-topic-model
Topic Modeling with LDA
In this notebook, we will train a Latent Dirichlet Allocation (LDA) model on the NLTK sample of the Reuters Corpus (10,788 news documents totaling 1.3 million words). Then we will use the topics inferred by the LDA model as features to approach the document classification task on the same dataset.
We will use the gensim implementation of LDA, so let’s start with installation.
In [4]:
!pip install gensim
Requirement already satisfied: gensim in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (3.8.3)
Requirement already satisfied: smart-open>=1.8.1 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from gensim) (2.0.0)
Requirement already satisfied: six>=1.5.0 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from gensim) (1.14.0)
Requirement already satisfied: scipy>=0.18.1 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from gensim) (1.3.1)
Requirement already satisfied: numpy>=1.11.3 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from gensim) (1.18.1)
Requirement already satisfied: boto3 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (1.13.16)
Requirement already satisfied: requests in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.22.0)
Requirement already satisfied: boto in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.49.0)
Requirement already satisfied: botocore<1.17.0,>=1.16.16 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from boto3->smart-open>=1.8.1->gensim) (1.16.16)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from boto3->smart-open>=1.8.1->gensim) (0.3.3)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from boto3->smart-open>=1.8.1->gensim) (0.10.0)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2020.4.5.1)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (1.25.8)
Requirement already satisfied: docutils<0.16,>=0.10 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from botocore<1.17.0,>=1.16.16->boto3->smart-open>=1.8.1->gensim) (0.15.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/jason/anaconda3/envs/torch/lib/python3.7/site-packages (from botocore<1.17.0,>=1.16.16->boto3->smart-open>=1.8.1->gensim) (2.8.1)
In [5]:
import nltk
nltk.download("reuters")  # if necessary
from nltk.corpus import reuters
[nltk_data] Downloading package reuters to /Users/jason/nltk_data…
[nltk_data] Package reuters is already up-to-date!
Documents in the training set have the word train in their file ID; we use this to split the corpus into a training set and a test set.
As in notebook 03-classification, we will build a classifier that identifies documents belonging to one of the most common topics in the corpus, “acq” (acquisitions). So we also record the topic label of each document.
In [6]:
training_set = []
training_classifications = []
test_set = []
test_classifications = []
topic = "acq"
for file_id in reuters.fileids():
    # file IDs of training documents start with "train"
    if file_id.startswith("train"):
        training_set.append(reuters.words(file_id))
        if topic in reuters.categories(file_id):
            training_classifications.append(topic)
        else:
            training_classifications.append("not " + topic)
    else:
        test_set.append(reuters.words(file_id))
        if topic in reuters.categories(file_id):
            test_classifications.append(topic)
        else:
            test_classifications.append("not " + topic)
print("Train Size:", len(training_set))
print("Test Size:", len(test_set))
Train Size: 7769
Test Size: 3019
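The loop above repeats the same labeling logic for both splits. As a minimal sketch, it could be factored into a helper. Note that `split_and_label` and the toy category mapping below are hypothetical names introduced for illustration, and the file IDs are made-up examples that only imitate the `training/` and `test/` prefixes; in the notebook, `categories_of` would be `reuters.categories`.

```python
def split_and_label(file_ids, categories_of, topic):
    """Split file IDs into train/test and attach binary topic labels.

    categories_of is any callable that maps a file ID to its list of
    categories (reuters.categories in the notebook).
    """
    train_ids, train_labels, test_ids, test_labels = [], [], [], []
    for file_id in file_ids:
        # binary label: the target topic vs. everything else
        label = topic if topic in categories_of(file_id) else "not " + topic
        if file_id.startswith("train"):
            train_ids.append(file_id)
            train_labels.append(label)
        else:
            test_ids.append(file_id)
            test_labels.append(label)
    return train_ids, train_labels, test_ids, test_labels


# Toy stand-in for the corpus: hypothetical file IDs and categories.
toy_categories = {
    "training/1": ["acq"],
    "training/2": ["earn"],
    "test/3": ["acq", "earn"],
}
result = split_and_label(toy_categories, toy_categories.get, "acq")
print(result)
```

The helper returns file IDs rather than word lists, so the caller can decide when to load the text of each document.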
Now let’s preprocess the documents in our Reuters Corpus. As discussed in the lecture, good preprocessing is crucial for topic modelling, so we apply the following steps:
Lowercase all words.
Remove stopwords.
Remove words of four characters or fewer.
Remove words with very high or very low document frequency.
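To make these steps concrete, here is a small pure-Python sketch on a toy corpus. It is an illustration under stated assumptions, not the notebook's implementation: the stopword set is a tiny hand-picked stand-in for NLTK's English list, `normalize` and `filter_by_doc_frequency` are hypothetical helper names, `min_len=5` mirrors the `len(word) > 4` test used in the notebook's code, and the frequency filter only approximates what gensim's `filter_extremes` does.

```python
from collections import Counter

# Tiny illustrative stopword list; the notebook uses NLTK's English list.
STOPWORDS = {"the", "would", "their", "about"}


def normalize(doc, min_len=5):
    """Lowercase tokens, then drop stopwords and tokens shorter than min_len."""
    lowered = (w.lower() for w in doc)
    return [w for w in lowered if len(w) >= min_len and w not in STOPWORDS]


def filter_by_doc_frequency(docs, no_below, no_above):
    """Keep tokens that occur in at least no_below documents and in at most
    no_above (a fraction) of all documents."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    max_docs = int(no_above * len(docs))
    keep = {w for w, df in doc_freq.items() if no_below <= df <= max_docs}
    return [[w for w in doc if w in keep] for doc in docs]


toy_docs = [
    ["The", "Company", "would", "acquire", "shares"],
    ["Company", "shares", "rose", "sharply"],
    ["Company", "profit", "about", "shares"],
    ["Company", "merger", "talks"],
]
normalized = [normalize(d) for d in toy_docs]
filtered = filter_by_doc_frequency(normalized, no_below=2, no_above=0.75)
print(filtered)
```

In this toy run, "company" appears in every document and is removed as too frequent, while words that occur in a single document are removed as too rare.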
In [8]:
import logging
import operator
import gensim
from gensim import corpora
import nltk
# for gensim to output some progress information while it's training
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# requires the NLTK stopword list: nltk.download("stopwords") if necessary
en_stop = set(nltk.corpus.stopwords.words('english'))
def preprocessing(dataset):
    # lowercase, drop NLTK stopwords, and keep only words longer than 4 characters
    text_data = [[word.lower() for word in doc if (len(word) > 4 and word.lower() not in en_stop)] for doc in dataset]
    # Dictionary encapsulates the mapping between normalized words and their integer ids.
    dictionary = corpora.Dictionary(text_data)
    # no_below: Keep tokens which are contained in at least no_below documents.
    # no_above: Keep tokens which are contained in no more than no_above documents
    # (fraction of total corpus size, not an absolute number).
    dictionary.filter_extremes(no_below=10, no_above=0.5)
    # Filter out the 20 most frequent tokens that appear in the documents.
    dictionary.filter_n_most_frequent(20)
    # convert documents to BOW representations
    corpus = [dictionary.doc2bow(doc) for doc in text_data]
    return corpus, dictionary
preprocessed_train, train_dictionary = preprocessing(training_set)
preprocessed_test, test_dictionary = preprocessing(test_set)
2020-05-26 20:16:35,618 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-05-26 20:16:36,223 : INFO : built Dictionary(19497 unique tokens: [‘bukhoosh’, ‘picks’, ‘latorre’, ‘stainless’, ‘shing’]…) from 7769 documents (total 412952 corpus positions)
2020-05-26 20:16:36,259 : INFO : discarding 15633 tokens: [(‘alleviating’, 1), (‘arrivals’, 5), (‘arroba’, 1), (‘bahia’, 4), (‘carnival’, 4), (‘comissaria’, 1), (‘consignment’, 3), (‘covertible’, 1), (‘cruzados’, 4), (‘dificulties’, 1)]…
2020-05-26 20:16:36,259 : INFO : keeping 3864 tokens which were in no less than 10 and no more than 3884 (=50.0%) documents
2020-05-26 20:16:36,275 : INFO : resulting dictionary: Dictionary(3864 unique tokens: [‘buoyant’, ‘highly’, ‘shipping’, ‘utility’, ‘fresh’]…)
2020-05-26 20:16:36,284 : INFO : discarding 20 tokens: [(‘company’, 1836), (‘would’, 1553), (‘share’, 1253), (‘billion’, 1185), (‘march’, 1170), (‘april’, 1126), (‘market’, 1055), (‘stock’, 985), (‘three’, 966), (‘record’, 907)]…
2020-05-26 20:16:36,293 : INFO : resulting dictionary: Dictionary(3844 unique tokens: [‘significance’, ‘buoyant’, ‘highly’, ‘shipping’, ‘utility’]…)
2020-05-26 20:16:37,884 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-05-26 20:16:38,102 : INFO : built Dictionary(12398 unique tokens: [‘alabama’, ‘anodes’, ‘picks’, ‘197agreement’, ‘indentifying’]…) from 3019 documents (total 146126 corpus positions)
2020-05-26 20:16:38,121 : INFO : discarding 10448 tokens: [(‘avowed’, 1), (‘awaiting’, 9), (‘aware’, 9), (‘button’, 1), (‘canberra’, 1), (‘capel’, 3), (‘capitals’, 3), (‘centred’, 1), (‘conflict’, 9), (‘correspondents’, 3)]…
2020-05-26 20:16:38,122 : INFO : keeping 1950 tokens which were in no less than 10 and no more than 1509 (=50.0%) documents
2020-05-26 20:16:38,132 : INFO : resulting dictionary: Dictionary(1950 unique tokens: [‘fresh’, ‘referring’, ‘smith’, ‘operator’, ‘hudson’]…)
2020-05-26 20:16:38,136 : INFO : discarding 20 tokens: [(‘company’, 707), (‘would’, 567), (‘billion’, 509), (‘share’, 496), (‘stock’, 403), (‘market’, 388), (‘sales’, 385), (‘group’, 360), (‘april’, 349), (‘shares’, 337)]…
2020-05-26 20:16:38,144 : INFO : resulting dictionary: Dictionary(1930 unique tokens: [‘fresh’, ‘referring’, ‘shipping’, ‘operator’, ‘hudson’]…)
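One thing to watch: `preprocessing` builds a separate dictionary for each split, so the integer ids in `preprocessed_test` come from `test_dictionary` and need not match those in `train_dictionary`. If the test vectors are later fed to a model trained with `train_dictionary`, the same id could refer to different words. The sketch below illustrates the effect with a simplified, hypothetical stand-in for `corpora.Dictionary` (`build_vocab` and `to_bow` are not gensim functions).

```python
def build_vocab(docs):
    """Assign integer ids to tokens in order of first appearance.
    A simplified stand-in for gensim's corpora.Dictionary."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs, ignoring unknown tokens."""
    counts = {}
    for token in doc:
        if token in vocab:
            counts[vocab[token]] = counts.get(vocab[token], 0) + 1
    return sorted(counts.items())


train_docs = [["merger", "offer"], ["dividend", "offer"]]
test_docs = [["dividend", "merger", "tender"]]

train_vocab = build_vocab(train_docs)  # merger=0, offer=1, dividend=2
test_vocab = build_vocab(test_docs)    # dividend=0, merger=1, tender=2

# Same document, two encodings: the ids disagree between vocabularies.
bow_with_test_vocab = to_bow(test_docs[0], test_vocab)
bow_with_train_vocab = to_bow(test_docs[0], train_vocab)
print(bow_with_test_vocab)
print(bow_with_train_vocab)
```

With gensim itself, one way to keep ids consistent would be to encode the tokenized test documents with the training dictionary, for example `train_dictionary.doc2bow(doc)`, which skips tokens the dictionary does not contain.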
Now we train our LDA model with the preprocessed training corpus. LdaModel is the implementation of LDA in gensim. Here we train the model for 10 passes with 50 topics. Then, we print out the words associated with each topic.
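Because logging is enabled, training prints a long progress log. Two of its quantities are easy to check by hand, and the sketch below does so. It relies on two observations read off this notebook's log rather than on gensim's source: with alpha='auto' the starting prior is symmetric at 1 / num_topics, and the reported perplexity estimate equals 2 raised to the negative per-word bound. The helper name `perplexity_from_bound` is hypothetical.

```python
num_topics = 50

# With alpha='auto', the log shows a symmetric starting prior of 1 / num_topics.
initial_alpha = [1.0 / num_topics] * num_topics


def perplexity_from_bound(per_word_bound):
    """Convert a per-word likelihood bound (log base 2) into perplexity."""
    return 2.0 ** (-per_word_bound)


# Per-word bounds reported after passes 0, 1 and 2 in the training log.
bounds = [-8.274, -7.923, -7.810]
perplexities = [perplexity_from_bound(b) for b in bounds]
for b, p in zip(bounds, perplexities):
    print("bound {:.3f} -> perplexity {:.1f}".format(b, p))
```

The logged bounds are rounded to three decimals, so the recomputed values can differ from the logged perplexities in the last digit. Lower perplexity on the held-out chunk indicates a better fit, which is why the value drops from pass to pass.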
In [36]:
num_topics = 50
# alpha = document-topic prior
# eta (beta in lecture) = topic-word prior
model = gensim.models.LdaModel(preprocessed_train, id2word=train_dictionary,
                               num_topics=num_topics, alpha='auto', eta='auto',
                               passes=10)
for topic_id in range(model.num_topics):
    # extract 10 top words for each topic
    topk = model.show_topic(topic_id, 10)
    topk_words = [w for w, _ in topk]
    print('{}: {}'.format(topic_id, ' '.join(topk_words)))
2020-05-26 20:42:49,921 : INFO : using autotuned alpha, starting with [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]
2020-05-26 20:42:49,927 : INFO : using serial LDA version on this node
2020-05-26 20:42:49,962 : INFO : running online (multi-pass) LDA training, 50 topics, 10 passes over the supplied corpus of 7769 documents, updating model once every 2000 documents, evaluating perplexity every 7769 documents, iterating 50x with a convergence threshold of 0.001000
2020-05-26 20:42:49,964 : INFO : PROGRESS: pass 0, at document #2000/7769
2020-05-26 20:42:51,476 : INFO : optimized alpha [0.019572044, 0.019463345, 0.01937523, 0.01955752, 0.019303704, 0.019628156, 0.019287987, 0.019109435, 0.01939813, 0.019261688, 0.019756382, 0.01948754, 0.019298492, 0.019916909, 0.0190866, 0.01952815, 0.019255694, 0.019568553, 0.019908423, 0.01937661, 0.019456923, 0.019310089, 0.019248467, 0.019248428, 0.019357467, 0.019337641, 0.019061085, 0.019990833, 0.019416135, 0.019562993, 0.019478545, 0.019478628, 0.019147275, 0.020068992, 0.020296056, 0.01915369, 0.019316217, 0.019394618, 0.019673835, 0.019590974, 0.019183625, 0.019436333, 0.019346602, 0.01961138, 0.019247333, 0.01913221, 0.019150427, 0.019307362, 0.019806938, 0.019417368]
2020-05-26 20:42:51,488 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:51,501 : INFO : topic #7 (0.019): 0.012*”money” + 0.011*”northern” + 0.009*”prices” + 0.007*”valley” + 0.007*”operations” + 0.007*”supply” + 0.006*”industries” + 0.006*”shareholders” + 0.006*”major” + 0.006*”falls”
2020-05-26 20:42:51,503 : INFO : topic #26 (0.019): 0.015*”banks” + 0.010*”interest” + 0.010*”includes” + 0.008*”canadian” + 0.008*”excludes” + 0.008*”dealers” + 0.007*”controls” + 0.007*”taxes” + 0.007*”argentine” + 0.007*”argentina”
2020-05-26 20:42:51,506 : INFO : topic #13 (0.020): 0.020*”january” + 0.017*”production” + 0.015*”february” + 0.014*”december” + 0.011*”output” + 0.010*”prices” + 0.008*”ounces” + 0.007*”compared” + 0.007*”money” + 0.006*”growth”
2020-05-26 20:42:51,510 : INFO : topic #33 (0.020): 0.024*”february” + 0.016*”january” + 0.014*”prices” + 0.010*”total” + 0.009*”month” + 0.008*”production” + 0.008*”index” + 0.007*”tonnes” + 0.006*”energy” + 0.006*”stake”
2020-05-26 20:42:51,514 : INFO : topic #34 (0.020): 0.011*”japan” + 0.008*”dollar” + 0.008*”prices” + 0.007*”japanese” + 0.007*”coffee” + 0.006*”report” + 0.006*”current” + 0.006*”january” + 0.005*”february” + 0.005*”quarter”
2020-05-26 20:42:51,515 : INFO : topic diff=26.794415, rho=1.000000
2020-05-26 20:42:51,535 : INFO : PROGRESS: pass 0, at document #4000/7769
2020-05-26 20:42:52,905 : INFO : optimized alpha [0.01973904, 0.019573212, 0.019722603, 0.01968059, 0.019198775, 0.01913129, 0.01985764, 0.019365422, 0.019740596, 0.019234028, 0.020038161, 0.019911094, 0.019034568, 0.02039023, 0.019034006, 0.019734593, 0.019437656, 0.020069115, 0.02047127, 0.019410687, 0.019251486, 0.019031467, 0.019530112, 0.019999003, 0.019459171, 0.019471845, 0.01921056, 0.02058163, 0.01920136, 0.020653373, 0.019468281, 0.019371666, 0.019150682, 0.020333257, 0.020783171, 0.019202802, 0.019325808, 0.019397981, 0.02015914, 0.019745838, 0.019290054, 0.01941761, 0.019545661, 0.019843519, 0.019615598, 0.019033924, 0.019371321, 0.019196603, 0.019957576, 0.019416263]
2020-05-26 20:42:52,916 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:52,931 : INFO : topic #14 (0.019): 0.027*”fiscal” + 0.026*”quarter” + 0.016*”includes” + 0.015*”eight” + 0.015*”south” + 0.012*”products” + 0.011*”given” + 0.011*”assistance” + 0.010*”costs” + 0.010*”african”
2020-05-26 20:42:52,932 : INFO : topic #45 (0.019): 0.029*”contract” + 0.018*”harper” + 0.014*”trading” + 0.011*”option” + 0.011*”contracts” + 0.009*”options” + 0.008*”traders” + 0.008*”business” + 0.008*”metal” + 0.008*”merger”
2020-05-26 20:42:52,934 : INFO : topic #27 (0.021): 0.011*”foreign” + 0.011*”government” + 0.007*”guilders” + 0.006*”economic” + 0.006*”interest” + 0.006*”exchange” + 0.006*”japan” + 0.006*”business” + 0.005*”japanese” + 0.005*”dollar”
2020-05-26 20:42:52,935 : INFO : topic #29 (0.021): 0.016*”credit” + 0.016*”offer” + 0.015*”canadian” + 0.014*”rates” + 0.011*”includes” + 0.011*”grain” + 0.009*”stake” + 0.009*”extraordinary” + 0.008*”interest” + 0.008*”owned”
2020-05-26 20:42:52,936 : INFO : topic #34 (0.021): 0.013*”chrysler” + 0.012*”prices” + 0.008*”report” + 0.008*”exports” + 0.008*”surplus” + 0.007*”world” + 0.007*”official” + 0.007*”countries” + 0.007*”taiwan” + 0.007*”japan”
2020-05-26 20:42:52,941 : INFO : topic diff=1.786495, rho=0.707107
2020-05-26 20:42:52,958 : INFO : PROGRESS: pass 0, at document #6000/7769
2020-05-26 20:42:54,022 : INFO : optimized alpha [0.020165538, 0.019678136, 0.020018695, 0.020066703, 0.019200832, 0.01882104, 0.020429116, 0.019591343, 0.01984982, 0.019216204, 0.020341935, 0.020141615, 0.019107211, 0.02118971, 0.019681646, 0.020048, 0.019761913, 0.020896018, 0.020864861, 0.01961101, 0.019346569, 0.018960305, 0.019966658, 0.02133105, 0.019587057, 0.019626427, 0.019699022, 0.021352746, 0.019117078, 0.021581803, 0.019711465, 0.019461727, 0.019247256, 0.02060128, 0.021290636, 0.019500319, 0.019413028, 0.01942517, 0.020962749, 0.019874312, 0.019490728, 0.01960203, 0.019657457, 0.020378035, 0.019785281, 0.018962076, 0.01953031, 0.019127965, 0.020257792, 0.019504951]
2020-05-26 20:42:54,031 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:54,043 : INFO : topic #5 (0.019): 0.020*”cyclops” + 0.019*”offer” + 0.018*”cyacq” + 0.017*”dixons” + 0.012*”tender” + 0.011*”world” + 0.011*”cents” + 0.008*”sugar” + 0.008*”audio” + 0.007*”mills”
2020-05-26 20:42:54,044 : INFO : topic #21 (0.019): 0.019*”rates” + 0.016*”money” + 0.016*”bundesbank” + 0.015*”growth” + 0.014*”policy” + 0.013*”interest” + 0.013*”german” + 0.011*”reserve” + 0.011*”supply” + 0.010*”monetary”
2020-05-26 20:42:54,046 : INFO : topic #27 (0.021): 0.017*”government” + 0.014*”foreign” + 0.007*”strike” + 0.006*”interest” + 0.006*”economic” + 0.006*”business” + 0.006*”added” + 0.005*”payments” + 0.005*”growth” + 0.005*”banks”
2020-05-26 20:42:54,048 : INFO : topic #23 (0.021): 0.090*”dividend” + 0.067*”prior” + 0.035*”payout” + 0.031*”payable” + 0.029*”quarterly” + 0.022*”francs” + 0.020*”holders” + 0.019*”shareholders” + 0.018*”common” + 0.017*”regular”
2020-05-26 20:42:54,052 : INFO : topic #29 (0.022): 0.022*”credit” + 0.016*”offer” + 0.014*”includes” + 0.013*”canadian” + 0.013*”copper” + 0.010*”extraordinary” + 0.010*”pacific” + 0.010*”pretax” + 0.010*”stake” + 0.010*”rates”
2020-05-26 20:42:54,053 : INFO : topic diff=1.731740, rho=0.577350
2020-05-26 20:42:55,364 : INFO : -8.274 per-word bound, 309.5 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:42:55,364 : INFO : PROGRESS: pass 0, at document #7769/7769
2020-05-26 20:42:56,290 : INFO : optimized alpha [0.020586696, 0.019684788, 0.020317422, 0.020885272, 0.019409122, 0.018728845, 0.021106273, 0.019858371, 0.020013927, 0.01955776, 0.020892449, 0.02037052, 0.019162318, 0.021822851, 0.020273779, 0.020548958, 0.020018987, 0.02175123, 0.0211744, 0.019687578, 0.019587826, 0.019155385, 0.020619992, 0.022404952, 0.019703876, 0.019985767, 0.02013933, 0.022437084, 0.019244788, 0.022390774, 0.020012153, 0.019814348, 0.01947357, 0.020818831, 0.022089422, 0.02006217, 0.019877441, 0.01956856, 0.021916525, 0.019943904, 0.01988523, 0.01995826, 0.019717027, 0.02082717, 0.019940017, 0.019028034, 0.019550372, 0.019181795, 0.02044905, 0.019882392]
2020-05-26 20:42:56,299 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:42:56,313 : INFO : topic #5 (0.019): 0.077*”cyclops” + 0.062*”dixons” + 0.044*”offer” + 0.034*”cyacq” + 0.023*”citicorp” + 0.022*”tender” + 0.013*”conflict” + 0.013*”audio” + 0.012*”round” + 0.011*”cents”
2020-05-26 20:42:56,314 : INFO : topic #45 (0.019): 0.036*”contract” + 0.029*”trading” + 0.024*”futures” + 0.020*”harper” + 0.017*”contracts” + 0.017*”exchange” + 0.014*”option” + 0.014*”options” + 0.013*”cross” + 0.011*”harcourt”
2020-05-26 20:42:56,315 : INFO : topic #23 (0.022): 0.103*”dividend” + 0.079*”prior” + 0.039*”payout” + 0.039*”quarterly” + 0.035*”payable” + 0.025*”preferred” + 0.023*”common” + 0.023*”shareholders” + 0.020*”regular” + 0.019*”holders”
2020-05-26 20:42:56,318 : INFO : topic #29 (0.022): 0.027*”credit” + 0.015*”offer” + 0.014*”stake” + 0.014*”includes” + 0.014*”pacific” + 0.012*”extraordinary” + 0.011*”holding” + 0.011*”canadian” + 0.010*”pretax” + 0.010*”resources”
2020-05-26 20:42:56,326 : INFO : topic #27 (0.022): 0.019*”government” + 0.012*”foreign” + 0.007*”economic” + 0.007*”strike” + 0.006*”business” + 0.005*”added” + 0.005*”interest” + 0.005*”chairman” + 0.005*”companies” + 0.005*”payments”
2020-05-26 20:42:56,328 : INFO : topic diff=1.662563, rho=0.500000
2020-05-26 20:42:56,347 : INFO : PROGRESS: pass 1, at document #2000/7769
2020-05-26 20:42:57,351 : INFO : optimized alpha [0.020836003, 0.019783096, 0.020296566, 0.020989604, 0.019513465, 0.018498288, 0.021532979, 0.019768348, 0.019947652, 0.019522868, 0.021167329, 0.020500224, 0.018955518, 0.02209281, 0.020538243, 0.020507265, 0.020135825, 0.022323139, 0.021085808, 0.019640096, 0.019764462, 0.019216148, 0.02062, 0.022782702, 0.019735323, 0.020103555, 0.020365292, 0.022599857, 0.019113176, 0.022704799, 0.01991202, 0.01977459, 0.019472957, 0.02085296, 0.022159688, 0.020260058, 0.019930324, 0.019411867, 0.022124168, 0.01989837, 0.020071924, 0.019879548, 0.019633684, 0.020860827, 0.019972615, 0.01891552, 0.019559942, 0.019246591, 0.020363597, 0.019851647]
2020-05-26 20:42:57,362 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:57,377 : INFO : topic #5 (0.018): 0.066*”cyclops” + 0.056*”offer” + 0.055*”dixons” + 0.029*”tender” + 0.028*”cyacq” + 0.019*”pioneer” + 0.018*”citicorp” + 0.015*”video” + 0.012*”conflict” + 0.012*”audio”
2020-05-26 20:42:57,379 : INFO : topic #12 (0.019): 0.022*”minister” + 0.016*”sector” + 0.015*”employers” + 0.015*”great” + 0.014*”official” + 0.013*”ceiling” + 0.010*”venezuela” + 0.010*”chirac” + 0.010*”strikes” + 0.009*”rotterdam”
2020-05-26 20:42:57,380 : INFO : topic #27 (0.023): 0.017*”government” + 0.011*”foreign” + 0.007*”economic” + 0.007*”business” + 0.006*”added” + 0.005*”interest” + 0.005*”chairman” + 0.005*”loans” + 0.005*”guilders” + 0.005*”could”
2020-05-26 20:42:57,383 : INFO : topic #29 (0.023): 0.024*”credit” + 0.015*”offer” + 0.015*”stake” + 0.014*”pacific” + 0.013*”includes” + 0.012*”canadian” + 0.012*”holding” + 0.011*”owned” + 0.011*”resources” + 0.010*”holdings”
2020-05-26 20:42:57,385 : INFO : topic #23 (0.023): 0.116*”dividend” + 0.076*”prior” + 0.038*”payout” + 0.036*”quarterly” + 0.035*”payable” + 0.026*”common” + 0.026*”preferred” + 0.026*”shareholders” + 0.021*”regular” + 0.019*”holders”
2020-05-26 20:42:57,395 : INFO : topic diff=1.363421, rho=0.412235
2020-05-26 20:42:57,409 : INFO : PROGRESS: pass 1, at document #4000/7769
2020-05-26 20:42:58,422 : INFO : optimized alpha [0.021118553, 0.019987715, 0.020553831, 0.021348057, 0.019756032, 0.018308712, 0.02211929, 0.02003734, 0.020160723, 0.019793797, 0.021560937, 0.020854315, 0.018964814, 0.022586579, 0.020882761, 0.020756906, 0.020425903, 0.022862582, 0.021422645, 0.01978924, 0.01992046, 0.019339358, 0.020998849, 0.023534156, 0.019865932, 0.020309737, 0.02066338, 0.023085615, 0.019129628, 0.023270532, 0.020093733, 0.020029603, 0.019655418, 0.020938016, 0.02274478, 0.020513756, 0.020083273, 0.019546807, 0.022600917, 0.020128058, 0.020284569, 0.020008607, 0.019789668, 0.02106051, 0.020311773, 0.019099925, 0.019870846, 0.01933791, 0.02051634, 0.01991653]
2020-05-26 20:42:58,434 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:58,446 : INFO : topic #5 (0.018): 0.075*”offer” + 0.052*”cyclops” + 0.039*”dixons” + 0.036*”tender” + 0.026*”video” + 0.022*”cyacq” + 0.017*”pioneer” + 0.015*”citicorp” + 0.013*”offers” + 0.012*”cents”
2020-05-26 20:42:58,447 : INFO : topic #12 (0.019): 0.027*”minister” + 0.019*”ceiling” + 0.018*”venezuela” + 0.018*”official” + 0.016*”sector” + 0.015*”ecuador” + 0.013*”employers” + 0.013*”great” + 0.011*”union” + 0.010*”officials”
2020-05-26 20:42:58,450 : INFO : topic #27 (0.023): 0.020*”government” + 0.012*”foreign” + 0.009*”economic” + 0.006*”added” + 0.006*”growth” + 0.006*”guilders” + 0.006*”interest” + 0.006*”business” + 0.006*”chairman” + 0.006*”dutch”
2020-05-26 20:42:58,452 : INFO : topic #29 (0.023): 0.030*”credit” + 0.015*”stake” + 0.015*”copper” + 0.014*”pacific” + 0.013*”offer” + 0.013*”includes” + 0.012*”holdings” + 0.012*”holding” + 0.012*”canadian” + 0.011*”owned”
2020-05-26 20:42:58,455 : INFO : topic #23 (0.024): 0.120*”dividend” + 0.075*”prior” + 0.044*”quarterly” + 0.038*”payable” + 0.034*”payout” + 0.027*”common” + 0.026*”preferred” + 0.025*”holders” + 0.023*”shareholders” + 0.021*”regular”
2020-05-26 20:42:58,456 : INFO : topic diff=1.392128, rho=0.412235
2020-05-26 20:42:58,472 : INFO : PROGRESS: pass 1, at document #6000/7769
2020-05-26 20:42:59,503 : INFO : optimized alpha [0.021557983, 0.020089976, 0.020858383, 0.021856656, 0.020017654, 0.018343838, 0.022844829, 0.020159038, 0.020211177, 0.019974522, 0.021901142, 0.021058595, 0.019149793, 0.02342068, 0.02151089, 0.02108808, 0.020910645, 0.023575572, 0.021706682, 0.01998978, 0.020134129, 0.019421471, 0.02136133, 0.024490807, 0.020054806, 0.020589659, 0.021175643, 0.023726132, 0.019278381, 0.023742646, 0.020405492, 0.02015429, 0.019775776, 0.021205919, 0.02322315, 0.020850234, 0.020104203, 0.019639457, 0.023144394, 0.020309996, 0.020607576, 0.020166297, 0.019883206, 0.02142349, 0.020496653, 0.019226627, 0.020034594, 0.019391265, 0.020762946, 0.020090595]
2020-05-26 20:42:59,513 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:42:59,524 : INFO : topic #5 (0.018): 0.094*”offer” + 0.060*”cyclops” + 0.048*”tender” + 0.044*”dixons” + 0.027*”cyacq” + 0.023*”video” + 0.018*”offers” + 0.015*”tendered” + 0.015*”citicorp” + 0.012*”mills”
2020-05-26 20:42:59,526 : INFO : topic #12 (0.019): 0.031*”minister” + 0.022*”sector” + 0.022*”venezuela” + 0.020*”official” + 0.020*”employers” + 0.019*”ceiling” + 0.016*”ecuador” + 0.014*”union” + 0.013*”great” + 0.012*”strikes”
2020-05-26 20:42:59,527 : INFO : topic #27 (0.024): 0.025*”government” + 0.012*”foreign” + 0.008*”economic” + 0.007*”added” + 0.007*”growth” + 0.006*”payments” + 0.005*”chairman” + 0.005*”interest” + 0.005*”companies” + 0.005*”spending”
2020-05-26 20:42:59,528 : INFO : topic #29 (0.024): 0.034*”credit” + 0.023*”copper” + 0.018*”pacific” + 0.014*”stake” + 0.014*”holding” + 0.013*”includes” + 0.013*”holdings” + 0.011*”owned” + 0.011*”offer” + 0.010*”pretax”
2020-05-26 20:42:59,530 : INFO : topic #23 (0.024): 0.131*”dividend” + 0.092*”prior” + 0.045*”quarterly” + 0.044*”payout” + 0.040*”payable” + 0.028*”preferred” + 0.027*”common” + 0.026*”holders” + 0.023*”shareholders” + 0.022*”regular”
2020-05-26 20:42:59,531 : INFO : topic diff=1.375687, rho=0.412235
2020-05-26 20:43:00,852 : INFO : -7.923 per-word bound, 242.7 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:00,853 : INFO : PROGRESS: pass 1, at document #7769/7769
2020-05-26 20:43:01,708 : INFO : optimized alpha [0.021974735, 0.020077819, 0.021140164, 0.022646153, 0.020311939, 0.018423146, 0.023498008, 0.02035662, 0.020341868, 0.020372499, 0.022492656, 0.02125195, 0.019282293, 0.023966871, 0.022087803, 0.021549916, 0.021257568, 0.0244132, 0.021929622, 0.02007311, 0.02039009, 0.019675542, 0.02196565, 0.025470423, 0.020214586, 0.021112697, 0.021779792, 0.024712034, 0.0194998, 0.0243442, 0.020778676, 0.020462973, 0.019996088, 0.021471133, 0.024053784, 0.021317318, 0.020483412, 0.01983469, 0.023904333, 0.020397482, 0.021005586, 0.020496579, 0.019956041, 0.0217294, 0.020641267, 0.019440837, 0.02009362, 0.019551722, 0.020937266, 0.020507636]
2020-05-26 20:43:01,718 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:01,730 : INFO : topic #5 (0.018): 0.106*”offer” + 0.088*”cyclops” + 0.066*”dixons” + 0.054*”tender” + 0.034*”cyacq” + 0.025*”citicorp” + 0.021*”video” + 0.016*”tendered” + 0.013*”conflict” + 0.013*”affiliates”
2020-05-26 20:43:01,731 : INFO : topic #12 (0.019): 0.039*”minister” + 0.023*”sector” + 0.022*”ceiling” + 0.021*”employers” + 0.019*”great” + 0.019*”venezuela” + 0.017*”official” + 0.013*”uruguay” + 0.012*”informal” + 0.012*”union”
2020-05-26 20:43:01,732 : INFO : topic #17 (0.024): 0.057*”earnings” + 0.031*”expects” + 0.019*”revenues” + 0.018*”financial” + 0.017*”quarter” + 0.014*”agreement” + 0.013*”reported” + 0.012*”common” + 0.011*”second” + 0.011*”transaction”
2020-05-26 20:43:01,733 : INFO : topic #27 (0.025): 0.026*”government” + 0.010*”foreign” + 0.008*”economic” + 0.007*”added” + 0.006*”growth” + 0.006*”chairman” + 0.006*”years” + 0.006*”payments” + 0.005*”companies” + 0.005*”financial”
2020-05-26 20:43:01,735 : INFO : topic #23 (0.025): 0.134*”dividend” + 0.097*”prior” + 0.050*”quarterly” + 0.046*”payout” + 0.041*”payable” + 0.035*”preferred” + 0.028*”common” + 0.024*”holders” + 0.024*”regular” + 0.022*”shareholders”
2020-05-26 20:43:01,741 : INFO : topic diff=1.345919, rho=0.412235
2020-05-26 20:43:01,760 : INFO : PROGRESS: pass 2, at document #2000/7769
2020-05-26 20:43:02,782 : INFO : optimized alpha [0.02232722, 0.020237189, 0.021165106, 0.022873197, 0.020599734, 0.018354136, 0.024071615, 0.020318395, 0.020400623, 0.020481562, 0.022859227, 0.021425415, 0.01916708, 0.02432541, 0.022480046, 0.021584185, 0.02152398, 0.025065502, 0.02189522, 0.02006201, 0.02067021, 0.019841554, 0.022005394, 0.025876738, 0.02032598, 0.021401485, 0.022227837, 0.024965866, 0.019457314, 0.024599206, 0.0208183, 0.020532135, 0.020051856, 0.021650191, 0.024285426, 0.021665236, 0.020551546, 0.019755812, 0.024182413, 0.020408243, 0.021291247, 0.020504046, 0.019938644, 0.0218102, 0.020768942, 0.019491166, 0.020197032, 0.01966955, 0.020929806, 0.02057427]
2020-05-26 20:43:02,790 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:02,805 : INFO : topic #5 (0.018): 0.121*”offer” + 0.071*”cyclops” + 0.057*”tender” + 0.055*”dixons” + 0.027*”cyacq” + 0.023*”video” + 0.020*”pioneer” + 0.019*”citicorp” + 0.019*”offers” + 0.017*”tendered”
2020-05-26 20:43:02,806 : INFO : topic #12 (0.019): 0.042*”minister” + 0.021*”sector” + 0.021*”official” + 0.019*”great” + 0.018*”venezuela” + 0.017*”ceiling” + 0.017*”employers” + 0.013*”union” + 0.013*”chirac” + 0.011*”labour”
2020-05-26 20:43:02,808 : INFO : topic #27 (0.025): 0.023*”government” + 0.009*”foreign” + 0.009*”economic” + 0.007*”added” + 0.006*”growth” + 0.006*”chairman” + 0.006*”business” + 0.006*”years” + 0.005*”financial” + 0.005*”companies”
2020-05-26 20:43:02,811 : INFO : topic #17 (0.025): 0.060*”earnings” + 0.033*”expects” + 0.020*”revenues” + 0.018*”financial” + 0.018*”quarter” + 0.013*”agreement” + 0.012*”reported” + 0.012*”operating” + 0.012*”transaction” + 0.012*”acquisition”
2020-05-26 20:43:02,813 : INFO : topic #23 (0.026): 0.144*”dividend” + 0.091*”prior” + 0.047*”quarterly” + 0.044*”payout” + 0.040*”payable” + 0.035*”preferred” + 0.029*”common” + 0.024*”regular” + 0.024*”shareholders” + 0.024*”holders”
2020-05-26 20:43:02,814 : INFO : topic diff=1.137404, rho=0.381122
2020-05-26 20:43:02,830 : INFO : PROGRESS: pass 2, at document #4000/7769
2020-05-26 20:43:03,861 : INFO : optimized alpha [0.022744829, 0.020472232, 0.021498824, 0.023329144, 0.020970812, 0.018327964, 0.024674352, 0.02057199, 0.020685004, 0.020806707, 0.023296125, 0.02174916, 0.0193012, 0.024918603, 0.02293601, 0.02193306, 0.021932224, 0.025557881, 0.022227895, 0.020284869, 0.020930186, 0.020061793, 0.022410257, 0.026582176, 0.020513976, 0.021781553, 0.022648476, 0.025549103, 0.01956267, 0.02514364, 0.02109776, 0.020837089, 0.02025341, 0.021812018, 0.02501084, 0.02201637, 0.020698586, 0.019991228, 0.024767142, 0.020626195, 0.021645311, 0.020654587, 0.020085433, 0.021978185, 0.021128718, 0.019721435, 0.020541068, 0.019833032, 0.021163715, 0.020695362]
2020-05-26 20:43:03,870 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:03,885 : INFO : topic #5 (0.018): 0.141*”offer” + 0.062*”tender” + 0.053*”cyclops” + 0.038*”dixons” + 0.030*”video” + 0.026*”offers” + 0.020*”cyacq” + 0.017*”tendered” + 0.016*”pioneer” + 0.015*”citicorp”
2020-05-26 20:43:03,887 : INFO : topic #12 (0.019): 0.046*”minister” + 0.026*”official” + 0.025*”venezuela” + 0.021*”ceiling” + 0.019*”sector” + 0.017*”great” + 0.015*”ecuador” + 0.015*”employers” + 0.014*”union” + 0.012*”labour”
2020-05-26 20:43:03,889 : INFO : topic #27 (0.026): 0.025*”government” + 0.010*”foreign” + 0.010*”economic” + 0.008*”growth” + 0.007*”added” + 0.007*”chairman” + 0.006*”years” + 0.006*”guilders” + 0.006*”reduce” + 0.006*”dutch”
2020-05-26 20:43:03,892 : INFO : topic #17 (0.026): 0.058*”earnings” + 0.032*”expects” + 0.020*”revenues” + 0.016*”quarter” + 0.016*”financial” + 0.013*”agreement” + 0.013*”reported” + 0.012*”operating” + 0.012*”acquisition” + 0.012*”common”
2020-05-26 20:43:03,897 : INFO : topic #23 (0.027): 0.145*”dividend” + 0.086*”prior” + 0.054*”quarterly” + 0.043*”payable” + 0.039*”payout” + 0.033*”preferred” + 0.029*”common” + 0.029*”holders” + 0.024*”regular” + 0.023*”distribution”
2020-05-26 20:43:03,900 : INFO : topic diff=1.058529, rho=0.381122
2020-05-26 20:43:03,919 : INFO : PROGRESS: pass 2, at document #6000/7769
2020-05-26 20:43:04,900 : INFO : optimized alpha [0.023188017, 0.020569315, 0.021872016, 0.023890095, 0.021295931, 0.018517684, 0.025427144, 0.020718504, 0.020761084, 0.020991199, 0.023691023, 0.021936746, 0.01955401, 0.025752641, 0.023658495, 0.022286199, 0.022459611, 0.026207766, 0.02247557, 0.020489216, 0.021189185, 0.020244699, 0.022751428, 0.027517095, 0.02071881, 0.022242105, 0.023295032, 0.0263259, 0.019802473, 0.025538418, 0.021433767, 0.02100866, 0.020415649, 0.022167053, 0.025601843, 0.02234347, 0.020723656, 0.020141633, 0.025292754, 0.020799613, 0.02203862, 0.020877825, 0.020201473, 0.022370916, 0.021316284, 0.019893508, 0.020738367, 0.019872354, 0.02142533, 0.020923568]
2020-05-26 20:43:04,910 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:04,923 : INFO : topic #5 (0.019): 0.156*"offer" + 0.070*"tender" + 0.054*"cyclops" + 0.039*"dixons" + 0.027*"offers" + 0.024*"video" + 0.023*"cyacq" + 0.019*"tendered" + 0.016*"outstanding" + 0.015*"offered"
2020-05-26 20:43:04,924 : INFO : topic #12 (0.020): 0.047*"minister" + 0.028*"official" + 0.026*"venezuela" + 0.024*"sector" + 0.021*"ceiling" + 0.020*"employers" + 0.016*"ecuador" + 0.016*"labour" + 0.016*"great" + 0.015*"union"
2020-05-26 20:43:04,925 : INFO : topic #17 (0.026): 0.061*"earnings" + 0.033*"expects" + 0.021*"revenues" + 0.018*"financial" + 0.015*"quarter" + 0.013*"agreement" + 0.012*"acquisition" + 0.012*"reported" + 0.012*"transaction" + 0.012*"common"
2020-05-26 20:43:04,927 : INFO : topic #27 (0.026): 0.029*"government" + 0.010*"foreign" + 0.008*"economic" + 0.008*"growth" + 0.008*"added" + 0.007*"years" + 0.006*"chairman" + 0.006*"spending" + 0.006*"financial" + 0.006*"payments"
2020-05-26 20:43:04,930 : INFO : topic #23 (0.028): 0.152*"dividend" + 0.101*"prior" + 0.053*"quarterly" + 0.048*"payout" + 0.044*"payable" + 0.034*"preferred" + 0.029*"holders" + 0.028*"common" + 0.024*"regular" + 0.022*"declared"
2020-05-26 20:43:04,933 : INFO : topic diff=0.962829, rho=0.381122
2020-05-26 20:43:06,156 : INFO : -7.810 per-word bound, 224.4 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:06,157 : INFO : PROGRESS: pass 2, at document #7769/7769
2020-05-26 20:43:07,066 : INFO : optimized alpha [0.023554355, 0.020576214, 0.022180047, 0.024690537, 0.021654444, 0.01868897, 0.026080897, 0.020931466, 0.020954924, 0.021471275, 0.024305958, 0.022127923, 0.019762417, 0.026265705, 0.024308993, 0.022723382, 0.02287808, 0.026900576, 0.022641493, 0.020617198, 0.021509852, 0.020548558, 0.023341497, 0.028489161, 0.020935517, 0.02286884, 0.023977038, 0.027341492, 0.020058654, 0.02608483, 0.021873195, 0.021327369, 0.020653123, 0.022484466, 0.026471797, 0.022835013, 0.021064224, 0.020348974, 0.026029438, 0.020878244, 0.022557188, 0.021238312, 0.0202891, 0.02260563, 0.021432057, 0.020154586, 0.02079408, 0.020038158, 0.021635287, 0.021344498]
2020-05-26 20:43:07,075 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:07,089 : INFO : topic #5 (0.019): 0.152*"offer" + 0.076*"cyclops" + 0.074*"tender" + 0.057*"dixons" + 0.029*"cyacq" + 0.022*"citicorp" + 0.021*"video" + 0.018*"offers" + 0.018*"tendered" + 0.015*"outstanding"
2020-05-26 20:43:07,090 : INFO : topic #12 (0.020): 0.052*"minister" + 0.023*"official" + 0.022*"ceiling" + 0.022*"sector" + 0.022*"venezuela" + 0.021*"great" + 0.020*"employers" + 0.014*"labour" + 0.013*"uruguay" + 0.013*"union"
2020-05-26 20:43:07,091 : INFO : topic #17 (0.027): 0.061*"earnings" + 0.034*"expects" + 0.021*"revenues" + 0.020*"financial" + 0.014*"agreement" + 0.013*"reported" + 0.012*"quarter" + 0.012*"transaction" + 0.012*"business" + 0.011*"earned"
2020-05-26 20:43:07,093 : INFO : topic #27 (0.027): 0.029*"government" + 0.009*"economic" + 0.009*"foreign" + 0.008*"growth" + 0.008*"years" + 0.008*"added" + 0.007*"chairman" + 0.006*"financial" + 0.006*"payments" + 0.006*"reduce"
2020-05-26 20:43:07,096 : INFO : topic #23 (0.028): 0.153*"dividend" + 0.104*"prior" + 0.057*"quarterly" + 0.049*"payout" + 0.045*"payable" + 0.040*"preferred" + 0.028*"common" + 0.027*"holders" + 0.026*"regular" + 0.021*"declared"
2020-05-26 20:43:07,100 : INFO : topic diff=0.886458, rho=0.381122
2020-05-26 20:43:07,117 : INFO : PROGRESS: pass 3, at document #2000/7769
2020-05-26 20:43:08,135 : INFO : optimized alpha [0.02400031, 0.020745924, 0.02222391, 0.024955867, 0.022019913, 0.018707857, 0.026688097, 0.020949997, 0.02109447, 0.021585194, 0.024696302, 0.02233341, 0.019731276, 0.026634129, 0.024732895, 0.022777181, 0.023208696, 0.027549624, 0.022629118, 0.02064241, 0.021890024, 0.020777868, 0.023380717, 0.028875187, 0.021080483, 0.02330315, 0.024483852, 0.027616695, 0.020052731, 0.026361562, 0.021990722, 0.021412961, 0.020770375, 0.022745557, 0.026769327, 0.02322815, 0.021147447, 0.020361627, 0.026306948, 0.020920303, 0.022960966, 0.021312518, 0.020313788, 0.022689147, 0.02158208, 0.0202788, 0.020912418, 0.020176744, 0.021691257, 0.021456795]
2020-05-26 20:43:08,144 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:08,157 : INFO : topic #5 (0.019): 0.164*"offer" + 0.073*"tender" + 0.062*"cyclops" + 0.048*"dixons" + 0.023*"cyacq" + 0.022*"offers" + 0.021*"video" + 0.017*"pioneer" + 0.017*"tendered" + 0.017*"citicorp"
2020-05-26 20:43:08,158 : INFO : topic #12 (0.020): 0.054*"minister" + 0.025*"official" + 0.021*"great" + 0.021*"venezuela" + 0.020*"sector" + 0.017*"ceiling" + 0.016*"employers" + 0.014*"union" + 0.013*"labour" + 0.013*"singapore"
2020-05-26 20:43:08,159 : INFO : topic #17 (0.028): 0.064*"earnings" + 0.037*"expects" + 0.022*"revenues" + 0.020*"financial" + 0.013*"transaction" + 0.012*"business" + 0.012*"operating" + 0.012*"agreement" + 0.012*"quarter" + 0.012*"acquisition"
2020-05-26 20:43:08,161 : INFO : topic #27 (0.028): 0.026*"government" + 0.009*"economic" + 0.008*"foreign" + 0.008*"added" + 0.007*"growth" + 0.007*"years" + 0.007*"chairman" + 0.006*"financial" + 0.006*"reduce" + 0.005*"companies"
2020-05-26 20:43:08,162 : INFO : topic #23 (0.029): 0.162*"dividend" + 0.097*"prior" + 0.054*"quarterly" + 0.047*"payout" + 0.043*"payable" + 0.040*"preferred" + 0.029*"common" + 0.026*"regular" + 0.026*"holders" + 0.023*"declared"
2020-05-26 20:43:08,165 : INFO : topic diff=0.728371, rho=0.356134
2020-05-26 20:43:08,182 : INFO : PROGRESS: pass 3, at document #4000/7769
2020-05-26 20:43:09,316 : INFO : optimized alpha [0.024459107, 0.020983534, 0.022543393, 0.02550122, 0.022487232, 0.018781416, 0.027304053, 0.021179494, 0.021421446, 0.021938864, 0.025148764, 0.022680394, 0.0199201, 0.027234133, 0.025218377, 0.023129964, 0.02366407, 0.028001843, 0.022908147, 0.020854244, 0.022151515, 0.021021323, 0.02374337, 0.029611167, 0.021271965, 0.023813166, 0.024913015, 0.02835601, 0.020242274, 0.026914358, 0.02231135, 0.021746831, 0.021001406, 0.022952218, 0.027601609, 0.023568023, 0.021302775, 0.020681001, 0.026868716, 0.021163557, 0.02337184, 0.0214974, 0.020483391, 0.022860557, 0.021963798, 0.020532176, 0.02127833, 0.020366043, 0.02199538, 0.02160109]
2020-05-26 20:43:09,325 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:09,339 : INFO : topic #5 (0.019): 0.179*"offer" + 0.077*"tender" + 0.047*"cyclops" + 0.033*"dixons" + 0.028*"offers" + 0.027*"video" + 0.019*"offered" + 0.017*"cyacq" + 0.017*"outstanding" + 0.017*"tendered"
2020-05-26 20:43:09,340 : INFO : topic #12 (0.020): 0.057*"minister" + 0.030*"official" + 0.027*"venezuela" + 0.020*"ceiling" + 0.019*"great" + 0.017*"sector" + 0.016*"ecuador" + 0.014*"employers" + 0.014*"labour" + 0.014*"union"
2020-05-26 20:43:09,341 : INFO : topic #17 (0.028): 0.062*"earnings" + 0.035*"expects" + 0.022*"revenues" + 0.018*"financial" + 0.013*"business" + 0.012*"operating" + 0.012*"earned" + 0.012*"transaction" + 0.012*"reported" + 0.012*"agreement"
2020-05-26 20:43:09,342 : INFO : topic #27 (0.028): 0.028*"government" + 0.010*"economic" + 0.009*"growth" + 0.009*"foreign" + 0.008*"added" + 0.007*"years" + 0.007*"chairman" + 0.006*"reduce" + 0.006*"financial" + 0.005*"could"
2020-05-26 20:43:09,344 : INFO : topic #23 (0.030): 0.162*"dividend" + 0.093*"prior" + 0.059*"quarterly" + 0.045*"payable" + 0.041*"payout" + 0.037*"preferred" + 0.031*"holders" + 0.028*"common" + 0.026*"distribution" + 0.025*"regular"
2020-05-26 20:43:09,345 : INFO : topic diff=0.658831, rho=0.356134
2020-05-26 20:43:09,363 : INFO : PROGRESS: pass 3, at document #6000/7769
2020-05-26 20:43:10,356 : INFO : optimized alpha [0.024941342, 0.021094536, 0.022898313, 0.02611038, 0.022830004, 0.019009197, 0.02810751, 0.02130682, 0.021573234, 0.022162545, 0.025536895, 0.022858879, 0.020194182, 0.028049277, 0.025880996, 0.023512417, 0.024293672, 0.028608307, 0.0231394, 0.021078777, 0.022413014, 0.021222008, 0.024059216, 0.030547071, 0.02148491, 0.024437686, 0.02560379, 0.029172651, 0.020503316, 0.027249495, 0.022672715, 0.021928418, 0.021180093, 0.023366986, 0.028229933, 0.02391367, 0.021321185, 0.020894278, 0.027385361, 0.021344874, 0.023814764, 0.021704944, 0.02058959, 0.023268173, 0.022155523, 0.020722996, 0.021496173, 0.020439887, 0.022291351, 0.021885138]
2020-05-26 20:43:10,366 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:10,378 : INFO : topic #5 (0.019): 0.184*"offer" + 0.080*"tender" + 0.047*"cyclops" + 0.034*"dixons" + 0.027*"offers" + 0.022*"video" + 0.020*"cyacq" + 0.019*"offered" + 0.018*"outstanding" + 0.018*"tendered"
2020-05-26 20:43:10,379 : INFO : topic #12 (0.020): 0.056*"minister" + 0.033*"official" + 0.027*"venezuela" + 0.022*"sector" + 0.020*"ceiling" + 0.019*"employers" + 0.018*"great" + 0.017*"labour" + 0.016*"ecuador" + 0.014*"union"
2020-05-26 20:43:10,383 : INFO : topic #17 (0.029): 0.064*"earnings" + 0.036*"expects" + 0.023*"revenues" + 0.020*"financial" + 0.013*"business" + 0.012*"transaction" + 0.011*"operating" + 0.011*"earned" + 0.011*"agreement" + 0.011*"acquisition"
2020-05-26 20:43:10,389 : INFO : topic #27 (0.029): 0.031*"government" + 0.009*"added" + 0.009*"growth" + 0.009*"foreign" + 0.009*"economic" + 0.008*"years" + 0.007*"spending" + 0.007*"financial" + 0.007*"chairman" + 0.006*"reduce"
2020-05-26 20:43:10,394 : INFO : topic #23 (0.031): 0.166*"dividend" + 0.105*"prior" + 0.058*"quarterly" + 0.049*"payout" + 0.046*"payable" + 0.037*"preferred" + 0.031*"holders" + 0.026*"common" + 0.025*"regular" + 0.024*"declared"
2020-05-26 20:43:10,396 : INFO : topic diff=0.583393, rho=0.356134
2020-05-26 20:43:11,637 : INFO : -7.750 per-word bound, 215.3 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:11,638 : INFO : PROGRESS: pass 3, at document #7769/7769
2020-05-26 20:43:12,608 : INFO : optimized alpha [0.02533323, 0.021122456, 0.023182588, 0.026904428, 0.02322979, 0.019224787, 0.028770663, 0.021535441, 0.021838455, 0.022599038, 0.026148938, 0.023050591, 0.020396112, 0.028549401, 0.026546802, 0.023930615, 0.024775684, 0.02927307, 0.023286302, 0.021225186, 0.022746084, 0.021521129, 0.024633285, 0.03150169, 0.02175445, 0.025091248, 0.026302483, 0.030290743, 0.02075895, 0.027766539, 0.023133012, 0.022206392, 0.021455202, 0.023706423, 0.029135408, 0.02441375, 0.021642657, 0.021149542, 0.028126864, 0.021437192, 0.02443344, 0.022051638, 0.020670747, 0.023497371, 0.022287698, 0.021007527, 0.021540027, 0.020618677, 0.022538623, 0.022317057]
2020-05-26 20:43:12,617 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:12,630 : INFO : topic #5 (0.019): 0.174*"offer" + 0.083*"tender" + 0.068*"cyclops" + 0.050*"dixons" + 0.026*"cyacq" + 0.019*"citicorp" + 0.019*"video" + 0.019*"offers" + 0.017*"outstanding" + 0.017*"tendered"
2020-05-26 20:43:12,632 : INFO : topic #12 (0.020): 0.060*"minister" + 0.027*"official" + 0.023*"venezuela" + 0.022*"great" + 0.022*"ceiling" + 0.020*"sector" + 0.020*"employers" + 0.015*"labour" + 0.012*"uruguay" + 0.012*"ecuador"
2020-05-26 20:43:12,633 : INFO : topic #17 (0.029): 0.064*"earnings" + 0.037*"expects" + 0.023*"revenues" + 0.021*"financial" + 0.013*"business" + 0.012*"transaction" + 0.012*"earned" + 0.012*"reported" + 0.011*"operating" + 0.011*"agreement"
2020-05-26 20:43:12,634 : INFO : topic #27 (0.030): 0.031*"government" + 0.009*"economic" + 0.009*"growth" + 0.009*"years" + 0.009*"added" + 0.008*"foreign" + 0.007*"financial" + 0.007*"chairman" + 0.006*"development" + 0.006*"spending"
2020-05-26 20:43:12,638 : INFO : topic #23 (0.032): 0.165*"dividend" + 0.107*"prior" + 0.061*"quarterly" + 0.050*"payout" + 0.046*"payable" + 0.042*"preferred" + 0.028*"holders" + 0.027*"regular" + 0.027*"common" + 0.024*"distribution"
2020-05-26 20:43:12,640 : INFO : topic diff=0.531672, rho=0.356134
2020-05-26 20:43:12,659 : INFO : PROGRESS: pass 4, at document #2000/7769
2020-05-26 20:43:13,739 : INFO : optimized alpha [0.025805598, 0.021312034, 0.023210708, 0.02714799, 0.023641303, 0.019294905, 0.029359307, 0.021577595, 0.022046601, 0.022732563, 0.026556848, 0.023271808, 0.020394504, 0.028907191, 0.026964132, 0.023988955, 0.025169889, 0.029896565, 0.023288561, 0.021261502, 0.023129392, 0.02175659, 0.024698531, 0.03183745, 0.021923596, 0.025693703, 0.026826072, 0.030515607, 0.020776337, 0.02804078, 0.023260413, 0.022319168, 0.021613728, 0.0239857, 0.029500354, 0.02482348, 0.021707483, 0.021225054, 0.028397854, 0.02150303, 0.024869911, 0.022158142, 0.02071171, 0.023576586, 0.022448746, 0.021164615, 0.021657286, 0.02077183, 0.022624413, 0.022451352]
2020-05-26 20:43:13,750 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:13,764 : INFO : topic #5 (0.019): 0.185*"offer" + 0.081*"tender" + 0.056*"cyclops" + 0.043*"dixons" + 0.022*"offers" + 0.021*"cyacq" + 0.020*"video" + 0.016*"offered" + 0.016*"tendered" + 0.016*"outstanding"
2020-05-26 20:43:13,765 : INFO : topic #12 (0.020): 0.061*"minister" + 0.030*"official" + 0.023*"venezuela" + 0.022*"great" + 0.018*"sector" + 0.017*"ceiling" + 0.016*"employers" + 0.014*"labour" + 0.013*"singapore" + 0.013*"union"
2020-05-26 20:43:13,766 : INFO : topic #17 (0.030): 0.067*"earnings" + 0.040*"expects" + 0.023*"revenues" + 0.021*"financial" + 0.014*"business" + 0.013*"earned" + 0.013*"transaction" + 0.012*"operating" + 0.011*"reported" + 0.011*"higher"
2020-05-26 20:43:13,768 : INFO : topic #27 (0.031): 0.028*"government" + 0.010*"economic" + 0.009*"added" + 0.008*"growth" + 0.008*"years" + 0.007*"foreign" + 0.007*"financial" + 0.007*"chairman" + 0.006*"reduce" + 0.006*"companies"
2020-05-26 20:43:13,769 : INFO : topic #23 (0.032): 0.173*"dividend" + 0.100*"prior" + 0.058*"quarterly" + 0.048*"payout" + 0.044*"payable" + 0.041*"preferred" + 0.028*"common" + 0.027*"regular" + 0.026*"holders" + 0.025*"declared"
2020-05-26 20:43:13,772 : INFO : topic diff=0.438397, rho=0.335493
2020-05-26 20:43:13,791 : INFO : PROGRESS: pass 4, at document #4000/7769
2020-05-26 20:43:14,923 : INFO : optimized alpha [0.026295414, 0.02153414, 0.023526248, 0.02773205, 0.024101833, 0.019437252, 0.029950634, 0.021811953, 0.022408372, 0.02311993, 0.027054679, 0.023631256, 0.020595899, 0.029502934, 0.02746314, 0.024339056, 0.025666801, 0.030345513, 0.023560578, 0.021467581, 0.023418278, 0.021988153, 0.025044275, 0.03252173, 0.022114579, 0.026289519, 0.027236922, 0.031286627, 0.021026544, 0.028596988, 0.023606969, 0.022671854, 0.021850562, 0.024214964, 0.030367753, 0.025144095, 0.02181271, 0.021578485, 0.028977117, 0.021759067, 0.025300276, 0.022338683, 0.020891346, 0.023732193, 0.02282363, 0.021437619, 0.022005001, 0.020992622, 0.022989014, 0.022616824]
2020-05-26 20:43:14,934 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:14,948 : INFO : topic #5 (0.019): 0.196*"offer" + 0.086*"tender" + 0.042*"cyclops" + 0.030*"dixons" + 0.028*"offers" + 0.024*"video" + 0.021*"offered" + 0.018*"outstanding" + 0.016*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:14,949 : INFO : topic #12 (0.021): 0.063*"minister" + 0.035*"official" + 0.028*"venezuela" + 0.021*"great" + 0.020*"ceiling" + 0.017*"ecuador" + 0.016*"sector" + 0.015*"labour" + 0.014*"employers" + 0.013*"union"
2020-05-26 20:43:14,950 : INFO : topic #34 (0.030): 0.030*"prices" + 0.026*"world" + 0.014*"demand" + 0.014*"exports" + 0.012*"economic" + 0.011*"countries" + 0.011*"domestic" + 0.010*"government" + 0.009*"production" + 0.009*"economy"
2020-05-26 20:43:14,952 : INFO : topic #27 (0.031): 0.029*"government" + 0.010*"economic" + 0.010*"growth" + 0.009*"added" + 0.008*"years" + 0.008*"foreign" + 0.007*"chairman" + 0.007*"financial" + 0.006*"reduce" + 0.006*"spending"
2020-05-26 20:43:14,955 : INFO : topic #23 (0.033): 0.171*"dividend" + 0.096*"prior" + 0.062*"quarterly" + 0.047*"payable" + 0.043*"payout" + 0.038*"preferred" + 0.031*"holders" + 0.028*"distribution" + 0.027*"common" + 0.026*"regular"
2020-05-26 20:43:14,957 : INFO : topic diff=0.397503, rho=0.335493
2020-05-26 20:43:14,975 : INFO : PROGRESS: pass 4, at document #6000/7769
2020-05-26 20:43:15,971 : INFO : optimized alpha [0.026837012, 0.021648737, 0.023910053, 0.028325949, 0.024491569, 0.01967654, 0.030768579, 0.021944402, 0.022628304, 0.02335544, 0.027433736, 0.023810148, 0.02090547, 0.030336449, 0.028128328, 0.024719227, 0.026352465, 0.030910576, 0.023802731, 0.021690182, 0.023691148, 0.022180386, 0.025379298, 0.033443566, 0.022349948, 0.02702957, 0.027865438, 0.03212987, 0.021308042, 0.028922658, 0.023979092, 0.022853099, 0.02205168, 0.02464408, 0.030998165, 0.025450446, 0.02184565, 0.02181016, 0.029477786, 0.021955725, 0.025739847, 0.022545898, 0.020978263, 0.024141146, 0.0230007, 0.021656817, 0.02221335, 0.021062262, 0.023306515, 0.02293276]
2020-05-26 20:43:15,980 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:15,991 : INFO : topic #5 (0.020): 0.196*"offer" + 0.086*"tender" + 0.044*"cyclops" + 0.031*"dixons" + 0.027*"offers" + 0.021*"offered" + 0.020*"video" + 0.018*"outstanding" + 0.018*"cyacq" + 0.017*"tendered"
2020-05-26 20:43:15,992 : INFO : topic #12 (0.021): 0.062*"minister" + 0.039*"official" + 0.028*"venezuela" + 0.020*"ceiling" + 0.019*"sector" + 0.019*"employers" + 0.018*"great" + 0.018*"ecuador" + 0.018*"labour" + 0.012*"union"
2020-05-26 20:43:15,994 : INFO : topic #34 (0.031): 0.031*"prices" + 0.024*"world" + 0.015*"demand" + 0.013*"exports" + 0.012*"domestic" + 0.011*"economic" + 0.011*"countries" + 0.010*"government" + 0.010*"production" + 0.010*"taiwan"
2020-05-26 20:43:15,996 : INFO : topic #27 (0.032): 0.032*"government" + 0.010*"added" + 0.010*"growth" + 0.009*"economic" + 0.009*"years" + 0.008*"foreign" + 0.008*"budget" + 0.007*"financial" + 0.007*"spending" + 0.007*"chairman"
2020-05-26 20:43:15,999 : INFO : topic #23 (0.033): 0.174*"dividend" + 0.107*"prior" + 0.060*"quarterly" + 0.050*"payout" + 0.047*"payable" + 0.038*"preferred" + 0.031*"holders" + 0.026*"regular" + 0.026*"declared" + 0.025*"common"
2020-05-26 20:43:16,001 : INFO : topic diff=0.353469, rho=0.335493
2020-05-26 20:43:17,240 : INFO : -7.714 per-word bound, 210.0 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:17,241 : INFO : PROGRESS: pass 4, at document #7769/7769
2020-05-26 20:43:18,138 : INFO : optimized alpha [0.027224822, 0.021686986, 0.024201637, 0.0291686, 0.024873134, 0.019917388, 0.03142016, 0.02216736, 0.022931106, 0.023814585, 0.028014546, 0.023992024, 0.021141952, 0.030785741, 0.028763775, 0.025146479, 0.026835581, 0.031586014, 0.023946866, 0.021850081, 0.02405311, 0.022478405, 0.025950288, 0.034373257, 0.022661766, 0.027706502, 0.02853493, 0.033311043, 0.021588027, 0.029475804, 0.024433792, 0.02311182, 0.02234256, 0.02500206, 0.031897087, 0.025884109, 0.022160092, 0.02207395, 0.030148912, 0.02204351, 0.026364107, 0.022903163, 0.021074621, 0.024356997, 0.023069357, 0.021952718, 0.022251429, 0.021245293, 0.023589395, 0.023351291]
2020-05-26 20:43:18,147 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:18,159 : INFO : topic #5 (0.020): 0.182*"offer" + 0.088*"tender" + 0.062*"cyclops" + 0.046*"dixons" + 0.024*"cyacq" + 0.020*"offers" + 0.019*"offered" + 0.018*"video" + 0.018*"citicorp" + 0.017*"outstanding"
2020-05-26 20:43:18,161 : INFO : topic #42 (0.021): 0.126*"deficit" + 0.107*"surplus" + 0.064*"account" + 0.054*"current" + 0.048*"rubber" + 0.030*"february" + 0.026*"balance" + 0.025*"january" + 0.023*"allegheny" + 0.022*"hughes"
2020-05-26 20:43:18,162 : INFO : topic #34 (0.032): 0.029*"prices" + 0.026*"world" + 0.015*"demand" + 0.012*"exports" + 0.012*"economic" + 0.012*"countries" + 0.012*"domestic" + 0.011*"economy" + 0.010*"government" + 0.009*"production"
2020-05-26 20:43:18,165 : INFO : topic #27 (0.033): 0.032*"government" + 0.010*"economic" + 0.010*"growth" + 0.009*"years" + 0.009*"added" + 0.007*"financial" + 0.007*"chairman" + 0.007*"budget" + 0.007*"foreign" + 0.006*"spending"
2020-05-26 20:43:18,167 : INFO : topic #23 (0.034): 0.172*"dividend" + 0.110*"prior" + 0.064*"quarterly" + 0.051*"payout" + 0.047*"payable" + 0.042*"preferred" + 0.028*"holders" + 0.027*"regular" + 0.026*"distribution" + 0.026*"common"
2020-05-26 20:43:18,168 : INFO : topic diff=0.325971, rho=0.335493
2020-05-26 20:43:18,185 : INFO : PROGRESS: pass 5, at document #2000/7769
2020-05-26 20:43:19,231 : INFO : optimized alpha [0.027746178, 0.021885695, 0.02423049, 0.029451668, 0.025312915, 0.02002393, 0.03202087, 0.02222724, 0.023155617, 0.023935538, 0.02840087, 0.024212884, 0.021141188, 0.031130975, 0.02919903, 0.025225412, 0.027261974, 0.032185785, 0.023919707, 0.02189632, 0.024464672, 0.022705827, 0.026000798, 0.034697454, 0.022845812, 0.028403442, 0.029051019, 0.03355434, 0.02163301, 0.02976842, 0.024556838, 0.023225559, 0.02249792, 0.025287641, 0.032256, 0.026279636, 0.022205036, 0.02216544, 0.030391753, 0.022119772, 0.026780078, 0.022992963, 0.021140242, 0.024446836, 0.02323941, 0.022113375, 0.022361742, 0.021385606, 0.023705028, 0.023510732]
2020-05-26 20:43:19,240 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:19,254 : INFO : topic #5 (0.020): 0.193*"offer" + 0.087*"tender" + 0.052*"cyclops" + 0.040*"dixons" + 0.022*"offers" + 0.019*"cyacq" + 0.019*"video" + 0.018*"offered" + 0.016*"outstanding" + 0.015*"tendered"
2020-05-26 20:43:19,255 : INFO : topic #42 (0.021): 0.122*"deficit" + 0.105*"surplus" + 0.067*"account" + 0.055*"current" + 0.043*"rubber" + 0.033*"february" + 0.028*"january" + 0.026*"balance" + 0.021*"payments" + 0.021*"malaysia"
2020-05-26 20:43:19,256 : INFO : topic #34 (0.032): 0.030*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"exports" + 0.012*"domestic" + 0.012*"economic" + 0.011*"countries" + 0.010*"economy" + 0.010*"government" + 0.009*"industry"
2020-05-26 20:43:19,257 : INFO : topic #27 (0.034): 0.029*"government" + 0.010*"economic" + 0.010*"added" + 0.009*"growth" + 0.009*"years" + 0.007*"financial" + 0.007*"chairman" + 0.007*"budget" + 0.007*"foreign" + 0.006*"spending"
2020-05-26 20:43:19,259 : INFO : topic #23 (0.035): 0.181*"dividend" + 0.103*"prior" + 0.060*"quarterly" + 0.049*"payout" + 0.045*"payable" + 0.041*"preferred" + 0.027*"regular" + 0.027*"holders" + 0.027*"common" + 0.026*"declared"
2020-05-26 20:43:19,260 : INFO : topic diff=0.274910, rho=0.318070
2020-05-26 20:43:19,278 : INFO : PROGRESS: pass 5, at document #4000/7769
2020-05-26 20:43:20,339 : INFO : optimized alpha [0.028251685, 0.022108223, 0.024543593, 0.030067537, 0.025794152, 0.020185055, 0.03261711, 0.02244939, 0.023555456, 0.024319498, 0.028883727, 0.02455838, 0.021351913, 0.0317095, 0.029713439, 0.025590267, 0.027767256, 0.03265581, 0.02418637, 0.022111934, 0.02475768, 0.022938525, 0.026331522, 0.035344213, 0.0230508, 0.0290562, 0.029459862, 0.034317326, 0.021906769, 0.030367915, 0.024882702, 0.023564694, 0.022741057, 0.025511337, 0.033148617, 0.02655337, 0.022325505, 0.022581633, 0.030961316, 0.022367625, 0.027240673, 0.023174148, 0.021318976, 0.024601465, 0.023624104, 0.022381136, 0.0226908, 0.021598756, 0.024102187, 0.023686206]
2020-05-26 20:43:20,351 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:20,365 : INFO : topic #5 (0.020): 0.201*"offer" + 0.091*"tender" + 0.040*"cyclops" + 0.029*"dixons" + 0.027*"offers" + 0.023*"offered" + 0.022*"video" + 0.018*"outstanding" + 0.015*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:20,368 : INFO : topic #42 (0.021): 0.117*"deficit" + 0.106*"surplus" + 0.063*"account" + 0.049*"current" + 0.043*"rubber" + 0.039*"hughes" + 0.028*"february" + 0.026*"january" + 0.025*"balance" + 0.023*"supermarkets"
2020-05-26 20:43:20,372 : INFO : topic #34 (0.033): 0.031*"prices" + 0.027*"world" + 0.014*"demand" + 0.013*"exports" + 0.012*"economic" + 0.012*"domestic" + 0.011*"countries" + 0.010*"government" + 0.009*"economy" + 0.009*"production"
2020-05-26 20:43:20,374 : INFO : topic #27 (0.034): 0.031*"government" + 0.011*"economic" + 0.010*"growth" + 0.010*"added" + 0.009*"years" + 0.008*"budget" + 0.007*"chairman" + 0.007*"financial" + 0.007*"foreign" + 0.007*"reduce"
2020-05-26 20:43:20,376 : INFO : topic #23 (0.035): 0.179*"dividend" + 0.098*"prior" + 0.064*"quarterly" + 0.047*"payable" + 0.044*"payout" + 0.039*"preferred" + 0.031*"holders" + 0.029*"distribution" + 0.026*"common" + 0.026*"declared"
2020-05-26 20:43:20,378 : INFO : topic diff=0.253403, rho=0.318070
2020-05-26 20:43:20,397 : INFO : PROGRESS: pass 5, at document #6000/7769
2020-05-26 20:43:21,404 : INFO : optimized alpha [0.028815404, 0.022215085, 0.024913985, 0.030660862, 0.026184175, 0.02042259, 0.03344104, 0.022590421, 0.02381149, 0.024567805, 0.029246198, 0.02473088, 0.021665564, 0.03253426, 0.030352851, 0.025982141, 0.028485887, 0.033201993, 0.024412753, 0.022325953, 0.025063476, 0.023112152, 0.026669843, 0.03623437, 0.023308007, 0.029864356, 0.030079462, 0.035194393, 0.022180306, 0.030673495, 0.025267594, 0.023743024, 0.022935033, 0.025944186, 0.033782046, 0.02685069, 0.022364918, 0.022812843, 0.03144891, 0.02256108, 0.027682295, 0.023407934, 0.021421483, 0.025021436, 0.02381922, 0.02259229, 0.022888059, 0.02165758, 0.024437353, 0.023999836]
2020-05-26 20:43:21,413 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:21,428 : INFO : topic #5 (0.020): 0.199*"offer" + 0.089*"tender" + 0.041*"cyclops" + 0.030*"dixons" + 0.027*"offers" + 0.022*"offered" + 0.019*"video" + 0.019*"outstanding" + 0.017*"cyacq" + 0.016*"tendered"
2020-05-26 20:43:21,429 : INFO : topic #42 (0.021): 0.129*"deficit" + 0.100*"surplus" + 0.062*"account" + 0.053*"current" + 0.045*"rubber" + 0.030*"hughes" + 0.028*"allegheny" + 0.027*"february" + 0.027*"balance" + 0.025*"january"
2020-05-26 20:43:21,430 : INFO : topic #34 (0.034): 0.032*"prices" + 0.025*"world" + 0.015*"demand" + 0.013*"exports" + 0.012*"domestic" + 0.012*"economic" + 0.010*"countries" + 0.010*"production" + 0.010*"government" + 0.010*"economy"
2020-05-26 20:43:21,435 : INFO : topic #27 (0.035): 0.033*"government" + 0.011*"added" + 0.011*"growth" + 0.010*"economic" + 0.010*"budget" + 0.009*"years" + 0.008*"financial" + 0.007*"foreign" + 0.007*"spending" + 0.007*"chairman"
2020-05-26 20:43:21,438 : INFO : topic #23 (0.036): 0.180*"dividend" + 0.109*"prior" + 0.062*"quarterly" + 0.051*"payout" + 0.048*"payable" + 0.039*"preferred" + 0.031*"holders" + 0.027*"declared" + 0.026*"regular" + 0.025*"common"
2020-05-26 20:43:21,441 : INFO : topic diff=0.229247, rho=0.318070
2020-05-26 20:43:22,677 : INFO : -7.691 per-word bound, 206.6 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:22,678 : INFO : PROGRESS: pass 5, at document #7769/7769
2020-05-26 20:43:23,596 : INFO : optimized alpha [0.029194402, 0.02225086, 0.025196321, 0.03153375, 0.026590815, 0.020684227, 0.03405145, 0.022817116, 0.024125572, 0.025033733, 0.029871937, 0.024902679, 0.02187283, 0.033011254, 0.030975698, 0.026423631, 0.02898579, 0.033872742, 0.024555068, 0.022486992, 0.025412932, 0.023393579, 0.027210562, 0.037147902, 0.023624081, 0.03061878, 0.030725919, 0.03640878, 0.022452166, 0.031187661, 0.025732787, 0.023993468, 0.023196653, 0.026298176, 0.03471781, 0.027275287, 0.022661496, 0.023084562, 0.0320745, 0.022660853, 0.028333515, 0.023773244, 0.021542799, 0.025217107, 0.023849385, 0.02287993, 0.022925273, 0.021837363, 0.024723671, 0.02441129]
2020-05-26 20:43:23,606 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:23,618 : INFO : topic #5 (0.021): 0.186*"offer" + 0.090*"tender" + 0.059*"cyclops" + 0.044*"dixons" + 0.022*"cyacq" + 0.020*"offers" + 0.019*"offered" + 0.018*"outstanding" + 0.017*"video" + 0.017*"citicorp"
2020-05-26 20:43:23,621 : INFO : topic #42 (0.022): 0.127*"deficit" + 0.110*"surplus" + 0.065*"account" + 0.057*"current" + 0.048*"rubber" + 0.030*"february" + 0.027*"balance" + 0.026*"january" + 0.024*"exports" + 0.023*"allegheny"
2020-05-26 20:43:23,623 : INFO : topic #34 (0.035): 0.030*"prices" + 0.026*"world" + 0.015*"demand" + 0.012*"exports" + 0.012*"economic" + 0.012*"domestic" + 0.011*"countries" + 0.011*"economy" + 0.010*"government" + 0.010*"production"
2020-05-26 20:43:23,625 : INFO : topic #27 (0.036): 0.033*"government" + 0.010*"growth" + 0.010*"economic" + 0.010*"years" + 0.010*"added" + 0.009*"budget" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"foreign"
2020-05-26 20:43:23,629 : INFO : topic #23 (0.037): 0.178*"dividend" + 0.111*"prior" + 0.065*"quarterly" + 0.051*"payout" + 0.048*"payable" + 0.043*"preferred" + 0.028*"holders" + 0.028*"regular" + 0.027*"distribution" + 0.026*"declared"
2020-05-26 20:43:23,631 : INFO : topic diff=0.215484, rho=0.318070
2020-05-26 20:43:23,650 : INFO : PROGRESS: pass 6, at document #2000/7769
2020-05-26 20:43:24,679 : INFO : optimized alpha [0.02972531, 0.02245744, 0.02523411, 0.031832945, 0.02705556, 0.02080936, 0.03464905, 0.022862146, 0.024361948, 0.025143525, 0.030225443, 0.025106344, 0.02190166, 0.03333586, 0.031417076, 0.026502045, 0.029427228, 0.034439255, 0.02451897, 0.022556465, 0.025845034, 0.023618301, 0.027244467, 0.03743036, 0.023815455, 0.031364813, 0.03120123, 0.03664854, 0.022499243, 0.031462144, 0.02583402, 0.024097323, 0.023342345, 0.026554618, 0.0350747, 0.027666455, 0.022715826, 0.023211233, 0.032291323, 0.02275065, 0.028753616, 0.023882983, 0.021607231, 0.025322812, 0.024029626, 0.023037832, 0.02303716, 0.021975974, 0.024841227, 0.024588188]
2020-05-26 20:43:24,688 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:24,704 : INFO : topic #5 (0.021): 0.196*"offer" + 0.089*"tender" + 0.050*"cyclops" + 0.038*"dixons" + 0.023*"offers" + 0.019*"offered" + 0.018*"cyacq" + 0.018*"video" + 0.017*"outstanding" + 0.014*"tendered"
2020-05-26 20:43:24,706 : INFO : topic #42 (0.022): 0.123*"deficit" + 0.109*"surplus" + 0.068*"account" + 0.057*"current" + 0.043*"rubber" + 0.032*"february" + 0.028*"january" + 0.027*"balance" + 0.022*"exports" + 0.021*"payments"
2020-05-26 20:43:24,708 : INFO : topic #34 (0.035): 0.031*"prices" + 0.028*"world" + 0.015*"demand" + 0.013*"exports" + 0.012*"domestic" + 0.012*"economic" + 0.011*"countries" + 0.011*"economy" + 0.009*"government" + 0.009*"production"
2020-05-26 20:43:24,710 : INFO : topic #27 (0.037): 0.030*"government" + 0.010*"economic" + 0.010*"added" + 0.010*"growth" + 0.010*"years" + 0.008*"budget" + 0.008*"financial" + 0.007*"chairman" + 0.006*"reduce" + 0.006*"spending"
2020-05-26 20:43:24,712 : INFO : topic #23 (0.037): 0.185*"dividend" + 0.104*"prior" + 0.061*"quarterly" + 0.050*"payout" + 0.046*"payable" + 0.042*"preferred" + 0.028*"regular" + 0.028*"holders" + 0.027*"declared" + 0.027*"common"
2020-05-26 20:43:24,714 : INFO : topic diff=0.185751, rho=0.303107
2020-05-26 20:43:24,734 : INFO : PROGRESS: pass 6, at document #4000/7769
2020-05-26 20:43:25,855 : INFO : optimized alpha [0.030228268, 0.022694573, 0.025542162, 0.032446414, 0.027533667, 0.020969385, 0.03520936, 0.023066385, 0.024763387, 0.025543869, 0.030707775, 0.025436196, 0.022117933, 0.03388686, 0.031915072, 0.026870579, 0.029949797, 0.034883674, 0.02479685, 0.022757208, 0.026148938, 0.02382979, 0.027587308, 0.03801782, 0.024038007, 0.032032695, 0.031580824, 0.037443873, 0.022766082, 0.032051142, 0.026164154, 0.024413297, 0.023578338, 0.026785638, 0.0359852, 0.027927143, 0.02285047, 0.023629803, 0.03285431, 0.022990739, 0.029215287, 0.024064705, 0.0217847, 0.025487375, 0.024403103, 0.023301473, 0.023359831, 0.02218578, 0.025221322, 0.024758153]
2020-05-26 20:43:25,868 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:25,881 : INFO : topic #5 (0.021): 0.203*"offer" + 0.092*"tender" + 0.039*"cyclops" + 0.028*"dixons" + 0.027*"offers" + 0.023*"offered" + 0.021*"video" + 0.018*"outstanding" + 0.014*"cyacq" + 0.014*"tendered"
2020-05-26 20:43:25,883 : INFO : topic #42 (0.022): 0.119*"deficit" + 0.110*"surplus" + 0.065*"account" + 0.052*"current" + 0.043*"rubber" + 0.038*"hughes" + 0.028*"february" + 0.027*"january" + 0.026*"balance" + 0.023*"supermarkets"
2020-05-26 20:43:25,887 : INFO : topic #34 (0.036): 0.032*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"exports" + 0.012*"domestic" + 0.012*"economic" + 0.011*"countries" + 0.010*"economy" + 0.010*"production" + 0.010*"government"
2020-05-26 20:43:25,889 : INFO : topic #27 (0.037): 0.032*"government" + 0.011*"economic" + 0.011*"growth" + 0.010*"added" + 0.010*"years" + 0.009*"budget" + 0.008*"financial" + 0.007*"chairman" + 0.007*"reduce" + 0.006*"spending"
2020-05-26 20:43:25,892 : INFO : topic #23 (0.038): 0.183*"dividend" + 0.100*"prior" + 0.065*"quarterly" + 0.048*"payable" + 0.044*"payout" + 0.039*"preferred" + 0.032*"holders" + 0.029*"distribution" + 0.027*"declared" + 0.026*"regular"
2020-05-26 20:43:25,894 : INFO : topic diff=0.173823, rho=0.303107
2020-05-26 20:43:25,907 : INFO : PROGRESS: pass 6, at document #6000/7769
2020-05-26 20:43:26,950 : INFO : optimized alpha [0.03081976, 0.02279754, 0.025900135, 0.033013716, 0.027928371, 0.02119888, 0.036007486, 0.023226809, 0.025059152, 0.02578964, 0.031039659, 0.025607292, 0.022439284, 0.03470243, 0.032552026, 0.027258802, 0.030662518, 0.03536877, 0.025028104, 0.02296717, 0.026434867, 0.023994014, 0.027921319, 0.038869068, 0.024295019, 0.032857757, 0.032163683, 0.03831829, 0.023025902, 0.03234469, 0.026566729, 0.024582399, 0.02378816, 0.02721548, 0.036638062, 0.028211534, 0.022891896, 0.023875061, 0.03333114, 0.023181152, 0.029650394, 0.024327826, 0.02190811, 0.025919177, 0.024593757, 0.023517944, 0.023550272, 0.02224808, 0.025539234, 0.025070298]
2020-05-26 20:43:26,960 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:26,972 : INFO : topic #5 (0.021): 0.200*"offer" + 0.090*"tender" + 0.040*"cyclops" + 0.029*"dixons" + 0.027*"offers" + 0.022*"offered" + 0.019*"outstanding" + 0.018*"video" + 0.016*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:26,973 : INFO : topic #42 (0.022): 0.129*"deficit" + 0.104*"surplus" + 0.064*"account" + 0.055*"current" + 0.044*"rubber" + 0.030*"hughes" + 0.028*"february" + 0.028*"balance" + 0.028*"allegheny" + 0.025*"january"
2020-05-26 20:43:26,974 : INFO : topic #34 (0.037): 0.033*"prices" + 0.026*"world" + 0.016*"demand" + 0.013*"exports" + 0.013*"domestic" + 0.012*"economic" + 0.011*"production" + 0.010*"countries" + 0.010*"economy" + 0.010*"government"
2020-05-26 20:43:26,976 : INFO : topic #27 (0.038): 0.034*"government" + 0.011*"added" + 0.011*"budget" + 0.011*"growth" + 0.010*"economic" + 0.010*"years" + 0.008*"financial" + 0.007*"spending" + 0.007*"chairman" + 0.007*"foreign"
2020-05-26 20:43:26,978 : INFO : topic #23 (0.039): 0.184*"dividend" + 0.110*"prior" + 0.063*"quarterly" + 0.051*"payout" + 0.049*"payable" + 0.039*"preferred" + 0.032*"holders" + 0.028*"declared" + 0.027*"regular" + 0.025*"distribution"
2020-05-26 20:43:26,980 : INFO : topic diff=0.160598, rho=0.303107
2020-05-26 20:43:28,262 : INFO : -7.673 per-word bound, 204.1 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:28,263 : INFO : PROGRESS: pass 6, at document #7769/7769
2020-05-26 20:43:29,188 : INFO : optimized alpha [0.031179933, 0.022834724, 0.026178628, 0.033884708, 0.028327046, 0.021455547, 0.0366194, 0.02345359, 0.025384624, 0.026272165, 0.03164798, 0.025786767, 0.022657353, 0.03514437, 0.033173352, 0.027702449, 0.031148417, 0.03604488, 0.025160674, 0.02312934, 0.026790524, 0.024271911, 0.028472811, 0.03971365, 0.024628038, 0.033647608, 0.03280351, 0.039557252, 0.023287274, 0.03285517, 0.027030315, 0.024813494, 0.02404716, 0.027556792, 0.03759285, 0.028623145, 0.02318841, 0.02414112, 0.033942256, 0.023267984, 0.030283885, 0.024680201, 0.022026276, 0.026099836, 0.02460459, 0.023810206, 0.02359121, 0.022417974, 0.025829442, 0.025488352]
2020-05-26 20:43:29,198 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:29,209 : INFO : topic #5 (0.021): 0.188*"offer" + 0.091*"tender" + 0.057*"cyclops" + 0.042*"dixons" + 0.022*"cyacq" + 0.021*"offers" + 0.019*"offered" + 0.018*"outstanding" + 0.017*"video" + 0.016*"citicorp"
2020-05-26 20:43:29,211 : INFO : topic #42 (0.022): 0.126*"deficit" + 0.112*"surplus" + 0.066*"account" + 0.058*"current" + 0.047*"rubber" + 0.031*"february" + 0.028*"balance" + 0.027*"exports" + 0.026*"january" + 0.022*"allegheny"
2020-05-26 20:43:29,212 : INFO : topic #34 (0.038): 0.031*"prices" + 0.026*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"economic" + 0.012*"exports" + 0.011*"economy" + 0.011*"countries" + 0.010*"production" + 0.010*"government"
2020-05-26 20:43:29,216 : INFO : topic #27 (0.040): 0.033*"government" + 0.011*"growth" + 0.010*"economic" + 0.010*"budget" + 0.010*"years" + 0.010*"added" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"reduce"
2020-05-26 20:43:29,218 : INFO : topic #23 (0.040): 0.181*"dividend" + 0.112*"prior" + 0.065*"quarterly" + 0.052*"payout" + 0.049*"payable" + 0.044*"preferred" + 0.029*"holders" + 0.028*"regular" + 0.027*"declared" + 0.027*"distribution"
2020-05-26 20:43:29,220 : INFO : topic diff=0.153112, rho=0.303107
2020-05-26 20:43:29,239 : INFO : PROGRESS: pass 7, at document #2000/7769
2020-05-26 20:43:30,240 : INFO : optimized alpha [0.03168711, 0.023043185, 0.026229145, 0.034166485, 0.028814131, 0.021570755, 0.03717867, 0.02350478, 0.025614157, 0.026382009, 0.032001723, 0.02598872, 0.022702893, 0.0354531, 0.033577524, 0.027779903, 0.03159678, 0.03660076, 0.025127001, 0.023211038, 0.02721306, 0.02449442, 0.028498108, 0.039923802, 0.024836877, 0.034402803, 0.033263106, 0.039761707, 0.023343476, 0.03311309, 0.027132006, 0.024912957, 0.02419824, 0.02780173, 0.037957832, 0.029015493, 0.023237232, 0.024302747, 0.034144938, 0.023370681, 0.030677136, 0.024801994, 0.022098577, 0.026193265, 0.024774743, 0.023978934, 0.023693912, 0.022551812, 0.02596487, 0.025667576]
2020-05-26 20:43:30,251 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:30,264 : INFO : topic #5 (0.022): 0.197*"offer" + 0.090*"tender" + 0.049*"cyclops" + 0.037*"dixons" + 0.023*"offers" + 0.019*"offered" + 0.018*"cyacq" + 0.017*"video" + 0.017*"outstanding" + 0.014*"tendered"
2020-05-26 20:43:30,265 : INFO : topic #42 (0.022): 0.123*"deficit" + 0.110*"surplus" + 0.069*"account" + 0.058*"current" + 0.042*"rubber" + 0.033*"february" + 0.028*"january" + 0.028*"balance" + 0.025*"exports" + 0.022*"payments"
2020-05-26 20:43:30,267 : INFO : topic #34 (0.038): 0.032*"prices" + 0.028*"world" + 0.015*"demand" + 0.013*"domestic" + 0.013*"exports" + 0.012*"economic" + 0.011*"economy" + 0.011*"countries" + 0.010*"production" + 0.009*"government"
2020-05-26 20:43:30,271 : INFO : topic #27 (0.040): 0.031*"government" + 0.011*"economic" + 0.010*"added" + 0.010*"growth" + 0.010*"years" + 0.010*"budget" + 0.008*"financial" + 0.007*"chairman" + 0.006*"spending" + 0.006*"companies"
2020-05-26 20:43:30,272 : INFO : topic #23 (0.040): 0.189*"dividend" + 0.106*"prior" + 0.062*"quarterly" + 0.050*"payout" + 0.048*"payable" + 0.043*"preferred" + 0.029*"declared" + 0.028*"holders" + 0.028*"regular" + 0.027*"distribution"
2020-05-26 20:43:30,273 : INFO : topic diff=0.135653, rho=0.290074
2020-05-26 20:43:30,291 : INFO : PROGRESS: pass 7, at document #4000/7769
2020-05-26 20:43:31,309 : INFO : optimized alpha [0.03218854, 0.023264246, 0.026529534, 0.03480323, 0.029293738, 0.021723673, 0.037686624, 0.023715872, 0.026017556, 0.026769074, 0.032459937, 0.026319502, 0.022921072, 0.035998452, 0.034044, 0.02817741, 0.032096334, 0.037020613, 0.02541941, 0.023407942, 0.027509207, 0.024710648, 0.02882849, 0.04045347, 0.025058646, 0.035089497, 0.033628978, 0.0405421, 0.023610696, 0.033671927, 0.027470924, 0.025214965, 0.024413561, 0.028016739, 0.03887194, 0.029284118, 0.023375152, 0.024720937, 0.03468183, 0.02361084, 0.031131292, 0.024983967, 0.022287343, 0.026353143, 0.025138283, 0.024214562, 0.023993665, 0.022735786, 0.026363315, 0.025839614]
2020-05-26 20:43:31,317 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:31,333 : INFO : topic #5 (0.022): 0.203*"offer" + 0.092*"tender" + 0.038*"cyclops" + 0.028*"dixons" + 0.027*"offers" + 0.023*"offered" + 0.020*"video" + 0.018*"outstanding" + 0.014*"cyacq" + 0.014*"tendered"
2020-05-26 20:43:31,334 : INFO : topic #42 (0.022): 0.119*"deficit" + 0.112*"surplus" + 0.066*"account" + 0.053*"current" + 0.043*"rubber" + 0.038*"hughes" + 0.029*"february" + 0.027*"balance" + 0.027*"january" + 0.024*"exports"
2020-05-26 20:43:31,335 : INFO : topic #34 (0.039): 0.033*"prices" + 0.028*"world" + 0.015*"demand" + 0.013*"exports" + 0.013*"domestic" + 0.012*"economic" + 0.011*"countries" + 0.010*"production" + 0.010*"economy" + 0.009*"government"
2020-05-26 20:43:31,336 : INFO : topic #23 (0.040): 0.186*"dividend" + 0.102*"prior" + 0.065*"quarterly" + 0.050*"payable" + 0.045*"payout" + 0.040*"preferred" + 0.032*"holders" + 0.030*"distribution" + 0.028*"declared" + 0.027*"regular"
2020-05-26 20:43:31,337 : INFO : topic #27 (0.041): 0.032*"government" + 0.011*"economic" + 0.011*"growth" + 0.011*"added" + 0.011*"budget" + 0.010*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"reduce" + 0.006*"spending"
2020-05-26 20:43:31,339 : INFO : topic diff=0.128348, rho=0.290074
2020-05-26 20:43:31,355 : INFO : PROGRESS: pass 7, at document #6000/7769
2020-05-26 20:43:32,373 : INFO : optimized alpha [0.032795716, 0.023366932, 0.026895992, 0.035368964, 0.02967609, 0.021948975, 0.038480252, 0.023863496, 0.026304027, 0.02701283, 0.032786075, 0.026489567, 0.023251021, 0.03678982, 0.034654982, 0.028558312, 0.032829612, 0.037504874, 0.02564778, 0.023597358, 0.02777976, 0.024884164, 0.02915119, 0.041291386, 0.025311843, 0.035937913, 0.03420878, 0.041423842, 0.023857636, 0.0339356, 0.027859757, 0.025357364, 0.024600223, 0.028452702, 0.039506644, 0.029578336, 0.023425087, 0.02497663, 0.035132997, 0.023788678, 0.031582605, 0.025229245, 0.022417279, 0.026770104, 0.025332635, 0.024429055, 0.02419945, 0.022795148, 0.026673546, 0.026188709]
2020-05-26 20:43:32,382 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:32,394 : INFO : topic #5 (0.022): 0.200*"offer" + 0.091*"tender" + 0.040*"cyclops" + 0.029*"dixons" + 0.027*"offers" + 0.022*"offered" + 0.019*"outstanding" + 0.018*"video" + 0.016*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:32,395 : INFO : topic #42 (0.022): 0.128*"deficit" + 0.106*"surplus" + 0.064*"account" + 0.056*"current" + 0.044*"rubber" + 0.030*"hughes" + 0.029*"february" + 0.028*"balance" + 0.027*"allegheny" + 0.025*"january"
2020-05-26 20:43:32,396 : INFO : topic #34 (0.040): 0.033*"prices" + 0.026*"world" + 0.016*"demand" + 0.013*"domestic" + 0.013*"exports" + 0.012*"economic" + 0.011*"production" + 0.010*"economy" + 0.010*"countries" + 0.010*"industry"
2020-05-26 20:43:32,398 : INFO : topic #23 (0.041): 0.187*"dividend" + 0.111*"prior" + 0.063*"quarterly" + 0.051*"payout" + 0.050*"payable" + 0.040*"preferred" + 0.032*"holders" + 0.029*"declared" + 0.027*"regular" + 0.025*"distribution"
2020-05-26 20:43:32,404 : INFO : topic #27 (0.041): 0.034*"government" + 0.012*"budget" + 0.011*"added" + 0.011*"growth" + 0.010*"economic" + 0.010*"years" + 0.008*"financial" + 0.007*"spending" + 0.007*"chairman" + 0.006*"reduce"
2020-05-26 20:43:32,406 : INFO : topic diff=0.121193, rho=0.290074
2020-05-26 20:43:33,633 : INFO : -7.660 per-word bound, 202.3 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:33,634 : INFO : PROGRESS: pass 7, at document #7769/7769
2020-05-26 20:43:34,532 : INFO : optimized alpha [0.033142176, 0.02340131, 0.027166596, 0.036248226, 0.030076936, 0.022197545, 0.039025698, 0.024086747, 0.026627844, 0.027476916, 0.033377018, 0.026672145, 0.023460293, 0.03721516, 0.035269126, 0.029004129, 0.033306263, 0.038155936, 0.02577084, 0.023757018, 0.028106283, 0.025165431, 0.029691413, 0.04209446, 0.025652897, 0.03680081, 0.03481465, 0.042654425, 0.024105277, 0.034431756, 0.028318156, 0.025566204, 0.024857305, 0.028782759, 0.040448748, 0.029992467, 0.023690432, 0.025257057, 0.035736606, 0.023882747, 0.03217246, 0.025582446, 0.022539094, 0.026946131, 0.025329202, 0.0247394, 0.024244195, 0.022959892, 0.026977677, 0.026601886]
2020-05-26 20:43:34,541 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:34,556 : INFO : topic #5 (0.022): 0.189*"offer" + 0.091*"tender" + 0.055*"cyclops" + 0.041*"dixons" + 0.021*"cyacq" + 0.021*"offers" + 0.020*"offered" + 0.018*"outstanding" + 0.017*"video" + 0.016*"citicorp"
2020-05-26 20:43:34,557 : INFO : topic #42 (0.023): 0.125*"deficit" + 0.114*"surplus" + 0.067*"account" + 0.058*"current" + 0.047*"rubber" + 0.032*"february" + 0.030*"exports" + 0.029*"balance" + 0.025*"january" + 0.023*"imports"
2020-05-26 20:43:34,558 : INFO : topic #34 (0.040): 0.032*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"economic" + 0.012*"exports" + 0.011*"economy" + 0.011*"countries" + 0.010*"production" + 0.010*"government"
2020-05-26 20:43:34,561 : INFO : topic #23 (0.042): 0.184*"dividend" + 0.112*"prior" + 0.066*"quarterly" + 0.052*"payout" + 0.050*"payable" + 0.044*"preferred" + 0.029*"declared" + 0.029*"holders" + 0.028*"regular" + 0.027*"distribution"
2020-05-26 20:43:34,562 : INFO : topic #27 (0.043): 0.034*"government" + 0.011*"budget" + 0.011*"growth" + 0.011*"economic" + 0.011*"years" + 0.010*"added" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"could"
2020-05-26 20:43:34,564 : INFO : topic diff=0.117301, rho=0.290074
2020-05-26 20:43:34,585 : INFO : PROGRESS: pass 8, at document #2000/7769
2020-05-26 20:43:35,559 : INFO : optimized alpha [0.033642676, 0.023611488, 0.027214488, 0.036516894, 0.030570222, 0.022306358, 0.03957356, 0.024140457, 0.026839733, 0.027579503, 0.033716533, 0.026872426, 0.023535186, 0.03751942, 0.035655774, 0.029068213, 0.03373389, 0.038702045, 0.025735086, 0.023813883, 0.02853426, 0.025382617, 0.02973389, 0.042270925, 0.025857985, 0.037561666, 0.0352388, 0.04281249, 0.024153845, 0.034659717, 0.028411875, 0.025651641, 0.025003342, 0.029032856, 0.04081826, 0.03039445, 0.023740841, 0.025413673, 0.03592071, 0.023984443, 0.032539826, 0.02570907, 0.022608763, 0.02703192, 0.025488745, 0.024902213, 0.024334576, 0.023076642, 0.027126627, 0.026779301]
2020-05-26 20:43:35,570 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:35,586 : INFO : topic #5 (0.022): 0.197*"offer" + 0.090*"tender" + 0.048*"cyclops" + 0.036*"dixons" + 0.023*"offers" + 0.019*"offered" + 0.018*"cyacq" + 0.017*"video" + 0.017*"outstanding" + 0.014*"tendered"
2020-05-26 20:43:35,587 : INFO : topic #42 (0.023): 0.122*"deficit" + 0.111*"surplus" + 0.069*"account" + 0.059*"current" + 0.042*"rubber" + 0.034*"february" + 0.028*"balance" + 0.028*"january" + 0.027*"exports" + 0.022*"imports"
2020-05-26 20:43:35,588 : INFO : topic #34 (0.041): 0.032*"prices" + 0.028*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"exports" + 0.012*"economic" + 0.011*"economy" + 0.011*"countries" + 0.010*"production" + 0.009*"industry"
2020-05-26 20:43:35,589 : INFO : topic #23 (0.042): 0.190*"dividend" + 0.107*"prior" + 0.062*"quarterly" + 0.050*"payout" + 0.048*"payable" + 0.044*"preferred" + 0.030*"declared" + 0.028*"holders" + 0.028*"regular" + 0.027*"distribution"
2020-05-26 20:43:35,591 : INFO : topic #27 (0.043): 0.032*"government" + 0.011*"economic" + 0.011*"added" + 0.011*"growth" + 0.011*"budget" + 0.010*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"could"
2020-05-26 20:43:35,593 : INFO : topic diff=0.105791, rho=0.278590
2020-05-26 20:43:35,613 : INFO : PROGRESS: pass 8, at document #4000/7769
2020-05-26 20:43:36,599 : INFO : optimized alpha [0.034142002, 0.023830436, 0.02750763, 0.037183966, 0.031047154, 0.022458851, 0.040051118, 0.024345051, 0.027231006, 0.027953595, 0.034147874, 0.027193727, 0.023785045, 0.038065977, 0.036120225, 0.029467016, 0.034250323, 0.03910797, 0.026048979, 0.024014479, 0.028831828, 0.025596501, 0.030039744, 0.042744346, 0.026090054, 0.03823803, 0.03559728, 0.043575667, 0.024412464, 0.035200834, 0.02874874, 0.025944851, 0.025230594, 0.029240835, 0.041757155, 0.030641442, 0.023875365, 0.025851764, 0.03641694, 0.024218032, 0.032982916, 0.025888339, 0.022792822, 0.027202927, 0.02585018, 0.025135076, 0.024605943, 0.023268525, 0.02752047, 0.02696488]
2020-05-26 20:43:36,612 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:36,626 : INFO : topic #5 (0.022): 0.203*"offer" + 0.093*"tender" + 0.038*"cyclops" + 0.027*"dixons" + 0.026*"offers" + 0.023*"offered" + 0.020*"video" + 0.018*"outstanding" + 0.014*"cyacq" + 0.014*"already"
2020-05-26 20:43:36,628 : INFO : topic #42 (0.023): 0.119*"deficit" + 0.113*"surplus" + 0.066*"account" + 0.053*"current" + 0.043*"rubber" + 0.037*"hughes" + 0.030*"february" + 0.027*"balance" + 0.026*"january" + 0.026*"exports"
2020-05-26 20:43:36,629 : INFO : topic #34 (0.042): 0.033*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"exports" + 0.013*"domestic" + 0.012*"economic" + 0.010*"production" + 0.010*"countries" + 0.010*"economy" + 0.010*"forecast"
2020-05-26 20:43:36,631 : INFO : topic #23 (0.043): 0.188*"dividend" + 0.103*"prior" + 0.066*"quarterly" + 0.050*"payable" + 0.045*"payout" + 0.041*"preferred" + 0.032*"holders" + 0.030*"distribution" + 0.029*"declared" + 0.027*"regular"
2020-05-26 20:43:36,633 : INFO : topic #27 (0.044): 0.033*"government" + 0.012*"economic" + 0.011*"growth" + 0.011*"budget" + 0.011*"added" + 0.010*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"reduce" + 0.007*"spending"
2020-05-26 20:43:36,637 : INFO : topic diff=0.101142, rho=0.278590
2020-05-26 20:43:36,653 : INFO : PROGRESS: pass 8, at document #6000/7769
2020-05-26 20:43:37,625 : INFO : optimized alpha [0.03472773, 0.02392953, 0.027870962, 0.03775323, 0.031399474, 0.02269291, 0.040818762, 0.0245042, 0.027493387, 0.028184105, 0.034460895, 0.027370663, 0.02413199, 0.038864803, 0.036719706, 0.029845752, 0.0349515, 0.03958375, 0.026263867, 0.02420108, 0.029104127, 0.0257538, 0.030343108, 0.04354078, 0.026339762, 0.03910358, 0.036179543, 0.044456765, 0.024637457, 0.03544134, 0.029136071, 0.02608062, 0.025401693, 0.029651137, 0.042409066, 0.030915642, 0.023920093, 0.026107006, 0.03684008, 0.024386957, 0.033424728, 0.026134541, 0.02292569, 0.027617952, 0.026022991, 0.025347427, 0.024788838, 0.023325546, 0.027842095, 0.027309492]
2020-05-26 20:43:37,635 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:37,647 : INFO : topic #5 (0.023): 0.200*"offer" + 0.091*"tender" + 0.039*"cyclops" + 0.028*"dixons" + 0.026*"offers" + 0.022*"offered" + 0.019*"outstanding" + 0.018*"video" + 0.016*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:37,648 : INFO : topic #42 (0.023): 0.128*"deficit" + 0.107*"surplus" + 0.065*"account" + 0.056*"current" + 0.044*"rubber" + 0.029*"hughes" + 0.029*"february" + 0.029*"balance" + 0.027*"exports" + 0.027*"allegheny"
2020-05-26 20:43:37,650 : INFO : topic #34 (0.042): 0.034*"prices" + 0.026*"world" + 0.016*"demand" + 0.013*"domestic" + 0.013*"exports" + 0.012*"economic" + 0.011*"production" + 0.010*"economy" + 0.010*"countries" + 0.010*"industry"
2020-05-26 20:43:37,653 : INFO : topic #23 (0.044): 0.188*"dividend" + 0.112*"prior" + 0.064*"quarterly" + 0.051*"payout" + 0.051*"payable" + 0.041*"preferred" + 0.032*"holders" + 0.030*"declared" + 0.027*"regular" + 0.026*"distribution"
2020-05-26 20:43:37,655 : INFO : topic #27 (0.044): 0.034*"government" + 0.013*"budget" + 0.012*"added" + 0.011*"growth" + 0.011*"economic" + 0.010*"years" + 0.008*"financial" + 0.008*"spending" + 0.007*"chairman" + 0.006*"could"
2020-05-26 20:43:37,656 : INFO : topic diff=0.096868, rho=0.278590
2020-05-26 20:43:38,967 : INFO : -7.650 per-word bound, 200.9 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:38,968 : INFO : PROGRESS: pass 8, at document #7769/7769
2020-05-26 20:43:39,855 : INFO : optimized alpha [0.035056114, 0.02395732, 0.028134348, 0.03862257, 0.031807475, 0.02293765, 0.041329935, 0.024724318, 0.027799321, 0.02865412, 0.035062857, 0.027551733, 0.024353717, 0.039256353, 0.037287705, 0.030276336, 0.035402935, 0.04020412, 0.026407296, 0.0243587, 0.029405862, 0.026051516, 0.030868214, 0.04432603, 0.026679158, 0.039971836, 0.03673957, 0.04569675, 0.024880387, 0.035914134, 0.02959021, 0.026281632, 0.02566547, 0.029985193, 0.04331889, 0.031308953, 0.024175484, 0.026396215, 0.037413877, 0.024495892, 0.03400744, 0.026472488, 0.023040153, 0.027811697, 0.02600272, 0.025643745, 0.024825979, 0.023496538, 0.028168539, 0.027714014]
2020-05-26 20:43:39,864 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:39,877 : INFO : topic #5 (0.023): 0.190*"offer" + 0.092*"tender" + 0.054*"cyclops" + 0.040*"dixons" + 0.021*"offers" + 0.021*"cyacq" + 0.020*"offered" + 0.018*"outstanding" + 0.017*"video" + 0.016*"citicorp"
2020-05-26 20:43:39,878 : INFO : topic #42 (0.023): 0.125*"deficit" + 0.114*"surplus" + 0.067*"account" + 0.059*"current" + 0.046*"rubber" + 0.033*"exports" + 0.032*"february" + 0.029*"balance" + 0.026*"imports" + 0.025*"january"
2020-05-26 20:43:39,880 : INFO : topic #34 (0.043): 0.032*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"economic" + 0.012*"exports" + 0.012*"economy" + 0.011*"production" + 0.011*"countries" + 0.009*"industry"
2020-05-26 20:43:39,882 : INFO : topic #23 (0.044): 0.185*"dividend" + 0.113*"prior" + 0.066*"quarterly" + 0.052*"payout" + 0.051*"payable" + 0.045*"preferred" + 0.030*"declared" + 0.029*"holders" + 0.028*"regular" + 0.027*"distribution"
2020-05-26 20:43:39,887 : INFO : topic #27 (0.046): 0.034*"government" + 0.012*"budget" + 0.011*"economic" + 0.011*"growth" + 0.011*"added" + 0.011*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"could"
2020-05-26 20:43:39,888 : INFO : topic diff=0.094685, rho=0.278590
2020-05-26 20:43:39,906 : INFO : PROGRESS: pass 9, at document #2000/7769
2020-05-26 20:43:40,871 : INFO : optimized alpha [0.03553181, 0.024162658, 0.028179431, 0.038872153, 0.032286063, 0.023046307, 0.041857094, 0.024776813, 0.027998196, 0.02875073, 0.035402965, 0.027739415, 0.024429929, 0.039552994, 0.037667904, 0.030343609, 0.035805028, 0.040730394, 0.026365098, 0.02442116, 0.029817611, 0.026270818, 0.03088316, 0.044443786, 0.026894929, 0.04072897, 0.03713903, 0.045844514, 0.024932053, 0.036118433, 0.029686723, 0.02636097, 0.02581027, 0.030204322, 0.043653645, 0.031708874, 0.024220798, 0.026565501, 0.037578285, 0.024606226, 0.034358084, 0.026596697, 0.023107281, 0.027892657, 0.026163526, 0.02581513, 0.024936017, 0.023615856, 0.02830019, 0.027889844]
2020-05-26 20:43:40,883 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:40,895 : INFO : topic #5 (0.023): 0.197*"offer" + 0.091*"tender" + 0.047*"cyclops" + 0.036*"dixons" + 0.023*"offers" + 0.019*"offered" + 0.018*"cyacq" + 0.017*"video" + 0.017*"outstanding" + 0.014*"tendered"
2020-05-26 20:43:40,897 : INFO : topic #42 (0.023): 0.121*"deficit" + 0.112*"surplus" + 0.069*"account" + 0.059*"current" + 0.042*"rubber" + 0.034*"february" + 0.030*"exports" + 0.028*"balance" + 0.027*"january" + 0.024*"imports"
2020-05-26 20:43:40,898 : INFO : topic #34 (0.044): 0.033*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"exports" + 0.012*"economic" + 0.011*"economy" + 0.010*"countries" + 0.010*"production" + 0.009*"industry"
2020-05-26 20:43:40,901 : INFO : topic #23 (0.044): 0.191*"dividend" + 0.107*"prior" + 0.062*"quarterly" + 0.050*"payout" + 0.049*"payable" + 0.044*"preferred" + 0.030*"declared" + 0.028*"regular" + 0.028*"holders" + 0.027*"distribution"
2020-05-26 20:43:40,903 : INFO : topic #27 (0.046): 0.032*"government" + 0.011*"economic" + 0.011*"budget" + 0.011*"added" + 0.011*"growth" + 0.010*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"could"
2020-05-26 20:43:40,905 : INFO : topic diff=0.086706, rho=0.268371
2020-05-26 20:43:40,922 : INFO : PROGRESS: pass 9, at document #4000/7769
2020-05-26 20:43:41,995 : INFO : optimized alpha [0.036015492, 0.024376381, 0.028481157, 0.039500076, 0.032747433, 0.02319207, 0.042305525, 0.02498479, 0.02836983, 0.029112421, 0.03580184, 0.028066456, 0.02468809, 0.040047657, 0.03810637, 0.030739967, 0.03629763, 0.04110277, 0.026675666, 0.024615748, 0.030108806, 0.026486455, 0.031191168, 0.044900104, 0.0271155, 0.041384142, 0.037456237, 0.046555825, 0.025182396, 0.03665836, 0.030013354, 0.026635073, 0.026046263, 0.030422771, 0.044575024, 0.031943075, 0.024355521, 0.027010348, 0.03806694, 0.024827233, 0.034785535, 0.02677363, 0.023289897, 0.028074358, 0.026516978, 0.026027935, 0.0252039, 0.023809133, 0.028697489, 0.028100058]
2020-05-26 20:43:42,004 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:42,018 : INFO : topic #5 (0.023): 0.203*"offer" + 0.093*"tender" + 0.038*"cyclops" + 0.027*"dixons" + 0.026*"offers" + 0.023*"offered" + 0.020*"video" + 0.018*"outstanding" + 0.014*"cyacq" + 0.014*"already"
2020-05-26 20:43:42,019 : INFO : topic #42 (0.023): 0.119*"deficit" + 0.113*"surplus" + 0.066*"account" + 0.054*"current" + 0.043*"rubber" + 0.036*"hughes" + 0.030*"february" + 0.028*"exports" + 0.027*"balance" + 0.026*"january"
2020-05-26 20:43:42,021 : INFO : topic #34 (0.045): 0.033*"prices" + 0.027*"world" + 0.015*"demand" + 0.013*"domestic" + 0.013*"exports" + 0.012*"economic" + 0.011*"production" + 0.010*"economy" + 0.010*"countries" + 0.010*"forecast"
2020-05-26 20:43:42,022 : INFO : topic #23 (0.045): 0.189*"dividend" + 0.103*"prior" + 0.066*"quarterly" + 0.051*"payable" + 0.046*"payout" + 0.041*"preferred" + 0.031*"holders" + 0.030*"distribution" + 0.030*"declared" + 0.027*"regular"
2020-05-26 20:43:42,023 : INFO : topic #27 (0.047): 0.033*"government" + 0.012*"economic" + 0.012*"budget" + 0.011*"growth" + 0.011*"added" + 0.010*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.006*"reduce"
2020-05-26 20:43:42,025 : INFO : topic diff=0.083641, rho=0.268371
2020-05-26 20:43:42,042 : INFO : PROGRESS: pass 9, at document #6000/7769
2020-05-26 20:43:43,026 : INFO : optimized alpha [0.03658393, 0.024472238, 0.028834064, 0.040018305, 0.03309981, 0.02342895, 0.04305548, 0.025145093, 0.028608762, 0.029345961, 0.03610211, 0.028239246, 0.02503852, 0.040839836, 0.038673908, 0.031104205, 0.036992192, 0.041565843, 0.026880922, 0.024780152, 0.030367004, 0.02662804, 0.031476557, 0.045668505, 0.027365578, 0.04222473, 0.03801303, 0.04740887, 0.025406687, 0.036872394, 0.030393705, 0.0267686, 0.026212666, 0.030824576, 0.0452191, 0.032197684, 0.024398422, 0.02726117, 0.03848382, 0.025004433, 0.03521765, 0.02701022, 0.023416031, 0.028495377, 0.026687289, 0.026227977, 0.025395596, 0.023872323, 0.029023038, 0.028459415]
2020-05-26 20:43:43,036 : INFO : merging changes from 2000 documents into a model of 7769 documents
2020-05-26 20:43:43,047 : INFO : topic #5 (0.023): 0.200*"offer" + 0.091*"tender" + 0.039*"cyclops" + 0.028*"dixons" + 0.026*"offers" + 0.022*"offered" + 0.019*"outstanding" + 0.018*"video" + 0.016*"cyacq" + 0.015*"tendered"
2020-05-26 20:43:43,049 : INFO : topic #42 (0.023): 0.127*"deficit" + 0.108*"surplus" + 0.065*"account" + 0.056*"current" + 0.043*"rubber" + 0.030*"february" + 0.029*"hughes" + 0.029*"exports" + 0.029*"balance" + 0.027*"allegheny"
2020-05-26 20:43:43,050 : INFO : topic #34 (0.045): 0.034*"prices" + 0.026*"world" + 0.016*"demand" + 0.013*"domestic" + 0.012*"exports" + 0.012*"economic" + 0.012*"production" + 0.010*"economy" + 0.010*"industry" + 0.010*"forecast"
2020-05-26 20:43:43,056 : INFO : topic #23 (0.046): 0.188*"dividend" + 0.112*"prior" + 0.064*"quarterly" + 0.052*"payable" + 0.051*"payout" + 0.041*"preferred" + 0.031*"holders" + 0.031*"declared" + 0.027*"regular" + 0.026*"distribution"
2020-05-26 20:43:43,058 : INFO : topic #27 (0.047): 0.034*"government" + 0.013*"budget" + 0.012*"added" + 0.011*"growth" + 0.011*"economic" + 0.010*"years" + 0.008*"financial" + 0.008*"spending" + 0.007*"chairman" + 0.006*"could"
2020-05-26 20:43:43,060 : INFO : topic diff=0.080979, rho=0.268371
2020-05-26 20:43:44,287 : INFO : -7.642 per-word bound, 199.8 perplexity estimate based on a held-out corpus of 1769 documents with 75293 words
2020-05-26 20:43:44,288 : INFO : PROGRESS: pass 9, at document #7769/7769
2020-05-26 20:43:45,153 : INFO : optimized alpha [0.03689451, 0.02448952, 0.02909529, 0.040876746, 0.03347605, 0.023673572, 0.04352611, 0.025350707, 0.028907038, 0.029798938, 0.036666725, 0.028423619, 0.02526802, 0.041212622, 0.039236557, 0.031538635, 0.037418474, 0.042181525, 0.02700657, 0.024932474, 0.030677883, 0.026914785, 0.031972315, 0.04643778, 0.02770644, 0.04308389, 0.038551256, 0.048641156, 0.025637522, 0.0373062, 0.030848686, 0.02695833, 0.026479421, 0.031125084, 0.04613982, 0.032575827, 0.024644613, 0.027546428, 0.039041612, 0.025112325, 0.035782345, 0.027362866, 0.023523474, 0.028675279, 0.02668347, 0.026498966, 0.025440222, 0.02403454, 0.029366067, 0.028868373]
2020-05-26 20:43:45,162 : INFO : merging changes from 1769 documents into a model of 7769 documents
2020-05-26 20:43:45,176 : INFO : topic #42 (0.024): 0.124*"deficit" + 0.115*"surplus" + 0.067*"account" + 0.058*"current" + 0.046*"rubber" + 0.034*"exports" + 0.032*"february" + 0.029*"balance" + 0.027*"imports" + 0.025*"january"
2020-05-26 20:43:45,177 : INFO : topic #5 (0.024): 0.190*"offer" + 0.091*"tender" + 0.053*"cyclops" + 0.040*"dixons" + 0.021*"offers" + 0.020*"cyacq" + 0.020*"offered" + 0.018*"outstanding" + 0.017*"video" + 0.016*"citicorp"
2020-05-26 20:43:45,178 : INFO : topic #34 (0.046): 0.032*"prices" + 0.026*"world" + 0.015*"demand" + 0.013*"domestic" + 0.012*"economic" + 0.012*"exports" + 0.012*"economy" + 0.011*"production" + 0.010*"countries" + 0.010*"forecast"
2020-05-26 20:43:45,179 : INFO : topic #23 (0.046): 0.185*"dividend" + 0.113*"prior" + 0.066*"quarterly" + 0.052*"payout" + 0.052*"payable" + 0.045*"preferred" + 0.031*"declared" + 0.028*"holders" + 0.028*"regular" + 0.028*"distribution"
2020-05-26 20:43:45,181 : INFO : topic #27 (0.049): 0.034*"government" + 0.012*"budget" + 0.011*"economic" + 0.011*"added" + 0.011*"growth" + 0.011*"years" + 0.008*"financial" + 0.007*"chairman" + 0.007*"spending" + 0.007*"could"
2020-05-26 20:43:45,183 : INFO : topic diff=0.079725, rho=0.268371
0: quarter fourth third results reported losses second shell income earlier
1: sugar tonnes export wheat traders tonne shipment intervention tender sources
2: crude energy barrels petroleum prices production texas barrel refinery reserves
3: committee house meeting administration european proposal proposed president could industry
4: board split shareholders raises effective common directors meeting increase raised
5: offer tender cyclops dixons offers cyacq offered outstanding video citicorp
6: subsidiary owned division acquired services terms mining health wholly based
7: exploration drilling industries block offshore mexico northern telecom project mines
8: banks loans commercial banking interest deposits assets funds bankers credit
9: merger agreement management proposed shareholders merge waste companies approved transaction
10: stake securities investment exchange commission total common bought express shearson
11: tonnes exports imports production total maize import figures output shipments
12: minister official ecuador venezuela great ceiling employers labour months sector
13: january december february revised compared adjusted figures output production growth
14: ended months results fiscal pretax south costs period africa charges
15: western communications cable television partners brazil network conservation wells action
16: operations excludes seven discontinued eight gains respectively prior include technology
17: earnings expects revenues financial business operations report operating higher earned
18: offer gencorp wheat general soviet department purolator agriculture comment buyout
19: saudi arabia service power electric iranian kuwait marine broadcasting nuclear
20: analysts rates point analyst sterling interest prime markets likely could
21: rates german policy money bundesbank reserve growth economists interest monetary
22: stores australia general australian natural cubic statement boston newspaper store
23: dividend prior quarterly payout payable preferred declared holders regular distribution
24: north chief pipeline executive president officer johnson fields field santos
25: acquisition agreement acquire purchase systems agreed products terms completed business
26: includes extraordinary charge income cents credits restructuring current provision making
27: government budget economic added growth years financial chairman spending could
28: china savings federal association trust paper chinese restaurants jersey estate
29: pacific credit copper holding holdings application continental stake owned insurance
30: foreign firms yeutter trading round countries companies commerce goods country
31: canada canadian takeover airline morgan interstate airlines fleet traffic industry
32: units department transportation illinois partnership river family debentures freight limited
33: february january prices index consumer month orders retail department inflation
34: prices world demand domestic economic exports economy production countries forecast
35: dollar currency exchange francs central foreign japan french paris france
36: baker treasury nations customer federal paris korea agreements reserves exchange
37: spokesman commission european dutch community british crowns telephone belgian scheme
38: japan japanese united officials states agreement industry imports reagan congress
39: program soybean acres ounces production silver soybeans venture acreage ounce
40: american profits turnover standard national chemical parent security capital mortgage
41: resources properties allied interest claims bankruptcy payments houston reorganization siemens
42: deficit surplus account current rubber exports february balance imports january
43: coffee producers brazil buffer cocoa talks delegates meeting export agreement
44: grain wheat farmers report agriculture weather areas winter crops harvest
45: futures exchange trading contract contracts london traders options commodity option
46: union strike steel workers soviet plant ships spokesman taiwan tonnes
47: marks money england bills around dealers assistance shortage system morning
48: court shipping spokesman agency order cattle could missiles reports department
49: issue capital issued notes warrants equity value certificates issues pesos
Document classification with topic features¶
Since we will use each document's topic distribution as features for our classifiers, we first need to extract a topic vector for every document with the LDA model we trained.
In [37]:
train_vectors = [model[doc] for doc in preprocessed_train]
test_vectors = [model[doc] for doc in preprocessed_test]
print(train_vectors[0])
[(1, 0.08835984), (11, 0.07526854), (12, 0.109812796), (18, 0.025732666), (22, 0.053181678), (33, 0.047094624), (35, 0.013233464), (37, 0.03464374), (43, 0.15859266), (44, 0.28209603), (47, 0.018432979), (48, 0.061606232), (49, 0.022008965)]
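To make the printed vector easier to read, here is a small sketch that ranks its entries by weight and looks up the top words of the heaviest topics. The vector and the topic words are copied from the outputs above; the names `doc_topics` and `topic_words` are ours, introduced only for illustration.

```python
# Topic vector printed above for the first training document: (topic id, weight) pairs.
doc_topics = [(1, 0.08835984), (11, 0.07526854), (12, 0.109812796),
              (18, 0.025732666), (22, 0.053181678), (33, 0.047094624),
              (35, 0.013233464), (37, 0.03464374), (43, 0.15859266),
              (44, 0.28209603), (47, 0.018432979), (48, 0.061606232),
              (49, 0.022008965)]

# Top words copied from the topic listing above (only the topics we look up here).
topic_words = {
    44: "grain wheat farmers report agriculture weather areas winter crops harvest",
    43: "coffee producers brazil buffer cocoa talks delegates meeting export agreement",
    12: "minister official ecuador venezuela great ceiling employers labour months sector",
}

# Rank topics by weight, heaviest first, and keep the top three.
top = sorted(doc_topics, key=lambda pair: pair[1], reverse=True)[:3]
for topic_id, weight in top:
    print(f"{topic_id}: {weight:.3f}  {topic_words[topic_id]}")

# Sum of the listed weights; topics missing from the list contribute nothing here.
total = sum(weight for _, weight in doc_topics)
print(round(total, 3))
```

The listed weights sum to about 0.99 rather than 1. As far as we know, this is because gensim omits topics whose probability falls below the model's `minimum_probability` threshold (0.01 by default, to our knowledge), so the returned vector is sparse.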
Now we convert the sparse vectors to dense matrices, which can be used as input to the classifiers.
In [38]:
import numpy as np
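def to_matrix(vectors, num_topics):
    # One row per document, one column per topic; topics absent from a sparse vector stay 0.
    matrix = np.zeros((len(vectors), num_topics))
    for i, vector in enumerate(vectors):
        for topic_id, weight in vector:
            matrix[i, topic_id] = weight
    return matrix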
train_matrix = to_matrix(train_vectors, num_topics)
test_matrix = to_matrix(test_vectors, num_topics)
print(train_matrix.shape)
print(test_matrix.shape)
(7769, 50)
(3019, 50)
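The conversion can be checked on a toy input. The sketch below is a variant that fills each row with a single indexed assignment instead of an inner loop; `to_matrix_vectorized` and the toy vectors are hypothetical names and data of our own, not part of the notebook.

```python
import numpy as np

def to_matrix_vectorized(vectors, num_topics):
    """Convert a list of sparse (topic_id, weight) lists into a dense 2-D array."""
    matrix = np.zeros((len(vectors), num_topics))
    for i, vector in enumerate(vectors):
        if vector:  # a document may have no topics above the reporting threshold
            topic_ids, weights = zip(*vector)
            matrix[i, list(topic_ids)] = weights
    return matrix

# Toy data (an assumption for illustration): three documents, five topics.
toy_vectors = [[(0, 0.6), (3, 0.4)], [], [(4, 1.0)]]
toy_matrix = to_matrix_vectorized(toy_vectors, num_topics=5)
print(toy_matrix.shape)
print(toy_matrix)
```

Each row has one column per topic, and any topic that does not appear in a document's sparse vector is left at 0.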
Next we define several basic classifiers. For model selection, we run 10-fold cross-validation on the training set.
In [39]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
clfs = [KNeighborsClassifier(), DecisionTreeClassifier(), RandomForestClassifier(),
        LinearSVC(), LogisticRegression()]
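As a minimal sketch of what 10-fold cross-validation with `cross_val_predict` does, the example below runs it on synthetic data. The data from `make_classification`, the `max_iter=1000` setting, and the explicit `StratifiedKFold` splitter with a fixed seed are our assumptions for illustration; the notebook itself simply passes `cv=10`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the topic matrix: 200 "documents" with 50 features each.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# An explicit splitter with a fixed seed makes the fold assignment reproducible.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is predicted by a model trained on the nine folds that exclude it.
predictions = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(predictions.shape)
print(accuracy_score(y, predictions))
```

Because every prediction comes from a model that never saw that sample during training, the resulting accuracy is an estimate of out-of-sample performance on data drawn from the same distribution as the training set.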
In [40]:
from sklearn import model_selection
from sklearn.metrics import accuracy_score, classification_report
def do_multiple_10foldcrossvalidation(clfs, data, classifications):
    # For each classifier, collect out-of-fold predictions and report accuracy and per-class metrics.
    for clf in clfs:
        predictions = model_selection.cross_val_predict(clf, data, classifications, cv=10)
        print(clf)
        print("accuracy")
        print(accuracy_score(classifications, predictions))
        print(classification_report(classifications, predictions))
do_multiple_10foldcrossvalidation(clfs, train_matrix, training_classifications)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
accuracy
0.9030763289998713
              precision    recall  f1-score   support
         acq       0.88      0.63      0.73      1650
     not acq       0.91      0.98      0.94      6119
    accuracy                           0.90      7769
   macro avg       0.89      0.80      0.84      7769
weighted avg       0.90      0.90      0.90      7769
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')
accuracy
0.9062942463637533
              precision    recall  f1-score   support
         acq       0.79      0.76      0.78      1650
     not acq       0.94      0.95      0.94      6119
    accuracy                           0.91      7769
   macro avg       0.86      0.85      0.86      7769
weighted avg       0.91      0.91      0.91      7769
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
accuracy
0.9409190371991247
              precision    recall  f1-score   support
         acq       0.89      0.82      0.86      1650
     not acq       0.95      0.97      0.96      6119
    accuracy                           0.94      7769
   macro avg       0.92      0.90      0.91      7769
weighted avg       0.94      0.94      0.94      7769
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
accuracy
0.9259879006307118
              precision    recall  f1-score   support
         acq       0.87      0.77      0.82      1650
     not acq       0.94      0.97      0.95      6119
    accuracy                           0.93      7769
   macro avg       0.90      0.87      0.88      7769
weighted avg       0.92      0.93      0.92      7769
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
accuracy
0.9181361822628394
              precision    recall  f1-score   support
         acq       0.88      0.71      0.79      1650
     not acq       0.93      0.97      0.95      6119
    accuracy                           0.92      7769
   macro avg       0.90      0.84      0.87      7769
weighted avg       0.92      0.92      0.91      7769
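The classes are imbalanced (1,650 acq documents against 6,119 not acq), so the accuracies above are easier to interpret next to a majority-class baseline. The sketch below computes that baseline from the support counts in the reports. The accuracies are copied from the outputs above, rounded to four places, and the variable names are ours.

```python
# Support counts copied from the classification reports above.
n_acq = 1650
n_not_acq = 6119
n_total = n_acq + n_not_acq

# Accuracy obtained by always predicting the majority class ("not acq").
baseline_accuracy = n_not_acq / n_total
print(f"majority-class baseline: {baseline_accuracy:.4f}")

# Cross-validated accuracies from the outputs above, rounded to four places.
cv_accuracy = {
    "kNN": 0.9031,
    "DT": 0.9063,
    "RF": 0.9409,
    "SVM": 0.9260,
    "LR": 0.9181,
}

# Improvement of each classifier over the baseline, in accuracy points.
gains = {name: acc - baseline_accuracy for name, acc in cv_accuracy.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.4f}")
```

The baseline comes to roughly 0.79, so all five classifiers improve on it in cross-validation, with the random forest showing the largest margin.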
In [43]:
%matplotlib inline
import matplotlib.pyplot as plt
def test_and_graph(clfs, training_data, training_classifications, test_data, test_classifications):
    # Fit each classifier on the full training set and record its accuracy on the held-out test set.
    accuracies = []
    for clf in clfs:
        clf.fit(training_data, training_classifications)
        predictions = clf.predict(test_data)
        accuracies.append(accuracy_score(test_classifications, predictions))
    print(accuracies)
    # Bar positions are shared by the bars and their tick labels.
    positions = [num + 0.25 for num in range(len(clfs))]
    plt.bar(positions, accuracies, 0.5)
    plt.ylabel('Accuracy')
    plt.title('Accuracy classifying acq topic in Reuters, by classifier')
    plt.ylim([0.6, 0.8])
    plt.xticks(positions, ('kNN', 'DT', 'RF', 'SVM', 'LR'))
    plt.show()
test_and_graph(clfs, train_matrix, training_classifications, test_matrix, test_classifications)
[0.7105001656177542, 0.6565087777409738, 0.6826763829082477, 0.7174561112951309, 0.7191122888373633]