5 Latent Variable Models for Document Analysis

Document Clustering
Given a collection of $N$ documents, we would like to partition them into $K$ clusters.

Document Representation. Each document is made of some text. We treat a document as the multiset (bag) of
words in its text, irrespective of their positions, i.e. a document consisting of the text “Andi went to school
to meet Bob” is represented as the following bag of words: {Andi, went, to, school, to, meet, Bob}. Note
that if we had another document with the text “Bob went to school to meet Andi”, it would have the same
representation as the previous document. We refer to this as the bag-of-words representation of the
document. Also, we assume the words appearing in our collection of documents come from a dictionary
denoted by $\mathcal{A}$.
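As a concrete illustration, here is a minimal Python sketch (the helper `bag_of_words`, the two-document toy corpus and all variable names are illustrative, not from the notes) that maps each document to its vector of word counts over the dictionary; the two example documents above indeed end up with identical count vectors.

```python
from collections import Counter

# Toy two-document corpus (illustrative).
docs = ["Andi went to school to meet Bob",
        "Bob went to school to meet Andi"]

# The dictionary A: all distinct words seen in the collection.
dictionary = sorted({w for doc in docs for w in doc.split()})

def bag_of_words(doc):
    """Count vector c(., d): word order is discarded, only counts remain."""
    counts = Counter(doc.split())
    return [counts[w] for w in dictionary]

for doc in docs:
    print(bag_of_words(doc))   # both documents map to the same count vector
```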

Generative Model. We assume the following hypothetical generative story for generating the collection
of our documents:

For each document $d_n$ (for $n = 1, \ldots, N$):
    toss the $K$-face dice (with the parameter $\varphi$) to choose the face $z_n$ (i.e. the cluster) that the $n$th document belongs to
    For each word placeholder in the document:
        generate the word by tossing the dice (with parameter $\mu_{z_n}$) corresponding to the face $z_n$.

The parameters of the model are (1) the cluster proportions $\varphi = (\varphi_1, \ldots, \varphi_K)$, which is a probability vector of size $K$
where $\sum_{k=1}^{K} \varphi_k = 1$, and (2) the word proportions $\mu_k = (\mu_{k,1}, \ldots, \mu_{k,|\mathcal{A}|})$ corresponding to the $k$th face of the dice, where
$\sum_{w \in \mathcal{A}} \mu_{k,w} = 1$; note that we have $K$ such word proportion vectors, each corresponding to a
face of the dice (or cluster).
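The generative story can be simulated directly. The sketch below (the toy dictionary, toy parameter values and all names are illustrative assumptions, not from the notes) tosses the $K$-face dice with parameter $\varphi$ to pick a cluster, and then tosses the word dice $\mu_k$ of that cluster once per word placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

dictionary = ["school", "meet", "football"]   # toy dictionary A (illustrative)
varphi = np.array([0.6, 0.4])                 # cluster proportions, K = 2
mu = np.array([[0.5, 0.4, 0.1],               # word proportions per cluster,
               [0.1, 0.2, 0.7]])              # each row sums to 1

def generate_document(num_words):
    # Toss the K-face dice to pick the cluster of this document.
    k = rng.choice(len(varphi), p=varphi)
    # Fill each word placeholder by tossing the dice of face k.
    words = rng.choice(dictionary, size=num_words, p=mu[k])
    return k, list(words)

print(generate_document(num_words=5))
```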

The probability of generating a pair of a document and its cluster, $(k, d)$, according to our generative
story, is

$$p(k, d) = \varphi_k \prod_{w \in \mathcal{A}} \mu_{k,w}^{c(w,d)},$$

where $c(w, d)$ is simply the number of occurrences of the word $w$ in the document $d$.
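In code, the log of this joint probability is a dot product between the count vector and the log word proportions, plus the log cluster proportion. A small sketch with assumed toy numbers:

```python
import numpy as np

varphi = np.array([0.6, 0.4])                # cluster proportions (toy values)
mu = np.array([[0.5, 0.3, 0.2],              # word proportions, one row per cluster
               [0.1, 0.1, 0.8]])
c = np.array([2, 0, 3])                      # counts c(w, d) for a single document d

# log p(k, d) = log varphi_k + sum_w c(w, d) log mu_{k,w}, evaluated for every k at once
log_joint = np.log(varphi) + c @ np.log(mu).T
print(log_joint)                             # one value per candidate cluster k
```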


In practice, the document cluster ids are not given to us, so we use latent variables $z_1, \ldots, z_N$ to denote the cluster
assignments.

Complete Data
The Likelihood. Assume that in addition to the documents $d_1, \ldots, d_N$, we were also given the
values of the corresponding latent variables $z_1, \ldots, z_N$; then

$$p(d_{1:N}, z_{1:N}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \Big( \varphi_k \prod_{w \in \mathcal{A}} \mu_{k,w}^{c(w,d_n)} \Big)^{z_{nk}},$$

where $z_n = (z_{n1}, \ldots, z_{nK})$ is the cluster assignment vector for the $n$th document, in which
$z_{nk} = 1$ if this document belongs to the cluster $k$ and zero otherwise. Note that only one element of the
cluster assignment vector is 1, and the rest are zero. To summarise, the log-likelihood of the complete
data is:

$$\log p(d_{1:N}, z_{1:N}) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \Big( \log \varphi_k + \sum_{w \in \mathcal{A}} c(w, d_n) \log \mu_{k,w} \Big).$$
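The complete-data log-likelihood is a sum of per-document log joint terms, masked by the one-hot assignments. A sketch (the count matrix `C`, the assignment matrix `Z` and all toy values are assumptions for illustration):

```python
import numpy as np

# C[n, w] = c(w, d_n), one row per document; Z[n, k] = z_{nk}, one-hot assignments.
C = np.array([[2, 0, 3],
              [0, 4, 1],
              [1, 1, 0]])                    # N = 3 documents, |A| = 3 words
Z = np.array([[1, 0],
              [0, 1],
              [1, 0]])                       # K = 2 clusters
varphi = np.array([0.6, 0.4])
mu = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.1, 0.8]])

log_joint = np.log(varphi) + C @ np.log(mu).T   # (N, K): log p(k, d_n)
complete_ll = np.sum(Z * log_joint)             # sum_n sum_k z_{nk} * log p(k, d_n)
print(complete_ll)
```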

Learning the Parameters. Maximising the complete data log-likelihood to learn the parameters of the
model leads to the following solution:

The mixing components: $\varphi_k = \dfrac{N_k}{N}$, where $N_k = \sum_{n=1}^{N} z_{nk}$ is the number of documents assigned to cluster $k$

The word proportion parameters for each cluster: $\mu_{k,w} = \dfrac{\sum_{n=1}^{N} z_{nk}\, c(w, d_n)}{\sum_{w' \in \mathcal{A}} \sum_{n=1}^{N} z_{nk}\, c(w', d_n)}$

The above results can be obtained by forming the Lagrangian (to enforce the constraints, e.g. the sum of the
mixing coefficients must be one), and then setting the derivatives with respect to the parameters to zero
(see Appendix A).

The above estimates for the optimal parameters are very intuitive. For example, the best value for $\mu_k$ is
obtained by counting the number of times that each word of the dictionary has been seen in the
documents belonging to the cluster $k$, and then normalising this count vector so that it sums to 1 (by
dividing each element of the count vector by the sum of the counts).
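These closed-form estimates are a few lines of array code. A sketch, using the same illustrative count matrix `C` and one-hot assignments `Z` as in the previous snippet:

```python
import numpy as np

# Closed-form maximum-likelihood estimates from complete data.
# C[n, w] = c(w, d_n); Z[n, k] = z_{nk} (one-hot). Toy values for illustration.
C = np.array([[2, 0, 3],
              [0, 4, 1],
              [1, 1, 0]])
Z = np.array([[1, 0],
              [0, 1],
              [1, 0]])

N_k = Z.sum(axis=0)                      # N_k = number of documents in cluster k
varphi_hat = N_k / Z.shape[0]            # mixing components: N_k / N

counts_per_cluster = Z.T @ C             # (K, |A|): word counts pooled per cluster
mu_hat = counts_per_cluster / counts_per_cluster.sum(axis=1, keepdims=True)

print(varphi_hat)                        # sums to 1
print(mu_hat)                            # each row sums to 1
```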

Incomplete Data and EM
The Likelihood. In practice, the document clusters are not given to us, so $z_{1:N}$ is latent. The
probability of the observed documents is therefore

$$p(d_{1:N}) = \prod_{n=1}^{N} \sum_{k=1}^{K} \varphi_k \prod_{w \in \mathcal{A}} \mu_{k,w}^{c(w,d_n)}.$$


To summarise, the log-likelihood is:

$$\log p(d_{1:N}) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \varphi_k \prod_{w \in \mathcal{A}} \mu_{k,w}^{c(w,d_n)}.$$

To maximise the above incomplete data log-likelihood objective, we resort to the EM Algorithm.

The $Q$ Function. Let’s look at the $Q$ function, which will be the basis of our EM Algorithm:

$$Q(\theta, \theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \Big( \log \varphi_k + \sum_{w \in \mathcal{A}} c(w, d_n) \log \mu_{k,w} \Big),$$

where $\theta = (\varphi, \mu_1, \ldots, \mu_K)$ is the collection of model parameters, and

$$\gamma(z_{nk}) = p(z_{nk} = 1 \mid d_n, \theta^{\text{old}}) = \frac{\varphi_k^{\text{old}} \prod_{w \in \mathcal{A}} \big(\mu_{k,w}^{\text{old}}\big)^{c(w,d_n)}}{\sum_{k'=1}^{K} \varphi_{k'}^{\text{old}} \prod_{w \in \mathcal{A}} \big(\mu_{k',w}^{\text{old}}\big)^{c(w,d_n)}}$$

are the responsibility factors.

To maximise the $Q$ function, we can form the Lagrangian to enforce the constraints (see Appendix A),
and set the derivatives to zero, which leads to the following solution for the model parameters:

The mixing components: $\varphi_k = \dfrac{N_k}{N}$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$

The word proportion parameters for each cluster: $\mu_{k,w} = \dfrac{\sum_{n=1}^{N} \gamma(z_{nk})\, c(w, d_n)}{\sum_{w' \in \mathcal{A}} \sum_{n=1}^{N} \gamma(z_{nk})\, c(w', d_n)}$
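In code, the responsibilities are the normalised per-cluster joint probabilities of each document, and the M-step formulas above are the complete-data estimates with the soft $\gamma(z_{nk})$ in place of the hard $z_{nk}$. A minimal sketch (the function names and the count-matrix convention `C[n, w] = c(w, d_n)` are illustrative assumptions):

```python
import numpy as np

def e_step(C, varphi, mu):
    """E step: responsibilities gamma(z_{nk}) for every document/cluster pair."""
    log_joint = np.log(varphi) + C @ np.log(mu).T        # (N, K): log p(k, d_n)
    log_joint -= log_joint.max(axis=1, keepdims=True)    # stabilise before exponentiating
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)      # normalise over clusters

def m_step(C, gamma):
    """M step: re-estimate the parameters from the soft counts."""
    N_k = gamma.sum(axis=0)                              # N_k = sum_n gamma(z_{nk})
    varphi = N_k / C.shape[0]                            # mixing components N_k / N
    weighted_counts = gamma.T @ C                        # (K, |A|) expected word counts
    mu = weighted_counts / weighted_counts.sum(axis=1, keepdims=True)
    return varphi, mu
```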

The EM Algorithm. Now let’s put everything together to construct our EM algorithm to learn the
parameters $\theta$ and find the best values for the latent variables $z_{1:N}$:


Choose an initial setting for the parameters $\theta^{\text{old}} = (\varphi^{\text{old}}, \mu_{1:K}^{\text{old}})$

While the convergence criterion is not met:

E step: Set the responsibilities $\gamma(z_{nk})$ based on $\theta^{\text{old}}$

M step: Set $\theta^{\text{new}}$ based on $\gamma(z_{nk})$, then update $\theta^{\text{old}} \leftarrow \theta^{\text{new}}$
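Putting the whole loop together, here is a self-contained sketch of the algorithm (the initialisation, the stopping rule based on the change in the incomplete-data log-likelihood, and all names are illustrative choices, not prescribed by the notes):

```python
import numpy as np

def em_document_clustering(C, K, num_iters=100, tol=1e-6, seed=0):
    """EM for the dice-based document clustering model.
    C[n, w] holds the counts c(w, d_n); K is the number of clusters."""
    rng = np.random.default_rng(seed)
    N, A = C.shape

    # Initial setting of the parameters theta_old.
    varphi = np.full(K, 1.0 / K)
    mu = rng.dirichlet(np.ones(A), size=K)

    prev_ll = -np.inf
    for _ in range(num_iters):
        # E step: responsibilities gamma(z_{nk}) from theta_old.
        log_joint = np.log(varphi) + C @ np.log(mu).T          # (N, K)
        ll = np.sum(np.logaddexp.reduce(log_joint, axis=1))    # incomplete-data log-lik.
        gamma = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M step: theta_new from the responsibilities.
        N_k = gamma.sum(axis=0)
        varphi = N_k / N
        weighted = gamma.T @ C + 1e-12                         # avoid log(0) later
        mu = weighted / weighted.sum(axis=1, keepdims=True)

        # Convergence check on the change in log-likelihood.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return varphi, mu, gamma

# Toy usage (counts are illustrative):
C = np.array([[2, 0, 3], [0, 4, 1], [1, 1, 0], [3, 0, 2]])
varphi, mu, gamma = em_document_clustering(C, K=2)
print(varphi, gamma.argmax(axis=1))
```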
