Homework Number 4 Write Up

Run
python hmm.py
OOV
I tried two methods for estimating the probability of UNKNOWN_WORD: a constant probability of 1/1000, and the distribution of words occurring exactly once in the training data. The first method achieves an accuracy of about 74%, while the second achieves above 94%, which is much better. So I finally use the distribution of all items occurring only once as the basis for computing likelihoods of OOV items.
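The hapax-based estimate above can be sketched as follows. This is a minimal illustration, not the write-up's actual code; the function name and input format (a list of (word, tag) pairs) are assumptions.

```python
from collections import Counter

def hapax_tag_distribution(tagged_words):
    """Estimate P(tag | UNKNOWN_WORD) from words occurring exactly once.

    tagged_words: list of (word, tag) pairs from the training corpus.
    Returns a dict mapping each tag to its probability among hapax words.
    """
    word_counts = Counter(word for word, _ in tagged_words)
    # Tags of words that appear exactly once in training ("hapax legomena")
    hapax_tags = Counter(tag for word, tag in tagged_words
                         if word_counts[word] == 1)
    total = sum(hapax_tags.values())
    return {tag: n / total for tag, n in hapax_tags.items()}
```

The intuition is that words seen only once in training behave most like words never seen at all, so their tag distribution is a better proxy for OOV items than a uniform constant.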
Algorithm
I use a bigram hidden Markov model and the Viterbi dynamic-programming algorithm to do the tagging.
System Outline
computeCounts(fn) implements the training process, computing two hash tables: 'POS', which maps each POS tag to the frequencies of the words occurring with that tag, and 'STATE', which maps each state to the frequencies of its following states.
tag(posFreq, stateFreq, states, stateCounts, wordSet, ws) uses the Viterbi algorithm to tag one sentence.
test(posFreq, stateFreq, stateCounts, wordSet, fn, outfn) tags all the sentences in the input file and writes the tagging results to the output file.
run(corpusfn, testfn, outfn) is the main function: it trains on the training file and does the tagging on the testing file.
Reference
http://cs.nyu.edu/courses/spring17/CSCI-UA.0480-009/homework4.html
http://cs.nyu.edu/courses/spring17/CSCI-UA.0480-009/lecture4-hmm.pdf