School of Computing and Information Systems The University of Melbourne COMP90042
NATURAL LANGUAGE PROCESSING (Semester 1, 2020)
Sample solutions for discussion exercises: Week 9
Discussion
1. What differentiates probabilistic CYK parsing from CYK parsing? Why is this important? How does this affect the algorithms used for parsing?
• Parsing in general is the process of identifying the structure(s) of a sentence, according to a grammar of the language.
• In general, the search space is too large to explore efficiently, so we use a dynamic programming method to keep track of partial solutions. The data structure we use for this in CYK is a chart, where entries in the chart correspond to partial parses (licensed structures) for the various spans (contiguous sequences of tokens) within the sentence.
• Probabilistic CYK parsing adds a real-valued weight (a probability) to each production in the grammar, so that each parse tree can be assigned a score, namely the product of the probabilities of the productions in the tree. This is important because it allows us to discriminate between likely and unlikely parses, rather than just producing a list of all parses, as in standard CYK parsing.
• This affects the algorithm in that it must track the maximum-probability analysis for each span, rather than just the set of grammar symbols; otherwise, the two parsing algorithms are very closely related. A minimal sketch of the probabilistic recurrence is given below.
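• To make the recurrence concrete, here is a minimal sketch of probabilistic CYK, assuming a grammar in Chomsky Normal Form stored as two dictionaries; the data structures and names here are our own, purely for illustration:

    from collections import defaultdict

    def pcyk(words, lexicon, rules):
        # Probabilistic CYK over a grammar in Chomsky Normal Form.
        #   lexicon: word -> list of (tag, prob) for unary rules tag -> word
        #   rules:   (B, C) -> list of (A, prob) for binary rules A -> B C
        # Returns a chart where best[(i, j)][A] is the highest probability
        # of any parse of words[i:j] rooted in symbol A.
        n = len(words)
        best = defaultdict(dict)
        # Base case: spans of length 1 are filled from the lexicon.
        for i, w in enumerate(words):
            for tag, p in lexicon.get(w, []):
                best[(i, i + 1)][tag] = p
        # Longer spans: combine two adjacent sub-spans with a binary rule,
        # keeping only the maximum-probability analysis per symbol.
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length
                for k in range(i + 1, j):
                    for B, pb in best[(i, k)].items():
                        for C, pc in best[(k, j)].items():
                            for A, pr in rules.get((B, C), []):
                                p = pr * pb * pc
                                if p > best[(i, j)].get(A, 0.0):
                                    best[(i, j)][A] = p
        return best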
2. What is a probabilistic context-free grammar and what problem does it attempt to solve?
• A probabilistic context-free grammar (PCFG) adds a real-valued weight (a probability) to each production in a context-free grammar. This attempts to provide a “language model”: that is, to describe the likely sentences in a language, with the probability of a sentence mediated by its grammatical structure.
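• As a concrete illustration, a toy PCFG can be declared and parsed with NLTK; the NLTK classes below are real, but the grammar itself is invented for this sketch:

    import nltk

    # A toy PCFG: the probabilities of productions sharing a
    # left-hand side must sum to 1.
    grammar = nltk.PCFG.fromstring("""
        S   -> NP VP      [1.0]
        NP  -> 'Donald'   [0.5]
        NP  -> Det N      [0.5]
        VP  -> V NP       [1.0]
        Det -> 'an'       [1.0]
        N   -> 'elephant' [1.0]
        V   -> 'shot'     [1.0]
    """)

    parser = nltk.ViterbiParser(grammar)  # finds the max-probability parse
    for tree in parser.parse("Donald shot an elephant".split()):
        print(tree, tree.prob())  # prob() = product of rule probabilities

Here the tree's probability is 0.5 × 0.5 = 0.25 (the two NP rules; all other rules have probability 1), which is exactly the kind of “language model” score described above.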
3. A hidden Markov model assigns a tag to each word in a sentence, e.g., Donald/NNP has/VBZ small/JJ hands/NNS
The probability of the sequence is based on the tag-word pairs, and the pairs of adjacent tags. Show how this process can be framed as a CFG, and how the various probabilities (e.g., observation, transition, and initial state) can be assigned to productions. What are the similarities and differences between CYK parsing with this grammar, and the HMM’s Viterbi algorithm for finding the best scoring state sequence?
• The first step is to write the probability assigned by the HMM to the tagged sentence. For the example above, this is the product of the initial-state, observation, and transition probabilities: $\pi_{NNP} \times O_{NNP,Donald} \times A_{NNP,VBZ} \times O_{VBZ,has} \times A_{VBZ,JJ} \times O_{JJ,small} \times A_{JJ,NNS} \times O_{NNS,hands}$, where $\pi$ is the initial-state distribution, $A$ the transition matrix, and $O$ the observation (emission) matrix.
• This can be drawn as a chain with the tag-tag transitions forming the “spine”, and each observation as a leaf. The initial, transition, and observation probabilities can be attached to the corresponding edges. Overall this gives a right-branching tree.
• If we treat this tree as a “parse tree”, then each clique from the tree (a parent and its direct children) can be treated as a CFG rule, parent → left-child right-child. E.g., NNP → Donald VBZ.
• The score for this rule would be $A_{NNP,VBZ} \times O_{NNP,Donald}$. Note that we could use different grammar symbols, such that each production maps to a single HMM component. E.g., split the above rule into NNP′ → NNP VBZ′ (with weight $A_{NNP,VBZ}$) and NNP → Donald (with weight $O_{NNP,Donald}$), where the “prime” version of the tag is a newly introduced grammar symbol.
• CYK parsing under this grammar will do the same work as Viterbi in the HMM, but will waste effort assigning analyses to all word spans in the sentence, while Viterbi effectively only considers spans that start at the first word of the sentence. A sketch of the grammar construction is given below.
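• The grammar construction itself is mechanical; the following is a minimal sketch, assuming the HMM parameters are given as dictionaries pi, A and O (all names here are ours, for illustration):

    def hmm_to_grammar(tags, vocab, pi, A, O):
        # Convert HMM parameters into the weighted CFG described above.
        #   pi[t]   : initial probability of tag t
        #   A[t][u] : transition probability from tag t to tag u
        #   O[t][w] : probability of tag t emitting word w
        # Rules are triples (lhs, rhs, weight); the primed symbols t'
        # are the newly introduced grammar symbols from the discussion.
        rules = []
        for t in tags:
            for w in vocab:
                rules.append((t, (w,), O[t][w]))                # NNP -> Donald
            for u in tags:
                rules.append((t + "'", (t, u + "'"), A[t][u]))  # NNP' -> NNP VBZ'
            rules.append((t + "'", (t,), 1.0))    # the chain ends at this tag
            rules.append(("ROOT", (t + "'",), pi[t]))   # initial-state weight
        return rules

Note that the unary rules (t′ → t and ROOT → t′) mean this grammar is not in Chomsky Normal Form as written, so CYK would need the usual unary-rule extension.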
4. Using typical dependency types, construct (by hand) a dependency parse for the following sentence: Yesterday, I shot an elephant in my pyjamas. Check your work against the output of the online GUI for the Stanford Parser (https://corenlp.run/).
• Dependency parses lend themselves to a flat representation, from which you can derive the tree if you wish (a small sketch of this derivation is given after the table):
ID  Token      Head  Relation
 1  Yesterday     4  TMP / OBL:TMOD
 2  ,             4  PUNCT
 3  I             4  NSUBJ
 4  shot          0  ROOT
 5  an            6  DET
 6  elephant      4  DOBJ
 7  in            9  CASE
 8  my            9  POSS
 9  pyjamas       4  OBL
10  .             4  PUNCT
• OBL = oblique nominal. The OBL relation is used for a nominal (noun, pronoun, noun phrase) functioning as a non-core (oblique) argument or adjunct. This means that it functionally corresponds to an adverbial attaching to a verb, adjective or other adverb. (https://universaldependencies.org/u/dep/obl.html)
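• Deriving the tree from the flat table amounts to grouping each token under its head; a minimal sketch, hard-coding the table above (the row tuples and function names are ours):

    from collections import defaultdict

    # The flat table above as (id, token, head, relation); head 0 is the root.
    rows = [
        (1, "Yesterday", 4, "TMP/OBL:TMOD"), (2, ",",        4, "PUNCT"),
        (3, "I",         4, "NSUBJ"),        (4, "shot",     0, "ROOT"),
        (5, "an",        6, "DET"),          (6, "elephant", 4, "DOBJ"),
        (7, "in",        9, "CASE"),         (8, "my",       9, "POSS"),
        (9, "pyjamas",   4, "OBL"),          (10, ".",       4, "PUNCT"),
    ]

    # Group each token under its head to recover the tree structure.
    children = defaultdict(list)
    for idx, token, head, rel in rows:
        children[head].append((idx, token, rel))

    def show(head, depth=0):
        # Print the subtree rooted at `head`, indenting dependents.
        for idx, token, rel in sorted(children[head]):
            print("  " * depth + f"{token} ({rel})")
            show(idx, depth + 1)

    show(0)  # prints shot (ROOT), with its nine dependents nested beneath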
5. In what ways is (transition-based, probabilistic) dependency parsing similar to (probabilistic) CYK parsing? In what ways is it different?
• The connections are a little tenuous, but let’s see what we can come up with:
– Both methods are attempting to determine the structure of a sentence; both methods are attempting to disambiguate amongst the (perhaps many) possible structures licensed by the grammar, by using a probabilistic grammar to determine the most probable structure.
– Both methods process the tokens in the sentence one-by-one, left-to-right.
• There are numerous differences (probably too many to enumerate here), for example:
– Although POS tags are implicitly used in constructing the “oracle” (training), the dependency parser doesn’t explicitly tag the sentence.
– The transition-based dependency parser can potentially take into account other (non-local) relations in the sentence, whereas CYK’s probabilities depend only on the (local) sub-tree.
– CYK adds numerous fragments to the chart which don’t end up getting used in the final parse structure, whereas the transition-based dependency parser only adds edges that will be in the final structure, as the hand-traced example below illustrates.
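• To see the last point concretely, here is a hand-written arc-standard transition sequence for a shortened version of the example sentence; the shift/left-arc/right-arc actions are the standard transition system, but the oracle sequence and variable names below are ours:

    # Arc-standard parsing of "I shot an elephant": every Arc action
    # commits one edge of the final parse, and nothing is discarded.
    sent = ["ROOT", "I", "shot", "an", "elephant"]
    actions = ["SH", "SH", "LA", "SH", "SH", "LA", "RA", "RA"]

    stack, buffer, arcs = [0], list(range(1, len(sent))), []
    for act in actions:
        if act == "SH":      # Shift: move the next word onto the stack
            stack.append(buffer.pop(0))
        elif act == "LA":    # LeftArc: top of stack heads the item below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RA":    # RightArc: item below heads the top of stack
            dep = stack.pop()
            arcs.append((stack[-1], dep))

    for head, dep in arcs:
        print(sent[head], "->", sent[dep])
    # shot -> I, elephant -> an, shot -> elephant, ROOT -> shot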