[Figure: Feed-forward neural network with two hidden layers (Goldberg, Chapter 4, adapted from Figure 4.2)]
[Equation: composed form of the network (Goldberg, Chapter 4, Equation 4.2)]
[Or with separate equations for each layer (Goldberg, Chapter 4, Equations 4.3 and 4.4)]
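The layer-wise form of the network can be sketched as follows. This is a minimal NumPy sketch assuming Goldberg's row-vector convention (xW + b); the dimensions and the choice of tanh as the nonlinearity are illustrative assumptions, not fixed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: input, two hidden layers, output.
d_in, d_h1, d_h2, d_out = 4, 5, 3, 2
W1, b1 = rng.standard_normal((d_in, d_h1)), np.zeros(d_h1)
W2, b2 = rng.standard_normal((d_h1, d_h2)), np.zeros(d_h2)
W3, b3 = rng.standard_normal((d_h2, d_out)), np.zeros(d_out)

def mlp2(x):
    h1 = np.tanh(x @ W1 + b1)   # first hidden layer: h1 = g1(x W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)  # second hidden layer: h2 = g2(h1 W2 + b2)
    return h2 @ W3 + b3         # linear output layer

x = rng.standard_normal(d_in)
y = mlp2(x)
print(y.shape)  # (2,)
```

Writing each layer as its own assignment mirrors the "separate equations for each layer" form; composing the three lines into one expression gives the single-equation form.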
[Slide fragments: output dimension d_out = 1 (scalar output) vs. d_out = k > 1 (k-dimensional output)]
[Figure: Goldberg, Chapter 5, Figure 5.1 (c); includes a pick node such as pick(x,5)]
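The pick operation above can be illustrated in a few lines: pick(x, i) simply selects the i-th entry of a vector, which is how a computation graph pulls out the score or probability of the gold class when building the loss node. The softmax and loss wiring below are an assumed illustration, not taken from the slides.

```python
import numpy as np

def pick(x, i):
    # pick(x, i): select the i-th entry of vector x.
    return x[i]

scores = np.array([0.1, 0.3, 0.05, 0.2, 0.15, 0.2])
print(pick(scores, 5))  # 0.2

# Assumed usage: negative log-likelihood loss for gold class 5.
probs = np.exp(scores) / np.exp(scores).sum()
loss = -np.log(pick(probs, 5))
```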
[Figure: Goldberg, Chapter 4, Figure 4.3]
[Figure: J & M Fig 7.12; the model predicts P(w_t | …, w_{t-1})]
● Loss takes into account only the hidden-layer matrix W and the output-layer matrix U
[Figure: J & M Fig 7.13; the model predicts P(w_t | …, w_{t-1})]
● Input: one-hot word vectors, 1 x |V|
● Each input one-hot vector is fully connected to an embedding layer (word projection) by an embedding matrix E for word embeddings of dimension d, so E is d x |V| (one embedding column per word)
● When conditioning on 3 past words, the projection layer will be 1 x 3d
● Note that back-propagation of the loss will update E, which is shared across words
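The embedding lookup and projection layer described above can be sketched directly. This assumes a toy vocabulary size |V| = 10 and embedding dimension d = 4: multiplying E (d x |V|) by a one-hot vector selects one column of E, and concatenating the three context-word embeddings yields the 1 x 3d projection layer.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                      # assumed toy sizes: |V| = 10, d = 4
E = rng.standard_normal((d, V))   # shared embedding matrix, one column per word

def one_hot(i, V):
    v = np.zeros(V)
    v[i] = 1.0
    return v

context = [2, 7, 5]  # indices of the 3 previous words
embeds = [E @ one_hot(i, V) for i in context]  # each lookup returns a d-vector
projection = np.concatenate(embeds)            # concatenated: length 3d
print(projection.shape)  # (12,)
```

In practice the matrix product is replaced by a direct column lookup (E[:, i]), which is equivalent and avoids materializing the one-hot vectors; either way, because E is shared, gradients from every context position update the same matrix.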