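Let $r, I, O \in \mathbb{N}$ and $d_i \in \mathbb{N}$ for $i = 1, \dots, r$, and let $f \in \mathcal{N}_r(I, d_1, \dots, d_{r-1}, O; \mathrm{Id}, \dots, \mathrm{Id})$. Then there exist $n, m \in \mathbb{N}$, $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^m$ such that $f : \mathbb{R}^n \to \mathbb{R}^m$ satisfies
\[ f(x) = Ax + B. \]

By the definition of $\mathcal{N}_r(I, d_1, \dots, d_{r-1}, O; \mathrm{Id}, \dots, \mathrm{Id})$, there exist $W_i \in \mathbb{R}^{d_i \times d_{i-1}}$ and $b_i \in \mathbb{R}^{d_i}$ for $i = 1, \dots, r$ (with $d_0 = I$ and $d_r = O$) such that, for all $x \in \mathbb{R}^I$,
\[ f(x) = \sigma_r \circ L_r \circ \cdots \circ \sigma_1 \circ L_1(x), \]
where $\sigma_i$ is the activation function of layer $i$ and
\[ L_i : \mathbb{R}^{d_{i-1}} \to \mathbb{R}^{d_i}, \qquad L_i(x) = W_i x + b_i, \qquad x \in \mathbb{R}^{d_{i-1}}, \quad i = 1, \dots, r. \]
Since $\sigma_i = \mathrm{Id}$ for $i = 1, \dots, r$, this reduces to
\[ f(x) = L_r \circ L_{r-1} \circ \cdots \circ L_1(x). \]
Expanding the composition gives, for all $x \in \mathbb{R}^I$,
\[ f(x) = W_r (W_{r-1} \cdots W_1 x) + W_r (W_{r-1} \cdots W_2 b_1) + \cdots + W_r b_{r-1} + b_r. \]
Setting $n = I$, $m = O$ and
\[ A = W_r W_{r-1} \cdots W_1, \qquad B = W_r (W_{r-1} \cdots W_2 b_1) + \cdots + W_r b_{r-1} + b_r \]
yields $f(x) = Ax + B$. $\square$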
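The collapse of identity-activated layers into a single affine map can be checked numerically. The following is a minimal sketch in plain Python, not part of the original argument: the layer widths in `dims`, the random weights, and the helper names `matvec`, `matmul`, `vadd`, `collapse` and `forward` are all assumptions chosen for illustration.

```python
import random

def matvec(W, x):
    # matrix-vector product, W given as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # matrix-matrix product A B
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def collapse(layers):
    """Fold [(W_1, b_1), ..., (W_r, b_r)] into (A, B) with f(x) = A x + B."""
    A, B = layers[0]
    for W, b in layers[1:]:
        A = matmul(W, A)            # A <- W_i A
        B = vadd(matvec(W, B), b)   # B <- W_i B + b_i
    return A, B

def forward(layers, x):
    # apply L_r o ... o L_1 with identity activations
    for W, b in layers:
        x = vadd(matvec(W, x), b)
    return x

random.seed(0)
dims = [3, 4, 2, 5]  # hypothetical widths: I = 3, d_1 = 4, d_2 = 2, O = 5
layers = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [random.uniform(-1, 1) for _ in range(n_out)])
    for n_in, n_out in zip(dims[:-1], dims[1:])
]

A, B = collapse(layers)
x = [random.uniform(-1, 1) for _ in range(dims[0])]
direct = forward(layers, x)
collapsed = vadd(matvec(A, x), B)
max_gap = max(abs(p - q) for p, q in zip(direct, collapsed))
print(max_gap)  # expected to be at the level of floating-point rounding
```

The recursion in `collapse` mirrors the expansion term by term: after processing layer $i$, `A` holds $W_i \cdots W_1$ and `B` holds $W_i(\cdots W_2 b_1) + \cdots + b_i$.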
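Let $f_\theta \in \mathcal{N}_2(1, 2, 1; \mathrm{ReLU}, \mathrm{Id})$ with parameters
\[ \theta = \left( \begin{pmatrix} 4 \\ -4 \end{pmatrix}, \begin{bmatrix} -2 & 3 \end{bmatrix}; (0, 2), 2 \right), \]
that is, $W_1 = (4, -4)^\top$, $W_2 = \begin{bmatrix} -2 & 3 \end{bmatrix}$, $b_1 = (0, 2)$ and $b_2 = 2$. The loss is $l(\hat{y}, y) = |\hat{y} - y|$ for $\hat{y}, y \in \mathbb{R}$. We first evaluate $f_\theta(1)$ and then compute $\nabla_\theta l(f_\theta(1), 0)$, with the conventions $\mathrm{ReLU}'(x) := 1$ and $|x|' := 0$ at $x = 0$. The input is denoted $a_0 := 1$.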
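To compute $\nabla_\theta l(f_\theta(1), 0)$ we need the derivative of the loss with respect to the prediction:
\[ \nabla_{\hat{y}} l(\hat{y}, y) = \begin{cases} 1, & \hat{y} > y, \\ -1, & \hat{y} < y, \\ 0, & \hat{y} = y. \end{cases} \]
Here $\hat{y} = f_\theta(1) = -6 < 0 = y$, so $\nabla_{\hat{y}} l(\hat{y}, y) = -1$.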
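Forward pass:
\[ z_1 = W_1 \cdot 1 + \begin{pmatrix} 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ -2 \end{pmatrix}, \qquad a_1 = \mathrm{ReLU}(z_1) = \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \]
\[ z_2 = \begin{bmatrix} -2 & 3 \end{bmatrix} \begin{pmatrix} 4 \\ 0 \end{pmatrix} + 2 = -6, \qquad a_2 = \mathrm{Id}(z_2) = -6. \]
Hence $f_\theta(1) = -6$.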
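The forward pass can be replayed in a few lines of plain Python. This is a sketch under the assumption that the parameters are $W_1 = (4, -4)^\top$, $b_1 = (0, 2)$, $W_2 = [-2 \;\; 3]$ and $b_2 = 2$, the values consistent with the intermediate results $a_1 = (4, 0)$ and $f_\theta(1) = -6$; the helper names `relu` and `affine` are illustrative.

```python
def relu(v):
    # componentwise ReLU
    return [max(0.0, t) for t in v]

def affine(W, b, x):
    # W x + b, with W given as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# assumed parameter values (signs reconstructed from the worked numbers)
W1 = [[4.0], [-4.0]]
b1 = [0.0, 2.0]
W2 = [[-2.0, 3.0]]
b2 = [2.0]

a0 = [1.0]               # network input
z1 = affine(W1, b1, a0)  # hidden pre-activation
a1 = relu(z1)            # hidden activation
z2 = affine(W2, b2, a1)  # output pre-activation
a2 = z2                  # identity activation on the output layer

print(z1, a1, a2)  # [4.0, -2.0] [4.0, 0.0] [-6.0]
```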
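Backward pass:
\[ \delta_2 = \mathrm{Id}'(-6) \cdot \nabla_{\hat{y}} l(a_2, 0) = 1 \cdot (-1) = -1, \]
\[ \delta_1 = \mathrm{ReLU}'\!\begin{pmatrix} 4 \\ -2 \end{pmatrix} \odot \left( \begin{pmatrix} -2 \\ 3 \end{pmatrix} \delta_2 \right) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \odot \begin{pmatrix} 2 \\ -3 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}. \]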
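The components of $\nabla_\theta l(f_\theta(1), 0)$ are
\[ \frac{\partial l}{\partial b_2} = \delta_2 = -1, \]
\[ \frac{\partial l}{\partial b_1} = \delta_1 = \begin{bmatrix} 2 & 0 \end{bmatrix}, \]
\[ \frac{\partial l}{\partial W_2} = \begin{bmatrix} \delta_2 a_{1,1} & \delta_2 a_{1,2} \end{bmatrix} = \begin{bmatrix} -4 & 0 \end{bmatrix}, \]
\[ \frac{\partial l}{\partial W_1} = \begin{pmatrix} \delta_{1,1} a_0 \\ \delta_{1,2} a_0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \]
where $a_{1,j}$ and $\delta_{1,j}$ denote the $j$-th components of $a_1$ and $\delta_1$. Collecting the components in the same order as $\theta$,
\[ \nabla_\theta l(f_\theta(1), 0) = \left( \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \begin{bmatrix} -4 & 0 \end{bmatrix}; (2, 0); -1 \right). \]
The gradient descent update is
\[ \theta_{\mathrm{new}} = \theta_{\mathrm{old}} - \eta \, \nabla_\theta l(f_\theta(1), 0), \]
with step size $\eta > 0$. $\square$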
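The backward pass and one gradient step can be sketched in plain Python as follows. The parameter values ($W_1 = (4, -4)^\top$, $W_2 = [-2 \;\; 3]$, $b_1 = (0, 2)$, $b_2 = 2$) are assumed to match the worked numbers, the step size `eta = 0.1` is a hypothetical choice, and the function names are illustrative only.

```python
def forward(theta, x):
    W1, W2, b1, b2 = theta
    z1 = [W1[k] * x + b1[k] for k in range(2)]
    a1 = [max(0.0, z) for z in z1]
    z2 = W2[0] * a1[0] + W2[1] * a1[1] + b2
    return z1, a1, z2

def loss_grad(y_hat, y):
    # derivative of |y_hat - y| in y_hat, set to 0 when y_hat == y
    return (y_hat > y) - (y_hat < y)

def backprop(theta, x, y):
    W1, W2, b1, b2 = theta
    z1, a1, z2 = forward(theta, x)
    delta2 = 1.0 * loss_grad(z2, y)                     # Id'(z2) = 1
    relu_prime = [1.0 if z >= 0 else 0.0 for z in z1]   # assumed convention ReLU'(0) := 1
    delta1 = [relu_prime[k] * W2[k] * delta2 for k in range(2)]
    dW1 = [d * x for d in delta1]      # delta_1 * a_0
    dW2 = [delta2 * a for a in a1]     # delta_2 * a_1
    return dW1, dW2, delta1, delta2    # gradients for W1, W2, b1, b2

# theta ordered as (W1, W2, b1, b2); values are assumptions matching the example
theta = ([4.0, -4.0], [-2.0, 3.0], [0.0, 2.0], 2.0)
dW1, dW2, db1, db2 = backprop(theta, 1.0, 0.0)

eta = 0.1  # hypothetical step size
grads = (dW1, dW2, db1, db2)
theta_new = tuple(
    [p - eta * g for p, g in zip(param, grad)] if isinstance(param, list)
    else param - eta * grad
    for param, grad in zip(theta, grads)
)

old_loss = abs(forward(theta, 1.0)[2] - 0.0)
new_loss = abs(forward(theta_new, 1.0)[2] - 0.0)
print(dW1, dW2, db1, db2)
print(old_loss, new_loss)
```

With these assumed values the computed gradient components agree with the hand calculation, and a single step with the chosen `eta` lowers the loss on the sample $(x, y) = (1, 0)$.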
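Let $P = (P(j))_{j=1}^d$ and $Q = (Q(j))_{j=1}^d$ be probability distributions on $\{1, \dots, d\}$ with $d \geq 2$. The Kullback-Leibler divergence from $Q$ to $P$ is defined as
\[ D_{\mathrm{KL}}(P \,\|\, Q) := \sum_{j=1}^{d} P(j) \log \frac{P(j)}{Q(j)}. \]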
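The definition translates directly into code. The sketch below is plain Python; the conventions $0 \cdot \log(0/q) := 0$ and $D_{\mathrm{KL}} = \infty$ when $Q(j) = 0 < P(j)$ are standard but are assumptions here, as are the example distributions `P` and `Q`.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_j P(j) * log(P(j) / Q(j))."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj == 0.0:
            continue          # assumed convention: 0 * log(0 / q) = 0
        if qj == 0.0:
            return math.inf   # assumed convention: P puts mass where Q has none
        total += pj * math.log(pj / qj)
    return total

# hypothetical example distributions on {1, 2, 3}
P = [0.5, 0.25, 0.25]
Q = [1 / 3, 1 / 3, 1 / 3]

d_pq = kl_divergence(P, Q)
d_qp = kl_divergence(Q, P)
d_pp = kl_divergence(P, P)
print(d_pq, d_qp, d_pp)
```

The two directions `d_pq` and `d_qp` differ, which illustrates that the divergence is not symmetric in its arguments, while `d_pp` is zero.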
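Let $y^i = (y^i_1, \dots, y^i_d)$, $i = 1, \dots, N$, be one-hot labels with classes $k_i \in \{1, \dots, d\}$, that is, $y^i_j = 1$ if $j = k_i$ and $y^i_j = 0$ otherwise. Let $\hat{P} = (\hat{P}(j))_{j=1}^d$ be the empirical distribution
\[ \hat{P}(j) = \frac{\#\{ i : y^i_j = 1 \}}{N}, \]
and let $Q$ be a distribution on $\{1, \dots, d\}$, $d \geq 2$. Minimizing $D_{\mathrm{KL}}(\hat{P} \,\|\, Q)$ over $Q$ is equivalent to minimizing
\[ L(Q) = \frac{1}{N} \sum_{i=1}^{N} l(Q, y^i), \qquad l(Q, y^i) = -\sum_{j=1}^{d} \mathbb{1}_{\{y^i_j = 1\}} \log(Q(j)). \]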
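The objects in this setup are easy to build explicitly. The following plain-Python sketch uses a hypothetical list of class labels `classes`; the names `Y`, `P_hat`, `sample_loss` and `L` are illustrative and simply follow the definitions of $y^i$, $\hat{P}$, $l$ and $L$.

```python
import math

classes = [1, 3, 3, 2, 3, 1]  # hypothetical labels k_i in {1, ..., d}
d, N = 3, len(classes)

# one-hot vectors y^i with y^i_j = 1 exactly when j = k_i
Y = [[1 if j == k else 0 for j in range(1, d + 1)] for k in classes]

# empirical distribution: P_hat(j) = #{i : y^i_j = 1} / N
P_hat = [sum(y[j] for y in Y) / N for j in range(d)]

def sample_loss(Q, y):
    # l(Q, y) = -sum_j 1{y_j = 1} * log Q(j)
    return -sum(math.log(Q[j]) for j in range(d) if y[j] == 1)

def L(Q):
    # average loss over the N samples
    return sum(sample_loss(Q, y) for y in Y) / N

Q_uniform = [1 / d] * d
print(P_hat, L(Q_uniform))
```

For the uniform candidate, every sample contributes $\log d$, so `L(Q_uniform)` equals $\log 3$ up to rounding.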
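Let
\[ Q^* = \arg\min_Q L(Q), \qquad Q' = \arg\min_Q D_{\mathrm{KL}}(\hat{P} \,\|\, Q). \]
By the definition of $L$,
\[ Q^* = \arg\min_Q \frac{1}{N} \sum_{i=1}^{N} l(Q, y^i) = \arg\min_Q \, -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{d} \mathbb{1}_{\{y^i_j = 1\}} \log(Q(j)). \]
For $Q'$ we have
\[ Q' = \arg\min_Q \sum_{j=1}^{d} \hat{P}(j) \log \frac{\hat{P}(j)}{Q(j)} = \arg\min_Q \sum_{j=1}^{d} \left( \hat{P}(j) \log(\hat{P}(j)) - \hat{P}(j) \log(Q(j)) \right) = \arg\min_Q \, -\sum_{j=1}^{d} \hat{P}(j) \log(Q(j)), \]
because the term $\hat{P}(j) \log(\hat{P}(j))$ does not depend on $Q$. Inserting
\[ \hat{P}(j) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\{y^i_j = 1\}} \]
gives
\[ Q' = \arg\min_Q \, -\frac{1}{N} \sum_{j=1}^{d} \sum_{i=1}^{N} \mathbb{1}_{\{y^i_j = 1\}} \log(Q(j)). \]
Exchanging the order of the two finite sums shows $Q^* = Q'$. $\square$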
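The equivalence can also be observed numerically: the averaged loss and the divergence differ only by a quantity that does not depend on $Q$ (the entropy of $\hat{P}$), so both are minimized by the same $Q$. The sketch below is plain Python with hypothetical random labels and random candidate distributions; it is an illustration, not a substitute for the argument above.

```python
import math
import random

random.seed(1)
d, N = 4, 50
classes = [random.randrange(d) for _ in range(N)]  # hypothetical labels, 0-based here
P_hat = [classes.count(j) / N for j in range(d)]   # empirical distribution

def cross_entropy(Q):
    # L(Q) = -(1/N) * sum_i log Q(k_i)
    return -sum(math.log(Q[k]) for k in classes) / N

def kl(P, Q):
    # D_KL(P || Q), skipping terms with P(j) = 0
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

# term of the divergence that does not depend on Q
entropy = -sum(p * math.log(p) for p in P_hat if p > 0)

def random_distribution():
    w = [random.uniform(0.05, 1.0) for _ in range(d)]
    s = sum(w)
    return [x / s for x in w]

candidates = [random_distribution() for _ in range(200)]

# L(Q) - D_KL(P_hat || Q) should equal the entropy for every candidate Q
gaps = [abs(cross_entropy(Q) - kl(P_hat, Q) - entropy) for Q in candidates]
best_candidate_loss = min(cross_entropy(Q) for Q in candidates)
print(max(gaps), cross_entropy(P_hat), best_candidate_loss)
```

In this experiment `Q = P_hat` attains a loss no larger than any sampled candidate, consistent with both objectives sharing their minimizer.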