
Gradient Descent

Gradient descent in one dimension
gradient_descent_1d(θ_init, f, f′, η, ε):
    θ^(0) = θ_init
    t = 0
    loop:
        t = t + 1
        θ^(t) = θ^(t−1) − η f′(θ^(t−1))        (η is the step size)
    until |f(θ^(t)) − f(θ^(t−1))| < ε          (convergence criterion)
    return θ^(t)
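To make the procedure concrete, here is a minimal Python sketch of the same loop. The quadratic objective and the values of η and ε below are illustrative choices, not values from the slides:

def gradient_descent_1d(theta_init, f, df, eta, epsilon):
    """Minimize a 1D function f with derivative df, using step size eta.
    Stop when successive objective values change by less than epsilon."""
    theta = theta_init
    f_prev = f(theta)
    while True:
        theta = theta - eta * df(theta)       # θ^(t) = θ^(t−1) − η f′(θ^(t−1))
        f_curr = f(theta)
        if abs(f_curr - f_prev) < epsilon:    # convergence criterion
            return theta
        f_prev = f_curr

# Illustrative convex objective: f(θ) = (θ − 3)², minimized at θ = 3.
f  = lambda th: (th - 3.0) ** 2
df = lambda th: 2.0 * (th - 3.0)
print(gradient_descent_1d(theta_init=0.0, f=f, df=df, eta=0.1, epsilon=1e-8))
# prints a value close to 3.0

With a convex objective like this one, any sufficiently small step size converges; the theorem below makes that precise.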
Convergence

[Figure: a one-dimensional objective showing a local minimum and the global minimum.]

• Theorem: If f is convex, then for any desired accuracy ε, there is some step size η such that gradient descent will converge to a θ within ε of the optimum.

Gradient descent in multiple dimensions

gradient_descent(θ_init, f, ∇_θ f, η, ε):
    ...
    θ^(t) = θ^(t−1) − η ∇_θ f(θ^(t−1))
    ...

Here f(θ_1, …, θ_m) is a scalar function of m parameters, so θ, θ_init, and the gradient ∇_θ f are all m × 1 column vectors:

∇_θ f = (∂f/∂θ_1, ∂f/∂θ_2, …, ∂f/∂θ_m)ᵀ
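A matching NumPy sketch of the multi-dimensional version follows. The objective f(θ) = ‖θ − c‖² is again an illustrative convex choice, and for simplicity the m × 1 column vectors are represented as flat NumPy arrays:

import numpy as np

def gradient_descent(theta_init, f, grad_f, eta, epsilon):
    """Minimize f: R^m -> R given its gradient grad_f and step size eta.
    Stop when successive objective values change by less than epsilon."""
    theta = np.array(theta_init, dtype=float)
    f_prev = f(theta)
    while True:
        theta = theta - eta * grad_f(theta)   # θ^(t) = θ^(t−1) − η ∇_θ f(θ^(t−1))
        f_curr = f(theta)
        if abs(f_curr - f_prev) < epsilon:    # convergence criterion
            return theta
        f_prev = f_curr

# Illustrative convex objective: f(θ) = ‖θ − c‖², with ∇_θ f(θ) = 2(θ − c).
c = np.array([1.0, -2.0, 0.5])
f = lambda th: float(np.sum((th - c) ** 2))
grad_f = lambda th: 2.0 * (th - c)            # one partial derivative per parameter
print(gradient_descent(np.zeros(3), f, grad_f, eta=0.1, epsilon=1e-10))
# prints a value close to [1.0, -2.0, 0.5]

Note that the update rule is identical to the one-dimensional case; the only change is that the derivative f′ is replaced by the gradient vector ∇_θ f, so every parameter is stepped downhill simultaneously.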