I have received the following problem:

Consider the following simple model of a neuron:

z = wx + b (logits),

ŷ = g(z) (activation),

L2(w, b) = (1/2) (y − ŷ)^2 quadratic loss (Mean Squared Error (MSE), L2 loss, l2-norm),

L1(w, b) = |y − ŷ| absolute value loss (Mean Absolute Error (MAE), L1 loss, l1-norm),

with x, w, b ∈ R. Calculate the derivatives ∂L/∂w and ∂L/∂b for updating the weight w and bias b. Determine the results for both loss functions (L1, L2) and assume a sigmoid and a tanh activation function g(z). Write down all steps of your derivation. Hint: You have to use the chain rule.
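
For reference, these are the standard derivatives of the two activation functions (not part of the exercise text, but I assume they are needed in either approach; I write σ for the sigmoid):

\begin{equation}
g(z) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = \sigma(z)\left(1 - \sigma(z)\right)
\end{equation}

\begin{equation}
g(z) = \tanh(z), \qquad g'(z) = 1 - \tanh^2(z)
\end{equation}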

I have considered the following approach, for example for L2 differentiated with respect to w:

 

\begin{equation}
L_2 = \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} \left( y - g(z) \right)^2 = \frac{1}{2} \left( y - g(wx + b) \right)^2 = \frac{1}{2} \left( y - \frac{1}{1 + e^{-(wx + b)}} \right)^2
\end{equation}

This expression could then be differentiated directly using the chain rule.
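As a sketch of what I mean (assuming the sigmoid case), carrying out that differentiation with respect to w would give

\begin{equation}
\frac{\partial L_2}{\partial w} = \left( y - \sigma(wx + b) \right) \cdot \left( -\sigma'(wx + b) \right) \cdot x = -\left( y - \hat{y} \right) \, \sigma(z)\left(1 - \sigma(z)\right) \, x
\end{equation}

and analogously for b, with the final factor x replaced by 1.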
However, in the literature I find the following approach:

 

\begin{equation}
\frac{\partial L_2}{\partial w} = \frac{\partial L_2}{\partial \hat{y}} \times \frac{\partial \hat{y}}{\partial z} \times \frac{\partial z}{\partial w} = -(y - \hat{y}) \times g'(z) \times x
\end{equation}
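
For reference, my reading of the individual factors in this product is

\begin{equation}
\frac{\partial L_2}{\partial \hat{y}} = -(y - \hat{y}), \qquad \frac{\partial \hat{y}}{\partial z} = g'(z), \qquad \frac{\partial z}{\partial w} = x, \qquad \frac{\partial z}{\partial b} = 1,
\end{equation}

so the corresponding bias derivative would be ∂L2/∂b = −(y − ŷ) g'(z).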


So which of the two approaches is the right one in this context? Or is there actually a connection between the two?
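
As a small numerical sanity check (a minimal sketch in Python, assuming the sigmoid case and arbitrary test values, none of which are part of the original exercise), the factored chain-rule form seems to agree with a finite-difference approximation of the fully substituted expression:

import numpy as np

# Minimal numerical sanity check (sigmoid activation, L2 loss).
# All values below are arbitrary test numbers, not from the exercise.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x, y = 0.7, -0.3, 1.5, 0.2

z = w * x + b
y_hat = sigmoid(z)

# Factored chain-rule form from the literature:
# dL2/dw = -(y - y_hat) * g'(z) * x, with g'(z) = sigmoid(z) * (1 - sigmoid(z))
grad_w_chain = -(y - y_hat) * y_hat * (1.0 - y_hat) * x

# Central finite-difference approximation of dL2/dw on the substituted expression
eps = 1e-6
def L2(w_):
    return 0.5 * (y - sigmoid(w_ * x + b)) ** 2
grad_w_numeric = (L2(w + eps) - L2(w - eps)) / (2.0 * eps)

print(grad_w_chain, grad_w_numeric)  # the two values agree to high precision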

 

