Posted

Hello,

 

Would someone be able to help explain the meaning of a term in a formula? The one in question is the deep learning weight adjustment formula from DeepNeuralNetworks.

 

And here it is.

 

post-110712-0-88479600-1423729238.png
So this shows the iterative adjustment for the weights. Or is it the adjustment to the change applied to the weights (as indicated by the delta)?
But the main part I am unclear about is this.
post-110712-0-30310600-1423729239.png

 

C is the cost function but what does this term mean?

 


Posted (edited)

I know nothing about the underlying subject matter of neural networks, deep or otherwise.

 

But the equation you refer to looks like a standard finite element iteration/discretisation from t to (t+1) where delta is the shift function and t is the iteration counter in some numerical approximation of the underlying mathematical equation.

 

Since steepest descent methods are discussed, I would hazard a guess that this is due to linearization of an underlying nonlinear controlling mathematical equation. This is a common numerical approach in such cases.

Edited by studiot
Posted

 


 

I'm afraid I understood very little of what you said, and am not sure if you were addressing my question. Are you able to explain what the term I referred to is in a simple way?

Posted (edited)

If you're referring to the notation itself, it denotes a partial derivative, in this case the partial derivative of the cost function C with respect to the variable w_ij.

 

Roughly, conceptually, you can think of this as referring to the rate at which the value of the function changes with respect to the variable. That is to say (since I'm not sure what mathematical training you've had), holding any other variables constant, we change w_ij and see how the value of the cost function varies in response.
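
As a rough numerical illustration of "hold the other variables constant and vary one weight", here is a minimal Python sketch. The cost function `cost` and the helper `partial_derivative` are hypothetical, chosen only for illustration. A real network would normally compute this derivative analytically (via backpropagation) rather than by finite differences.

```python
def cost(w):
    """Hypothetical cost: a simple bowl with its minimum at w = [1.0, -2.0]."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


def partial_derivative(f, w, i, h=1e-6):
    """Central-difference estimate of df/dw[i], holding every other entry of w fixed."""
    up = list(w)
    down = list(w)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)


w = [3.0, 0.0]
dC_dw0 = partial_derivative(cost, w, 0)  # analytic value: 2 * (3 - 1) = 4
dC_dw1 = partial_derivative(cost, w, 1)  # analytic value: 2 * (0 + 2) = 4
```

Only entry `i` of the weight list is nudged, which is exactly what the "partial" in partial derivative refers to.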

Edited by John
Posted


 

Ok thanks. I now recall doing partial derivatives at school, but that was over thirty years ago so am trying to remember what they mean. So in this context it is the rate of change of the cost function.

 

In practical terms is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?

 

With regard to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change signs to minimize rather than maximize the function.)

 

Does that sound about right, or are there other things that should be taken into account?

Posted

In practical terms is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?

For the first question, if I'm understanding you correctly, then yes. For the second, I don't know.
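
On the second question, stepping along an exponentially smoothed gradient is the basic idea behind the "momentum" variants of gradient descent. A minimal sketch, assuming a hypothetical one-dimensional cost C(w) = (w - 3)^2 whose derivative is known analytically; the names `grad`, `eta`, and `beta` are illustrative choices and are not taken from the formula being discussed.

```python
def grad(w):
    """Derivative of the hypothetical cost C(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def descend_with_smoothing(w, eta=0.1, beta=0.9, steps=500):
    """Gradient descent that steps along an exponentially smoothed gradient."""
    smoothed = 0.0
    for _ in range(steps):
        # Exponential moving average of the gradient (beta = 0 gives plain descent).
        smoothed = beta * smoothed + (1.0 - beta) * grad(w)
        w = w - eta * smoothed  # step against the smoothed gradient
    return w


w_final = descend_with_smoothing(w=0.0)
```

With these assumed settings the weight settles near the minimum at w = 3; the smoothing mainly damps step-to-step fluctuations in the gradient.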

 

With regard to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change signs to minimize rather than maximize the function.)

 

Does that sound about right, or are there other things that should be taken into account?

Well, with the caveat that I know very little about machine learning, I believe the idea is to iteratively minimize the cost function. I don't know why the equation on Wikipedia involves addition rather than subtraction. It may be a typo, or it may be that I'm misunderstanding how gradient descent is applied to training deep neural networks.
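
The effect of the sign can be checked on a toy problem. A minimal sketch, assuming a hypothetical cost C(w) = (w - 2)^2 and a learning rate `eta` picked purely for illustration: subtracting eta * dC/dw moves the weight towards the minimum, while adding it moves the weight away. A plus sign in a published formula can still be consistent with minimization if the term being added is defined as the negative gradient; whether that is the case here cannot be settled from this thread.

```python
def cost(w):
    """Hypothetical cost with its minimum at w = 2."""
    return (w - 2.0) ** 2


def dcost_dw(w):
    """Analytic derivative of the hypothetical cost."""
    return 2.0 * (w - 2.0)


def iterate(w, sign, eta=0.1, steps=20):
    """Apply w <- w + sign * eta * dC/dw repeatedly and return the final weight."""
    for _ in range(steps):
        w = w + sign * eta * dcost_dw(w)
    return w


w_minus = iterate(5.0, sign=-1.0)  # gradient descent: the cost shrinks
w_plus = iterate(5.0, sign=+1.0)   # gradient ascent: the cost grows

print(cost(5.0), cost(w_minus), cost(w_plus))
```

Starting from the same weight, the minus-sign run ends with a much smaller cost than it started with, and the plus-sign run ends with a much larger one.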
