PerceptualRobots Posted February 12, 2015

Hello, Would someone be able to help explain the meaning of a term in a formula? The one in question is the deep learning weight adjustment formula from the Wikipedia article on deep neural networks. Here it is:

$$\Delta w_{ij}(t+1) = \Delta w_{ij}(t) + \eta \frac{\partial C}{\partial w_{ij}}$$

So this shows the iterative adjustment of the weights. Or is it the adjustment to the change applied to the weights (as indicated by the delta)? But the main part I am unclear about is this term:

$$\frac{\partial C}{\partial w_{ij}}$$

C is the cost function, but what does this term mean?
studiot Posted February 12, 2015 (edited)

I know nothing about the underlying subject matter of neural networks, deep or otherwise. But the equation you refer to looks like a standard finite element iteration/discretisation from t to (t+1), where delta is the shift function and t is the iteration counter in some numerical approximation of the underlying mathematical equation. Since steepest descent methods are discussed, I would hazard a guess that this is due to linearization of an underlying nonlinear controlling mathematical equation. This is a common numerical approach in such cases.

Edited February 12, 2015 by studiot
PerceptualRobots Posted February 13, 2015 (Author)

studiot said: "But the equation you refer to looks like a standard finite element iteration/discretisation from t to (t+1) ... This is a common numerical approach in such cases."

I'm afraid I understood very little of what you said, and I am not sure whether you were addressing my question. Are you able to explain, in a simple way, what the term I referred to is?
John Posted February 13, 2015 (edited)

If you're referring to the notation itself, it denotes a partial derivative, in this case the partial derivative of the cost function with respect to the variable $w_{ij}$. Roughly, conceptually, you can think of this as the rate at which the value of the function changes with respect to that variable. That is to say (since I'm not sure what mathematical training you've had), holding all other variables constant, we change $w_{ij}$ and see how the value of the cost function varies in response.

Edited February 13, 2015 by John
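(A quick numerical illustration of that idea, as a sketch: the quadratic cost function and the weight values below are made up purely for the example, and the derivative is estimated by nudging one weight while holding the other fixed.)

```python
# Estimate the partial derivative of a cost function with respect
# to one weight by nudging that weight and holding the others fixed.
# The quadratic cost below is invented just for this illustration.

def cost(w1, w2):
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2

h = 1e-6          # small step for the finite-difference estimate
w1, w2 = 0.5, 0.5

# Partial derivative of C with respect to w1 (w2 held constant)
dC_dw1 = (cost(w1 + h, w2) - cost(w1, w2)) / h
print(dC_dw1)     # about -5.0, i.e. 2*(w1 - 3) at w1 = 0.5
```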
PerceptualRobots Posted February 14, 2015 (Author)

John said: "If you're referring to the notation itself, it denotes a partial derivative, in this case the partial derivative of the cost function with respect to the variable $w_{ij}$ ..."

Ok, thanks. I now recall doing partial derivatives at school, but that was over thirty years ago, so I am trying to remember what they mean. So in this context it is the rate of change of the cost function with respect to the weight.

In practical terms, is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?

With regards to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change the signs to minimize rather than maximize the function.) Does that sound about right, or are there other things that should be taken into account?
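(The two estimates asked about here can be written out in a few lines. This is only a sketch of the idea, not code from any particular library; the cost values, step size, and smoothing factor are all invented for illustration.)

```python
# Sketch of the two estimates asked about above: a one-step
# finite difference (change since the previous iteration) and an
# exponentially smoothed moving average of that change.

alpha = 0.3                 # smoothing factor for the moving average
prev_C = None
smoothed_dC = 0.0

def step_estimate(C, prev_C, dw):
    """Rate of change of C per unit change in the weight, from one step."""
    return (C - prev_C) / dw

# Pretend these cost values came from successive iterations,
# each taken after nudging the weight by dw = 0.1.
dw = 0.1
costs = [4.0, 3.5, 3.2, 3.05, 3.01]

for C in costs:
    if prev_C is not None:
        dC = step_estimate(C, prev_C, dw)
        smoothed_dC = alpha * dC + (1 - alpha) * smoothed_dC
        print(f"one-step: {dC:+.2f}   smoothed: {smoothed_dC:+.2f}")
    prev_C = C
```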
John Posted February 14, 2015

PerceptualRobots said: "In practical terms, is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?"

For the first question, if I'm understanding you correctly, then yes. For the second, I don't know.

PerceptualRobots said: "With regards to the weight adjustment, this would mean that if the cost function increases, the change (the partial derivative) is positive and so the weight is increased ... Does that sound about right, or are there other things that should be taken into account?"

Well, with the caveat that I know very little about machine learning, I believe the idea is to iteratively minimize the cost function. I don't know why the equation on Wikipedia involves addition rather than subtraction. It may be a typo, or it may be that I'm misunderstanding how gradient descent is applied to training deep neural networks.
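(To make that last point concrete: with a minus sign, repeatedly stepping against the gradient drives a simple cost downhill. A minimal sketch, using a made-up one-weight cost function whose minimum is known:)

```python
# Minimal gradient descent on a one-weight quadratic cost,
# C(w) = (w - 3)^2, whose minimum is at w = 3. Subtracting
# eta * dC/dw is what makes the cost decrease each step.
# The cost function and values are invented for illustration.

eta = 0.1                       # learning rate
w = 0.0                         # initial weight

for t in range(50):
    dC_dw = 2 * (w - 3.0)       # exact derivative of (w - 3)^2
    w = w - eta * dC_dw         # step against the gradient

print(w)                        # close to 3.0, the minimizer
```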