Ok thanks. I now recall doing partial derivatives at school, but that was over thirty years ago, so I am trying to remember what they mean. So in this context it is the rate of change of the cost function with respect to a weight.
In practical terms is this change (of an unknown function) computed just by taking the change of the function from the previous iteration? Could it also be done by taking a moving average (exponential smoothing) of the change?
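A rough sketch of the two options asked about here, assuming a single weight and a made-up cost function (the names `secant_slope`, `smooth`, `cost` and the value of `alpha` are illustrative only). Note that the change in cost is divided by the change in the weight, which is what makes it a rate of change with respect to that weight rather than just a difference between iterations:

```python
def secant_slope(w_prev, cost_prev, w_now, cost_now):
    """Estimate d(cost)/d(weight) from two consecutive iterations."""
    dw = w_now - w_prev
    if dw == 0.0:
        return 0.0  # weight did not move, so this step gives no slope information
    return (cost_now - cost_prev) / dw


def smooth(avg_prev, new_value, alpha=0.3):
    """Exponential smoothing: blend the new estimate into the running average."""
    return alpha * new_value + (1.0 - alpha) * avg_prev


def cost(w):
    # Hypothetical cost function, used only for illustration
    return -(w - 2.0) ** 2


# Slope estimated from the previous iteration (w = 0.0) and the current one (w = 0.5)
slope = secant_slope(0.0, cost(0.0), 0.5, cost(0.5))

# Smoothed version, starting from a running average of 0.0
smoothed = smooth(0.0, slope)
```

Smoothing trades responsiveness for stability: a small `alpha` damps noisy estimates but reacts more slowly when the slope really does change.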
With regard to the weight adjustment, this would mean that if the cost function increases as the weight increases, the change (the partial derivative) is positive and so the weight is increased. If the cost function decreases as the weight increases, the change is negative and so the weight is decreased. In this way the weight should converge on a value that keeps the cost function at a maximum. If the weight value goes too high and results in a decrease of the cost function, the adjustment will be in the opposite direction. (Change the signs to minimize rather than maximize the function.)
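The adjustment rule described above can be sketched as follows. This is only an illustration under stated assumptions: a single weight, a made-up cost function with its maximum at w = 2, and a slope estimated by probing the function a small distance either side of the current weight. The names `cost`, `slope_estimate`, `climb`, `rate` and `h` are hypothetical, not taken from any library:

```python
def cost(w):
    # Hypothetical function with its maximum at w = 2
    return -(w - 2.0) ** 2


def slope_estimate(f, w, h=1e-4):
    # Central finite difference; h is an assumed small probe step
    return (f(w + h) - f(w - h)) / (2.0 * h)


def climb(f, w=0.0, rate=0.1, steps=200, maximize=True):
    # Positive slope -> increase the weight, negative slope -> decrease it.
    # Flipping the sign turns the same loop into a minimizer.
    sign = 1.0 if maximize else -1.0
    for _ in range(steps):
        w += sign * rate * slope_estimate(f, w)
    return w


w_max = climb(cost)                                          # settles near 2
w_min = climb(lambda w: (w - 3.0) ** 2, maximize=False)      # settles near 3
```

The step size `rate` matters in practice: too large and the weight overshoots and oscillates, too small and convergence is slow.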
Does that sound about right, or are there other things that should be taken into account?