Is backpropagation the same as gradient descent?
Back-propagation is the process of computing the derivatives (gradients) of the loss with respect to the model's parameters; gradient descent is the process of descending along that gradient, i.e. adjusting the parameters of the model to move downhill on the loss function.
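A minimal sketch of that split, using a made-up one-feature linear regression with MSE loss (the data and learning rate are illustrative assumptions, not anything from the question): the gradient computation plays the role of back-propagation, and the parameter update is the gradient-descent step.

```python
import numpy as np

# hypothetical data following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.0, 0.0   # parameters, initially "way off the mark"
lr = 0.05         # learning rate (assumed)

for _ in range(2000):
    y_hat = w * x + b
    err = y_hat - y
    # "back-propagation": gradients of the mean squared error
    # with respect to the parameters w and b
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    # "gradient descent": step the parameters against the gradient
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, `w` and `b` approach the true values 2 and 1; the two steps are distinct, and descent only works because back-propagation supplied the gradients.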
When computing the cost of a single example, it can be very high if your weights and biases start way off the mark. Do you iterate on that single example to reduce its cost before going on to the next example? Or do you compute the total cost over all examples before computing the gradient of the total cost function?
Is the gradient of the total cost function the sum of the gradients of the cost function for each example?
The whole idea of gradient descent is minimizing the cost, so we update the weights and biases accordingly.
Initially, the weights and biases are drawn from the same distribution.
Updates can occur after computing the cost of a single example, or in batches (i.e. after a certain number of examples).
In general, we can update the weights after:
1. each sample — this is called stochastic gradient descent
2. all samples — this is called batch gradient descent
3. a small subset of samples — generally referred to as mini-batch gradient descent
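The three schemes differ only in how many samples feed each update. A sketch under assumed conditions (synthetic data `y = 3x` plus noise, a single weight, made-up learning rate), where the batch size alone selects between them:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)   # true weight is 3.0

def grad_w(w, xb, yb):
    # gradient of the MSE loss for the model y_hat = w * x over one batch
    return 2 * np.mean((w * xb - yb) * xb)

def train(batch_size, lr=0.05, epochs=50):
    w = 0.0
    for _ in range(epochs):
        for i in range(0, len(x), batch_size):
            xb, yb = x[i:i + batch_size], y[i:i + batch_size]
            w -= lr * grad_w(w, xb, yb)   # one update per batch
    return w

w_sgd   = train(batch_size=1)    # case 1: stochastic gradient descent
w_batch = train(batch_size=100)  # case 2: batch gradient descent
w_mini  = train(batch_size=10)   # case 3: mini-batch gradient descent
```

All three end up near the true weight of 3.0; they trade off update frequency (noisy but cheap per-sample steps) against gradient accuracy (one exact but expensive full-batch step).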
As for the last question: yes, the gradient of the total cost function is the sum of the gradients of the cost function for each example. With more than one sample, we sum up the loss contributed by each sample and then compute the derivatives (gradients) with respect to that summed-up loss.
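This follows from the linearity of differentiation, and it can be checked numerically. A small sketch with a made-up one-weight squared-error setup (the data and the value of `w` are arbitrary assumptions): the sum of the analytic per-sample gradients matches a finite-difference gradient of the summed loss.

```python
import numpy as np

# hypothetical setup: one weight w, per-sample loss L_i = (w*x_i - y_i)^2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w = 0.5

# analytic per-sample gradients dL_i/dw, then their sum
per_sample_grads = 2.0 * (w * x - y) * x
sum_of_grads = per_sample_grads.sum()

def total_loss(w):
    # the summed-up loss over all samples
    return np.sum((w * x - y) ** 2)

# numerical gradient of the TOTAL loss via central differences
eps = 1e-6
grad_of_sum = (total_loss(w + eps) - total_loss(w - eps)) / (2 * eps)
```

`grad_of_sum` and `sum_of_grads` agree (here both are -42), confirming that the gradient of the sum equals the sum of the gradients.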
What is the difference between gradient and gradient descent?
The gradient itself is just the vector of partial derivatives of a function; gradient descent (and ascent) are the procedures that use it. Gradient ascent maximizes a function so as to achieve better optimization; it is used, for example, in reinforcement learning, and it follows the upward, increasing slope. Gradient descent minimizes the cost function, as in linear regression, and follows the downward, decreasing slope of the cost function.
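The two procedures differ only in the sign of the step. A hypothetical one-dimensional example (the function and step size are assumptions for illustration): ascent on f(x) = -(x - 3)^2 and descent on the cost g(x) = (x - 3)^2 both converge to the same point x = 3.

```python
# gradient ascent: step WITH the gradient to maximize f(x) = -(x - 3)^2
# f'(x) = -2 * (x - 3)
xa = 0.0
for _ in range(200):
    xa += 0.1 * (-2.0 * (xa - 3.0))

# gradient descent: step AGAINST the gradient to minimize g(x) = (x - 3)^2
# g'(x) = 2 * (x - 3)
xd = 0.0
for _ in range(200):
    xd -= 0.1 * (2.0 * (xd - 3.0))
```

Both `xa` and `xd` approach 3.0: maximizing f and minimizing g are the same problem with the update sign flipped.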