Stochastic Gradient Descent: Instead of considering all the samples together, we take the samples one by one. For each sample, check whether it is classified correctly. If it is not, compute the loss for that sample and use it to update the weight matrix. Repeat these steps over the dataset until the error is acceptable.
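
To make the per-sample update loop concrete, here is a minimal sketch in Python. It assumes a simple perceptron-style binary classifier with labels in {-1, +1}; the names sgd_train, learning_rate, max_epochs, and tol are illustrative choices, not part of any particular library.

import numpy as np

def sgd_train(X, y, learning_rate=0.01, max_epochs=100, tol=0.0):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)          # weight vector
    b = 0.0                           # bias term
    for epoch in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):      # take the samples one by one
            # check whether this sample is classified correctly
            if yi * (np.dot(w, xi) + b) <= 0:
                # misclassified: update the weights using this single sample
                w += learning_rate * yi * xi
                b += learning_rate * yi
                errors += 1
        # stop once the error is acceptable
        if errors / n_samples <= tol:
            break
    return w, b

# Example usage on a tiny linearly separable dataset
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = sgd_train(X, y)
print("weights:", w, "bias:", b)

Note that the weights change after every misclassified sample rather than once per pass over the data, which is exactly what gives stochastic gradient descent its frequent, noisy updates.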
Upsides: The frequent updates immediately give insight into the model's performance and its rate of improvement. This variant of gradient descent is perhaps the simplest to understand and implement. The increased update frequency can result in faster learning on some problems. The noisy update process can also help the model escape shallow local minima and avoid premature convergence.
Downsides: Updating the model so frequently is more computationally expensive than other configurations of gradient descent, so training on large datasets can take significantly longer. The frequent updates also produce a noisy gradient signal, which can cause the model parameters, and in turn the model error, to jump around (higher variance across training epochs). This noisy descent down the error gradient can make it hard for the algorithm to settle on a minimum.