
Optimization: Stochastic Gradient Descent


Stochastic Gradient Descent: Instead of considering all the samples together, we take the samples one by one. For each sample, check whether it is classified correctly. If it is not, compute the loss for that sample and use it to update the weight matrix. If the overall error is still not acceptable, repeat these steps until it is.
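
As a rough illustration, the sketch below implements this per-sample update loop for a simple linear classifier. It assumes labels in {-1, +1} and a perceptron-style update on misclassified samples; the learning rate, epoch limit, and toy dataset are illustrative choices, not values from the post.

import numpy as np

def sgd_classifier(X, y, lr=0.1, max_epochs=100):
    """Per-sample (stochastic) updates for a linear classifier."""
    w = np.zeros(X.shape[1])                  # weight vector
    b = 0.0                                   # bias term
    for epoch in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):              # take samples one by one
            if yi * (xi @ w + b) <= 0:        # sample is misclassified
                # update the weights immediately using this sample's loss
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                       # error is acceptable: stop
            break
    return w, b

# Example usage on a tiny linearly separable dataset
X = np.array([[2.0, 1.0], [3.0, 4.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = sgd_classifier(X, y)
print("weights:", w, "bias:", b)

Because the weights change after every misclassified sample rather than after a full pass over the data, each update is cheap but noisy, which is exactly the trade-off discussed below.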
Upsides: The frequent updates give immediate insight into the model's performance and its rate of improvement. This variant of gradient descent is often the simplest to understand and implement. The increased update frequency can result in faster learning on some problems. The noisy update process can also help the model escape local minima and avoid premature convergence.
Downsides: Updating the model after every sample is more computationally expensive than other configurations of gradient descent and can take significantly longer to train models on large datasets. The frequent updates also produce a noisy gradient signal, which can cause the model parameters, and in turn the model error, to jump around (higher variance over training epochs). This noisy descent along the error gradient can make it harder for the algorithm to settle on an error minimum.
