
Optimization: Batch Optimization


Optimization: The loss function may not be convex. We want the loss function to be convex because then the gradient descent method can be applied to minimize the loss (error) reliably. A non-convex loss surface may have many local minima and maxima in addition to the global minimum, so the target of any optimization technique is to reach the global minimum. However, it is possible to get trapped in a local minimum; a local minimum is acceptable only when its error is not much higher than that of the global minimum. There are three common variants of gradient descent optimization: batch, stochastic, and mini-batch.
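To make the update rule concrete, here is a minimal Python sketch of gradient descent, w ← w − η · dL/dw, applied to a simple convex loss L(w) = (w − 3)². The choice of loss, learning rate, and number of steps is purely illustrative and not part of the original post.

```python
# A minimal sketch (illustrative, not from the original post): gradient descent
# on the convex loss L(w) = (w - 3)^2, whose global minimum is at w = 3.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0                  # arbitrary starting point
learning_rate = 0.1      # illustrative step size
for step in range(100):
    w -= learning_rate * grad(w)  # move against the gradient

print(round(w, 4), round(loss(w), 6))  # w ends up very close to 3, loss close to 0
```

On a non-convex loss the same update rule would stop at whichever minimum the starting point leads it to, which is why getting trapped in a poor local minimum is a concern.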

Batch Optimization: In batch optimization (batch gradient descent), the loss function is computed over the entire training set, so we must have access to all training samples. We compute the loss of each individual training sample (for a classifier, the misclassified samples are the ones that contribute error) and sum these individual losses to obtain the total loss; the model is then updated once per pass over the full dataset. A minimal sketch is given after the list of upsides and downsides below.
Upsides: Fewer updates to the model mean that this variant of gradient descent is more computationally efficient than stochastic gradient descent.
The decreased update frequency results in a more stable error gradient and may result in more stable convergence on some problems.
The separation of the calculation of prediction errors from the model update lends the algorithm to parallel-processing-based implementations.
Downsides: The more stable error gradient may result in premature convergence of the model to a less optimal set of parameters.
The updates at the end of the training epoch require the additional complexity of accumulating prediction errors across all training examples.
The entire training dataset must be held in memory and be available to the algorithm.
Model updates, and in turn training speed, may become very slow for large datasets.
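The following sketch illustrates batch gradient descent as described above: the prediction error of every training sample is accumulated, the total loss is formed over the whole training set, and the model is updated once per epoch. The synthetic linear-regression data, squared-error loss, learning rate, and epoch count are assumptions made for illustration only.

```python
import numpy as np

# A minimal sketch of batch gradient descent for linear regression with a
# squared-error loss. All data and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # 200 training samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)                             # model parameters
learning_rate = 0.1

for epoch in range(100):
    predictions = X @ w
    errors = predictions - y                # prediction error of every training sample
    total_loss = np.mean(errors ** 2)       # loss accumulated over the whole training set
    gradient = 2.0 * X.T @ errors / len(y)  # gradient computed from all samples
    w -= learning_rate * gradient           # one model update per pass over the full dataset

print(w)  # ends up close to true_w
```

Note how the gradient is computed from the full dataset before each update, which matches both the upsides (a stable gradient, and an error calculation that is easy to parallelize) and the downsides (the whole dataset must be available, and each update costs a full pass over it).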
