
Gradient Descent Algorithm I


Gradient descent is an optimization technique. Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x. We usually phrase most optimization problems in terms of minimizing f(x); maximization may be obtained via a minimization algorithm by minimizing −f(x). The function we want to minimize or maximize is called the objective function.
When we are minimizing it, we may also call it the cost function, loss function, or error function. For example, we might write x* = arg min f(x), i.e. find the value of x at which f(x) attains its minimum. The derivative of a function y = f(x) from R to R, denoted f′(x) or dy/dx, gives the slope of f(x) at the point x. It quantifies how a small change in x gets scaled in order to obtain the corresponding change in y. The derivative is therefore useful for minimizing a function because it tells us how to change x in order to make a small improvement in y. For example, we know that f(x − ϵ sign(f′(x))) is less than f(x) for small enough ϵ. We can thus reduce f(x) by moving x in small steps with the opposite sign of the derivative. This technique is called gradient descent.
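As a rough sketch of this update rule, the following Python snippet applies it to a simple quadratic. The function, starting point, and step size ϵ here are arbitrary choices for illustration (not part of the algorithm itself), and the snippet uses the common variant that scales the step by the derivative's value rather than only its sign.

# Minimal 1-D gradient descent sketch (example function, start point, and step size are arbitrary)
def f(x):
    return (x + 1) ** 2              # example objective, minimum at x = -1

def f_prime(x):
    return 2 * (x + 1)               # its derivative

x = 5.0                              # arbitrary starting point
epsilon = 0.1                        # small step size (learning rate)

for step in range(100):
    x = x - epsilon * f_prime(x)     # move x opposite to the sign of the derivative

print(x, f(x))                       # x approaches -1, where f(x) is minimal

After enough iterations x settles near −1, which is where f attains its minimum in this example.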
                             

When f′(x) = 0, the derivative provides no information about which direction to move. Points where f′(x) = 0 are known as critical points or stationary points. A local minimum is a point where f(x) is lower than at all neighboring points, so it is no longer possible to decrease f(x) by making infinitesimal steps. A local maximum is a point where f(x) is higher than at all neighboring points, so it is not possible to increase f(x) by making infinitesimal steps. Some critical points are neither maxima nor minima. These are known as saddle points.
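One simple way to tell these cases apart, shown in the sketch below with arbitrarily chosen example functions, is to compare f at a stationary point with its value at nearby points on either side.

# Classify the stationary point at x = 0 for three example functions (all have f'(0) = 0)
def classify(f, x0=0.0, h=1e-3):
    left, mid, right = f(x0 - h), f(x0), f(x0 + h)
    if mid < left and mid < right:
        return "local minimum"
    if mid > left and mid > right:
        return "local maximum"
    return "neither (saddle point)"

print(classify(lambda x: x ** 2))    # local minimum: f is higher on both sides of 0
print(classify(lambda x: -x ** 2))   # local maximum: f is lower on both sides of 0
print(classify(lambda x: x ** 3))    # neither: f'(0) = 0 but f keeps increasing through 0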

                                   

A point that obtains the absolute lowest value of f(x) is a global minimum. It is possible for there to be only one global minimum or multiple global minima of the function. It is also possible for there to be local minima that are not globally optimal. In the context of deep learning, we optimize functions that may have many local minima that are not optimal, and many saddle points surrounded by very flat regions. All of this makes optimization very difficult, especially when the input to the function is multidimensional. We therefore usually settle for finding a value of f that is very low, but not necessarily minimal in any formal sense.
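To make the local-versus-global distinction concrete, the sketch below runs the same gradient descent update from two different starting points on an arbitrarily chosen quartic that has two minima; only one starting point reaches the global minimum.

# Gradient descent on an example function with two minima (function and settings are arbitrary)
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def f_prime(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, epsilon=0.01, steps=1000):
    for _ in range(steps):
        x = x - epsilon * f_prime(x)
    return x

for start in (-2.0, 2.0):
    x = descend(start)
    print("start =", start, " ->  x =", round(x, 3), " f(x) =", round(f(x), 3))

# Starting at -2 reaches the global minimum near x = -1.30; starting at +2 gets
# stuck in the local minimum near x = +1.13, where f(x) is noticeably higher.

Which minimum is reached depends entirely on where the descent starts, which is exactly the difficulty described above.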




