
Gradient Descent Algorithm I


Gradient descent is an optimization technique. Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x. We usually phrase most optimization problems in terms of minimizing f(x); maximization may be obtained via a minimization algorithm by minimizing −f(x). The function we want to minimize or maximize is called the objective function.
When we are minimizing it, we may also call it the cost function, loss function, or error function. For example, we might write x* = arg min f(x), i.e. find the value of x at which f(x) attains its minimum. The derivative of a function y = f(x) from R to R, denoted f′(x) or dy/dx, gives the slope of f(x) at the point x. It quantifies how a small change in x gets scaled in order to obtain the corresponding change in y. The derivative is therefore useful for minimizing a function because it tells us how to change x in order to make a small improvement in y. For example, we know that f(x − ϵ sign(f′(x))) is less than f(x) for small enough ϵ. We can thus reduce f(x) by moving x in small steps with the opposite sign of the derivative. This technique is called gradient descent.
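As a rough sketch of this update rule, the following Python snippet applies it to a simple quadratic. The function, starting point, and step size ϵ here are arbitrary choices for illustration (not part of the algorithm itself), and the snippet uses the common variant that scales the step by the derivative's value rather than only its sign.

# Minimal 1-D gradient descent sketch (example function, start point, and step size are arbitrary)
def f(x):
    return (x + 1) ** 2              # example objective, minimum at x = -1

def f_prime(x):
    return 2 * (x + 1)               # its derivative

x = 5.0                              # arbitrary starting point
epsilon = 0.1                        # small step size (learning rate)

for step in range(100):
    x = x - epsilon * f_prime(x)     # move x opposite to the sign of the derivative

print(x, f(x))                       # x approaches -1, where f(x) is minimal

After enough iterations x settles near −1, which is where f attains its minimum in this example.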
                             

When f′(x) = 0, the derivative provides no information about which direction to move. Points where f′(x) = 0 are known as critical points or stationary points. A local minimum is a point where f(x) is lower than at all neighboring points, so it is no longer possible to decrease f(x) by making infinitesimal steps. A local maximum is a point where f(x) is higher than at all neighboring points, so it is not possible to increase f(x) by making infinitesimal steps. Some critical points are neither maxima nor minima. These are known as saddle points.
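One simple way to tell these cases apart, shown in the sketch below with arbitrarily chosen example functions, is to compare f at a stationary point with its value at nearby points on either side.

# Classify the stationary point at x = 0 for three example functions (all have f'(0) = 0)
def classify(f, x0=0.0, h=1e-3):
    left, mid, right = f(x0 - h), f(x0), f(x0 + h)
    if mid < left and mid < right:
        return "local minimum"
    if mid > left and mid > right:
        return "local maximum"
    return "neither (saddle point)"

print(classify(lambda x: x ** 2))    # local minimum: f is higher on both sides of 0
print(classify(lambda x: -x ** 2))   # local maximum: f is lower on both sides of 0
print(classify(lambda x: x ** 3))    # neither: f'(0) = 0 but f keeps increasing through 0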

                                   

A point that obtains the absolute lowest value of f(x) is a global minimum. It is possible for there to be only one global minimum or multiple global minima of the function. It is also possible for there to be local minima that are not globally optimal. In the context of deep learning, we optimize functions that may have many local minima that are not optimal, and many saddle points surrounded by very flat regions. All of this makes optimization very difficult, especially when the input to the function is multidimensional. We therefore usually settle for finding a value of f that is very low, but not necessarily minimal in any formal sense.
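To make the local-versus-global distinction concrete, the sketch below runs the same gradient descent update from two different starting points on an arbitrarily chosen quartic that has two minima; only one starting point reaches the global minimum.

# Gradient descent on an example function with two minima (function and settings are arbitrary)
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def f_prime(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, epsilon=0.01, steps=1000):
    for _ in range(steps):
        x = x - epsilon * f_prime(x)
    return x

for start in (-2.0, 2.0):
    x = descend(start)
    print("start =", start, " ->  x =", round(x, 3), " f(x) =", round(f(x), 3))

# Starting at -2 reaches the global minimum near x = -1.30; starting at +2 gets
# stuck in the local minimum near x = +1.13, where f(x) is noticeably higher.

Which minimum is reached depends entirely on where the descent starts, which is exactly the difficulty described above.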




