Gradient Descent Method II

For functions with multiple inputs, such as those discussed in part I, we must use the concept of partial derivatives. The partial derivative $\frac{\partial}{\partial x_i} f(x)$ measures how f changes as only the variable $x_i$ increases at point x. The gradient $\nabla_x f(x)$ is the vector containing all the partial derivatives of f. In multiple dimensions, critical points are points where every element of the gradient is equal to zero.
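To make "one variable at a time" concrete, here is a minimal sketch that approximates each partial derivative with a central difference, nudging one coordinate while holding the others fixed. The function name `numerical_gradient`, the step `h`, and the example function f(x, y) = x² + 3y² are hypothetical choices for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Approximate the gradient of f at x, one partial derivative at a time.

    Entry i nudges only x[i] by +/- h (all other inputs held fixed) and takes a
    central difference, which estimates the partial derivative df/dx_i.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x: Vector) -> float:
    # Hypothetical example: f(x, y) = x^2 + 3y^2, whose exact gradient is (2x, 6y).
    return x[0] ** 2 + 3 * x[1] ** 2


print(numerical_gradient(f, [1.0, 2.0]))  # close to [2.0, 12.0]
print(numerical_gradient(f, [0.0, 0.0]))  # [0.0, 0.0]: every element is zero, a critical point
```

In practice the gradient is usually computed analytically or by automatic differentiation; finite differences are shown here only because they mirror the definition directly.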

The directional derivative in direction u (a unit vector) is the slope of the function f in direction u. In other words, it is the derivative of $f(x + \alpha u)$ with respect to $\alpha$, evaluated at $\alpha = 0$.
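A minimal sketch of this definition: it estimates the slope of $f(x + \alpha u)$ at $\alpha = 0$ with a central difference and compares it with the dot product $u^\top \nabla_x f(x)$ that the chain rule predicts. The helper `directional_derivative`, the example function f(x, y) = x² + 3y², the point (1, 2), and the 45° direction are all hypothetical illustration choices.

```python
import math


def directional_derivative(f, x, u, h=1e-6):
    """Slope of f at x in unit direction u: d/d(alpha) f(x + alpha*u) at alpha = 0."""
    def g(alpha):
        return f([xi + alpha * ui for xi, ui in zip(x, u)])
    return (g(h) - g(-h)) / (2 * h)


def f(x):
    # Hypothetical example: f(x, y) = x^2 + 3y^2, with gradient (2x, 6y).
    return x[0] ** 2 + 3 * x[1] ** 2


def grad_f(x):
    return [2 * x[0], 6 * x[1]]


x = [1.0, 2.0]
u = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # unit vector at 45 degrees
slope = directional_derivative(f, x, u)
dot = sum(ui * gi for ui, gi in zip(u, grad_f(x)))  # u^T grad f(x)
print(slope, dot)  # both about 14 / sqrt(2), roughly 9.899
```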

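Using the chain rule, we can see that $\frac{\partial}{\partial \alpha} f(x + \alpha u)$ evaluates to $u^\top \nabla_x f(x)$ at $\alpha = 0$. To minimize f, we would like to find the direction in which f decreases the fastest. Using the directional derivative, this means solving

$\min_{u,\, u^\top u = 1} u^\top \nabla_x f(x) = \min_{u,\, u^\top u = 1} \|u\|_2 \, \|\nabla_x f(x)\|_2 \cos\theta,$

where $\theta$ is the angle between u and the gradient. Since $\|u\|_2 = 1$ and $\|\nabla_x f(x)\|_2$ does not depend on u, this simplifies to $\min_u \cos\theta$.

This is minimized when u points in the opposite direction of the gradient. In other words, the gradient points directly uphill, and the negative gradient points directly downhill. We can decrease f by moving in the direction of the negative gradient. This is known as the method of steepest descent, or gradient descent.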
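The claim can be checked numerically. This sketch scans unit vectors $u = (\cos t, \sin t)$ around the circle, computes $u^\top \nabla_x f(x)$ for each, and keeps the most negative one; it should land on the negative normalized gradient, with slope $-\|\nabla_x f(x)\|_2$. The gradient (2, 12) comes from a hypothetical f(x, y) = x² + 3y² at (1, 2), and the function names and sample count are illustration choices.

```python
import math


def grad_f(x):
    # Hypothetical example: gradient of f(x, y) = x^2 + 3y^2.
    return [2 * x[0], 6 * x[1]]


def steepest_sampled_direction(grad, n_dirs=3600):
    """Scan unit vectors u = (cos t, sin t) and return the one minimising u^T grad."""
    best_u, best_slope = None, math.inf
    for k in range(n_dirs):
        t = 2 * math.pi * k / n_dirs
        u = (math.cos(t), math.sin(t))
        slope = u[0] * grad[0] + u[1] * grad[1]  # directional derivative along u
        if slope < best_slope:
            best_u, best_slope = u, slope
    return best_u, best_slope


g = grad_f([1.0, 2.0])  # (2, 12)
norm = math.hypot(g[0], g[1])
u_best, slope_best = steepest_sampled_direction(g)
print(u_best)                    # close to -g / ||g||
print([-gi / norm for gi in g])  # exact negative normalised gradient
print(slope_best, -norm)         # the minimum slope is about -||g||
```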

Steepest descent proposes a new point

$x' = x - \epsilon \nabla_x f(x)$
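A minimal sketch of a single update, assuming a hypothetical objective f(x, y) = x² + 3y² and a fixed step size $\epsilon = 0.1$ chosen only for illustration; `steepest_descent_step` is likewise a hypothetical name.

```python
def f(x):
    # Hypothetical example objective: f(x, y) = x^2 + 3y^2.
    return x[0] ** 2 + 3 * x[1] ** 2


def grad_f(x):
    return [2 * x[0], 6 * x[1]]


def steepest_descent_step(x, grad, eps):
    """Return the proposed point x' = x - eps * grad f(x)."""
    g = grad(x)
    return [xi - eps * gi for xi, gi in zip(x, g)]


x = [1.0, 2.0]
x_new = steepest_descent_step(x, grad_f, eps=0.1)
print(x_new)           # about [0.8, 0.8]
print(f(x), f(x_new))  # 13.0, then about 2.56
```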

where $\epsilon$ is the learning rate, a positive scalar that determines the size of the step. We can choose $\epsilon$ in several ways. A popular approach is to set $\epsilon$ to a small constant. Sometimes, we can solve for the step size that makes the directional derivative vanish. Another approach is to evaluate $f(x - \epsilon \nabla_x f(x))$ for several values of $\epsilon$ and choose the one that results in the smallest objective function value. This last strategy is called a line search.
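The two non-constant strategies can be sketched side by side for a hypothetical quadratic f(x, y) = x² + 3y². `line_search_step` tries a hypothetical list of candidate step sizes and keeps the best; `exact_quadratic_step` solves $\frac{d}{d\epsilon} f(x - \epsilon g) = 0$ in closed form, which only works this simply because f is quadratic with a constant Hessian.

```python
def f(x):
    # Hypothetical example objective: f(x, y) = x^2 + 3y^2.
    return x[0] ** 2 + 3 * x[1] ** 2


def grad_f(x):
    return [2 * x[0], 6 * x[1]]


def line_search_step(x, candidates=(0.001, 0.01, 0.05, 0.1, 0.2, 0.3)):
    """Evaluate f(x - eps * grad f(x)) for each candidate eps and keep the best."""
    g = grad_f(x)
    best_eps, best_x, best_val = None, x, f(x)  # None means no candidate improved f
    for eps in candidates:
        x_try = [xi - eps * gi for xi, gi in zip(x, g)]
        val = f(x_try)
        if val < best_val:
            best_eps, best_x, best_val = eps, x_try, val
    return best_eps, best_x, best_val


def exact_quadratic_step(x):
    """Step size that makes the directional derivative along -grad f vanish.

    Only valid for this quadratic: setting d/d(eps) f(x - eps * g) = 0 gives
    eps = (g^T g) / (g^T H g), where H = diag(2, 6) is the constant Hessian of f.
    """
    g = grad_f(x)
    hg = [2 * g[0], 6 * g[1]]
    return sum(gi * gi for gi in g) / sum(gi * hi for gi, hi in zip(g, hg))


best_eps, best_x, best_val = line_search_step([1.0, 2.0])
print(best_eps, best_val)                # 0.2, about 0.84
print(exact_quadratic_step([1.0, 2.0]))  # 148 / 872, about 0.170
```

The grid search can only return one of the values it is given, so it picks 0.2 here, while the exact solve lands between grid points. Real line-search routines typically refine the step adaptively rather than using a fixed grid.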

Steepest descent converges when every element of the gradient is zero (or, in practice, very close to zero). In some cases, we may be able to avoid running this iterative algorithm and instead jump directly to the critical point by solving the equation $\nabla_x f(x) = 0$ for x.
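Both routes can be compared on a hypothetical quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$ with a symmetric positive-definite A, chosen only for illustration. Its gradient is $Ax - b$, so the iterative loop stops once that vector's norm drops below a tolerance, while the direct route solves $Ax = b$ in one step. The matrix, vector, tolerance, and function names are all assumptions.

```python
import math

# Hypothetical quadratic objective f(x) = 0.5 * x^T A x - b^T x, with A symmetric
# positive definite, so its only critical point is the minimum.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]


def grad_f(x):
    # The gradient of 0.5 * x^T A x - b^T x is A x - b.
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def gradient_descent(x0, eps=0.1, tol=1e-8, max_iter=10_000):
    """Iterate x <- x - eps * grad f(x) until the gradient norm is below tol."""
    x = list(x0)
    for it in range(max_iter):
        g = grad_f(x)
        if math.hypot(g[0], g[1]) < tol:  # "very close to zero" in practice
            return x, it
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return x, max_iter


def solve_critical_point():
    """Jump straight to the critical point: solve A x = b (Cramer's rule, 2x2)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


x_iter, n_steps = gradient_descent([0.0, 0.0])
print(x_iter, n_steps)         # close to [0.2, 0.4] after a number of iterations
print(solve_critical_point())  # about [0.2, 0.4], with no iterations
```

The direct solve is only this easy because the gradient equation is linear here; for most objectives, $\nabla_x f(x) = 0$ has no closed-form solution and the iterative method is needed.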

We are also sometimes interested in the derivative of a derivative, known as a second derivative. The second derivative tells us how the first derivative will change as we vary the input, which makes it useful for determining whether a critical point is a local maximum, a local minimum, or a saddle point. When $f''(x) > 0$, $f'(x)$ increases as we move to the right and decreases as we move to the left. So at a critical point, where $f'(x) = 0$ and $f''(x) > 0$, the slope is negative just to the left and positive just to the right, and x is a local minimum.
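A minimal one-dimensional sketch of this second derivative test, assuming the point passed in is already known to be a critical point. It estimates $f''(x)$ with a central difference. By the symmetric argument, $f''(x) < 0$ indicates a local maximum, and when $f''(x) = 0$ the test cannot decide. The helper names and tolerance are hypothetical.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def classify_critical_point(f, x, tol=1e-6):
    """Second derivative test, assuming the caller already knows f'(x) = 0."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "local minimum"  # f' goes from negative to positive across x
    if d2 < -tol:
        return "local maximum"  # f' goes from positive to negative across x
    return "inconclusive"       # f''(x) is about 0: the test cannot decide


print(classify_critical_point(lambda t: t ** 2, 0.0))  # local minimum
print(classify_critical_point(math.cos, 0.0))          # local maximum
print(classify_critical_point(lambda t: t ** 3, 0.0))  # inconclusive (0 is a saddle point of t^3)
```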
