Gradient Descent Method II


For the functions discussed in part I, we need the concept of partial derivatives. The partial derivative ∂f/∂xᵢ measures how the function f changes as only the variable xᵢ increases. The gradient of f, written ∇ₓf(x), is the vector containing all of its partial derivatives. In multiple dimensions, critical points are points where every element of the gradient is equal to zero.
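
As a minimal sketch of these definitions, the snippet below approximates each partial derivative by nudging one coordinate at a time and collects the results into a gradient. The function f(x1, x2) = x1^2 + 3*x2^2, the helper name numerical_gradient, and the step h are assumptions chosen for illustration; they do not come from part I.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # nudge only coordinate i upward
        backward[i] -= h  # and downward, leaving the other coordinates fixed
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3 * x[1] ** 2


grad_at_point = numerical_gradient(f, [1.0, 2.0])   # analytic gradient is [2, 12]
grad_at_origin = numerical_gradient(f, [0.0, 0.0])  # critical point: every entry is ~0
```

At the origin every entry of the approximate gradient is (numerically) zero, which is what marks it as a critical point of this particular function.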

The directional derivative in the direction u (a unit vector) is the slope of the function f in the direction u. In other words, it is the derivative of the function f(x + αu) with respect to α, evaluated at α = 0.
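
The definition can be tried out numerically. The sketch below estimates the slope of g(α) = f(x + αu) at α = 0 with a central difference. The example function, the helper name directional_derivative, and the step h are assumptions made for illustration only.

```python
import math


def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3 * x[1] ** 2


def directional_derivative(f, x, u, h=1e-6):
    """Slope of g(alpha) = f(x + alpha*u) at alpha = 0, by central differences."""
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]  # rescale so that u is a unit vector
    ahead = [xi + h * ui for xi, ui in zip(x, u)]
    behind = [xi - h * ui for xi, ui in zip(x, u)]
    return (f(ahead) - f(behind)) / (2 * h)


x = [1.0, 2.0]
slope_along_x1 = directional_derivative(f, x, [1.0, 0.0])  # matches the partial derivative in x1
slope_along_x2 = directional_derivative(f, x, [0.0, 1.0])  # matches the partial derivative in x2
slope_diagonal = directional_derivative(f, x, [1.0, 1.0])  # a direction between the two axes
```

Along a coordinate axis the directional derivative reduces to the corresponding partial derivative, while other directions give slopes in between.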

Using the chain rule, we can see that this derivative equals uᵀ∇ₓf(x). To minimize f, we would like to find the direction in which f decreases the fastest, that is, the unit vector u that minimizes uᵀ∇ₓf(x). Because uᵀ∇ₓf(x) = ‖u‖₂ ‖∇ₓf(x)‖₂ cos θ, where θ is the angle between u and the gradient, and ‖u‖₂ = 1, the only factor that depends on u is cos θ.

This function is minimized when u points in the opposite direction of the gradient (cos θ = −1). In other words, the gradient points directly uphill, and the negative gradient points directly downhill. We can decrease f by moving in the direction of the negative gradient. This is known as the method of steepest descent or gradient descent.
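
The claim that the negative gradient is the steepest downhill direction can be checked by brute force. The sketch below scans unit vectors around the circle and keeps the one with the smallest directional derivative uᵀ∇f. The example function, the point x, and the 0.1 degree scan resolution are assumptions picked for illustration.

```python
import math


def grad_f(x):
    # Analytic gradient of the hypothetical function f(x1, x2) = x1^2 + 3*x2^2
    return [2 * x[0], 6 * x[1]]


x = [1.0, 2.0]
g = grad_f(x)
g_norm = math.sqrt(g[0] ** 2 + g[1] ** 2)

# Scan unit vectors u = (cos t, sin t) and record the directional derivative u^T g
best_slope, best_u = float("inf"), None
for k in range(3600):
    t = 2 * math.pi * k / 3600
    u = [math.cos(t), math.sin(t)]
    slope = u[0] * g[0] + u[1] * g[1]
    if slope < best_slope:
        best_slope, best_u = slope, u

# Unit vector pointing opposite to the gradient, for comparison with best_u
negative_gradient_direction = [-g[0] / g_norm, -g[1] / g_norm]
```

Up to the scan resolution, best_u agrees with negative_gradient_direction, and best_slope is close to minus the gradient norm, the value obtained when cos θ = −1.
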
Steepest descent proposes a new point
x′ = x − ϵ∇ₓf(x)
where ϵ is the size of the step. We can choose ϵ in several different ways. A popular approach is to set ϵ to a small constant. Sometimes, we can solve for the step size that makes the directional derivative vanish. Another approach is to evaluate f(x − ϵ∇ₓf(x)) for several values of ϵ and choose the one that results in the smallest objective function value. This last strategy is called a line search.
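
A minimal sketch of the update rule and of the line search strategy, under stated assumptions: the example function, the helper names step and line_search_step, and the list of candidate step sizes are all hypothetical choices made for illustration.

```python
def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3 * x[1] ** 2


def grad_f(x):
    # Analytic gradient of the example function
    return [2 * x[0], 6 * x[1]]


def step(x, eps):
    """One steepest descent update: x' = x - eps * grad f(x)."""
    g = grad_f(x)
    return [xi - eps * gi for xi, gi in zip(x, g)]


def line_search_step(x, candidates=(0.001, 0.01, 0.1, 0.3, 1.0)):
    """Evaluate f at the proposed point for each step size and keep the best one."""
    best_eps = min(candidates, key=lambda eps: f(step(x, eps)))
    return step(x, best_eps), best_eps


x = [1.0, 2.0]
x_new, chosen_eps = line_search_step(x)
```

For this function the largest candidate overshoots and makes f much worse, while the smallest ones barely move, so the search settles on an intermediate step size. A fixed small constant would avoid the extra function evaluations at the cost of slower progress.
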
Steepest descent converges when every element of the gradient is zero (or, in practice, very close to zero). In some cases, we may be able to avoid running this iterative algorithm and jump directly to the critical point by solving the equation ∇ₓf(x) = 0 for x.
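
The stopping rule can be sketched as a loop that ends once the gradient norm falls below a tolerance. The function f(x1, x2) = (x1 − 1)^2 + 3(x2 + 2)^2, the step size 0.1, the tolerance, and the iteration cap are assumptions for illustration. For this function the equation ∇ₓf(x) = 0 can be solved by hand, giving x = (1, −2), so the iterative result can be compared with the direct one.

```python
import math


def grad_f(x):
    # Gradient of the hypothetical function f(x1, x2) = (x1 - 1)^2 + 3*(x2 + 2)^2
    return [2 * (x[0] - 1), 6 * (x[1] + 2)]


def gradient_descent(grad, x0, eps=0.1, tol=1e-8, max_iters=1000):
    """Repeat x' = x - eps * grad(x) until the gradient norm drops below tol."""
    x = list(x0)
    for iteration in range(max_iters):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x, iteration  # gradient is close enough to zero
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return x, max_iters  # gave up before reaching the tolerance


x_min, iterations = gradient_descent(grad_f, [0.0, 0.0])

# Solving grad f(x) = 0 by hand: 2*(x1 - 1) = 0 and 6*(x2 + 2) = 0
x_closed_form = [1.0, -2.0]
```

Both routes arrive at the same point here. The closed-form route is only available when the equation is simple enough to solve directly.
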
We are also sometimes interested in the derivative of a derivative, known as a second derivative. The second derivative tells us how the first derivative will change as we vary the input. This makes it useful for determining whether a critical point is a local maximum, a local minimum, or a saddle point. At a critical point, f′(x) = 0. When f′′(x) > 0 there, the first derivative f′(x) increases as we move to the right and decreases as we move to the left, so the slope is negative just to the left and positive just to the right, and x is a local minimum.
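
A minimal sketch of this second derivative test in one dimension, assuming the point passed in is already known to be a critical point. The helper names, the finite-difference step h, and the tolerance are hypothetical choices. When the second derivative is (numerically) zero, the test alone cannot decide, so the sketch reports that case as inconclusive.

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def classify_critical_point(f, x, tol=1e-6):
    """Apply the second derivative test at a point where f'(x) = 0."""
    curvature = second_derivative(f, x)
    if curvature > tol:
        return "local minimum"   # slope rises from negative to positive
    if curvature < -tol:
        return "local maximum"   # slope falls from positive to negative
    return "inconclusive"        # the test gives no answer when f''(x) is ~0


bowl = classify_critical_point(lambda x: x ** 2, 0.0)
hill = classify_critical_point(lambda x: -x ** 2, 0.0)
flat = classify_critical_point(lambda x: x ** 3, 0.0)
```

The three sample functions x^2, −x^2 and x^3 all have a critical point at 0 but fall into the three different branches.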
