For the functions discussed in Part I, we must use the concept of partial derivatives. The partial derivative ∂f(x)/∂xᵢ measures how f changes as only the variable xᵢ increases at the point x. The gradient of f, written ∇ₓf(x), is the vector containing all of the partial derivatives. In multiple dimensions, critical points are points where every element of the gradient is equal to zero.
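To make the idea concrete, here is a minimal sketch that approximates each partial derivative with a central difference and stacks them into a gradient. The example function f(x₁, x₂) = x₁² + 3x₂² and the helper name numerical_gradient are assumptions chosen for illustration, not part of the original discussion.

```python
def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2


def numerical_gradient(func, x, h=1e-6):
    """Approximate every partial derivative of func at x with a central difference."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h    # nudge only variable i upward
        backward[i] -= h   # and downward, keeping the others fixed
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad


# The analytic gradient of this f is (2*x1, 6*x2), so at (1, 2) we expect about (2, 12).
g = numerical_gradient(f, [1.0, 2.0])

# At the origin every element of the gradient is zero, so it is a critical point.
g0 = numerical_gradient(f, [0.0, 0.0])
```

Finite differences are only an approximation. When an analytic gradient is available, it is usually preferable.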
The directional derivative in the direction u (a unit vector) is the slope of the function f in the direction u. In other words, it is the derivative of the function f(x + αu) with respect to α, evaluated at α = 0. Using the chain rule, we can see that this derivative equals uᵀ∇ₓf(x). To minimize f, we would like to find the direction in which f decreases the fastest.
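As a quick check of the chain-rule result, the sketch below compares a numerical estimate of the derivative of f(x + αu) at α = 0 with the dot product of u and the gradient. The example function, its hand-derived gradient, and the helper name directional_derivative are assumptions for illustration.

```python
import math


def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    # Analytic gradient of the example function above
    return [2.0 * x[0], 6.0 * x[1]]


def directional_derivative(func, x, u, h=1e-6):
    """Central-difference estimate of d/d(alpha) func(x + alpha*u) at alpha = 0."""
    plus = [xi + h * ui for xi, ui in zip(x, u)]
    minus = [xi - h * ui for xi, ui in zip(x, u)]
    return (func(plus) - func(minus)) / (2.0 * h)


x = [1.0, 2.0]
u = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]  # a unit vector

numeric = directional_derivative(f, x, u)
analytic = sum(ui * gi for ui, gi in zip(u, grad_f(x)))  # u^T grad f(x)
```

The two values should agree up to the finite-difference error.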
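We can find this direction by minimizing the directional derivative over all unit vectors u. Since uᵀ∇ₓf(x) = ||u||₂ ||∇ₓf(x)||₂ cos θ, where θ is the angle between u and the gradient, and ||u||₂ = 1, the directional derivative is smallest when cos θ = −1, that is, when u points in the opposite direction of the gradient. In other words, the gradient points directly uphill, and the negative gradient points directly downhill. We can decrease f by moving in the direction of the negative gradient. This is known as the method of steepest descent or gradient descent.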
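The claim that the negative gradient is the steepest downhill direction can be checked by brute force in two dimensions. The sketch below scans unit vectors around the circle and keeps the one with the smallest directional derivative. The example gradient and the grid of 3600 angles are assumptions for illustration.

```python
import math


def grad_f(x):
    # Gradient of the hypothetical example f(x1, x2) = x1^2 + 3*x2^2
    return [2.0 * x[0], 6.0 * x[1]]


x = [1.0, 2.0]
g = grad_f(x)
g_norm = math.hypot(g[0], g[1])

best_slope, best_u = float("inf"), None
for k in range(3600):
    theta = 2.0 * math.pi * k / 3600
    u = (math.cos(theta), math.sin(theta))   # unit vector at angle theta
    slope = u[0] * g[0] + u[1] * g[1]        # directional derivative u^T grad f(x)
    if slope < best_slope:
        best_slope, best_u = slope, u

# Normalized negative gradient, for comparison with the scan result
downhill = (-g[0] / g_norm, -g[1] / g_norm)
```

Up to the resolution of the angle grid, best_u matches downhill, and best_slope is close to the negative of the gradient norm.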
Steepest descent proposes a new point
x′ = x − ϵ∇ₓf(x)
where ϵ is a positive scalar that sets the size of the step (often called the learning rate). We can choose ϵ in several different ways. A popular approach is to set ϵ to a small constant. Sometimes, we can solve for the step size that makes the directional derivative vanish. Another approach is to evaluate f(x − ϵ∇ₓf(x)) for several values of ϵ and choose the one that results in the smallest objective function value. This last strategy is called a line search.
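The update rule and the line search strategy can be sketched in a few lines. The example function, the helper names step and line_search_step, and the list of candidate step sizes are assumptions for illustration.

```python
def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    # Analytic gradient of the example function above
    return [2.0 * x[0], 6.0 * x[1]]


def step(x, eps):
    """One steepest-descent update: x' = x - eps * grad f(x)."""
    return [xi - eps * gi for xi, gi in zip(x, grad_f(x))]


def line_search_step(x, candidates=(1e-3, 1e-2, 1e-1, 0.2, 0.5, 1.0)):
    """Evaluate f at several step sizes and keep the one with the smallest value."""
    best_eps = min(candidates, key=lambda eps: f(step(x, eps)))
    return step(x, best_eps), best_eps


x = [1.0, 2.0]
x_new, eps = line_search_step(x)  # picks eps = 0.2 from the candidates above
```

Note that the larger candidates (0.5 and 1.0) overshoot and increase f here, which is why trying several values and comparing the objective can help.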
Steepest descent converges when
every element of the gradient is zero (or, in practice, very close to zero). In
some cases, we may be able to avoid running this iterative algorithm, and just
jump directly to the critical point by solving the equation ∇ₓf(x) = 0 for x.
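The sketch below contrasts the two routes on a simple quadratic. The iterative loop stops once every element of the gradient is close to zero, while the direct route solves the gradient equation by hand. The example function f(x₁, x₂) = (x₁ − 1)² + 3(x₂ + 2)², the step size, and the tolerance are assumptions for illustration.

```python
def grad_f(x):
    # Gradient of the hypothetical example f(x1, x2) = (x1 - 1)^2 + 3*(x2 + 2)^2
    return [2.0 * (x[0] - 1.0), 6.0 * (x[1] + 2.0)]


def gradient_descent(x, eps=0.1, tol=1e-8, max_iters=10_000):
    """Repeat x <- x - eps * grad f(x) until every gradient element is near zero."""
    for _ in range(max_iters):
        g = grad_f(x)
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    return x


x_iterative = gradient_descent([0.0, 0.0])

# Direct route: setting 2*(x1 - 1) = 0 and 6*(x2 + 2) = 0 gives the critical point.
x_direct = [1.0, -2.0]
```

For this function both routes land on the same point. The direct route is only available when the gradient equation can actually be solved.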
We are also sometimes interested
in a derivative of a derivative. This is known as a second derivative. The
second derivative tells us how the first derivative will change as we vary the input.
This means it can be useful for determining whether a critical point is a local
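maximum, a local minimum, or a saddle point. When f′′(x) > 0, f′(x) increases as we move to the right and decreases as we move to the left. At a critical point, where f′(x) = 0, this means f′ is negative just to the left and positive just to the right, so x is a local minimum.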
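For a function of one variable, the sign of the second derivative at a critical point can be turned into a small classifier. The sketch below estimates f′′(x) with a central difference. The helper names, the step h, and the tolerance are assumptions for illustration, and the code assumes the caller passes a point where f′(x) = 0.

```python
def second_derivative(func, x, h=1e-4):
    """Central-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


def classify_critical_point(func, x, tol=1e-6):
    """Second-derivative test at a point where f'(x) = 0 is assumed to hold."""
    d2 = second_derivative(func, x)
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "inconclusive"  # the test says nothing when f''(x) is zero


kind_min = classify_critical_point(lambda x: x ** 2, 0.0)
kind_max = classify_critical_point(lambda x: -(x ** 2), 0.0)
kind_flat = classify_critical_point(lambda x: x ** 3, 0.0)
```

The last case, x³ at 0, has a zero second derivative, so the test cannot decide on its own.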