Gradient descent is an optimization technique. Optimization refers to the task of either minimizing or maximizing some function f(x) by altering x. We usually phrase most optimization problems in terms of minimizing f(x); maximization may be obtained via a minimization algorithm by minimizing −f(x). The function we want to minimize or maximize is called the objective function. When we are minimizing it, we may also call it the cost function, loss function, or error function. For example, we might write x* = arg min f(x), i.e. find the value of x at which f(x) attains its minimum.

The derivative of a function y = f(x) from R to R, denoted f'(x) or dy/dx, gives the slope of f(x) at the point x. It quantifies how a small change in x is scaled to obtain the corresponding change in y. The derivative is therefore useful for minimizing a function because it tells us how to change x in order to make a small improvement in y. For example, we know that f(x − ϵ sign(f'(x))) is less than f(x) for small enough ϵ. We can thus reduce f(x) by moving x in small steps with the opposite sign of the derivative. This technique is called gradient descent.
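
To make the update above concrete, here is a minimal sketch in Python of the sign-based step described in the text. The quadratic f, its derivative df, the step size eps, and the starting point are all arbitrary illustrative choices, not anything prescribed above.

```python
# Minimal sketch: reduce f(x) by repeatedly stepping opposite to the
# sign of the derivative, as described in the text.
def f(x):
    return x ** 2          # example objective (arbitrary choice)

def df(x):
    return 2 * x           # its derivative f'(x)

def sign(v):
    return (v > 0) - (v < 0)

x = 3.0                    # arbitrary starting point
eps = 0.01                 # small step size
for _ in range(1000):
    x = x - eps * sign(df(x))

print(x)                   # ends up close to the minimum at x = 0
```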

When f'(x) = 0, the derivative provides no information about which direction to move. Points where f'(x) = 0 are known as critical points or stationary points. A local minimum is a point where f(x) is lower than at all neighboring points, so it is no longer possible to decrease f(x) by making infinitesimal steps. A local maximum is a point where f(x) is higher than at all neighboring points, so it is not possible to increase f(x) by making infinitesimal steps. Some critical points are neither maxima nor minima. These are known as saddle points.
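
As a quick illustration (an example of my own, not one from the text above): f(x) = x^3 has derivative 3x^2, which vanishes at x = 0, yet that point is neither a minimum nor a maximum, since f is lower just to the left of 0 and higher just to the right.

```python
# Hypothetical example: x = 0 is a critical point of f(x) = x**3
# (the derivative 3*x**2 is zero there), but it is neither a local
# minimum nor a local maximum.
def f(x):
    return x ** 3

delta = 1e-3
print(f(-delta) < f(0.0) < f(delta))   # True: f keeps increasing through 0
```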

A point that obtains the absolute lowest value of f(x) is a global minimum. There may be a single global minimum or multiple global minima of the function. It is also possible for there to be local minima that are not globally optimal. In the context of deep learning, we optimize functions that may have many local minima that are not optimal, and many saddle points surrounded by very flat regions. All of this makes optimization very difficult, especially when the input to the function is multidimensional. We therefore usually settle for finding a value of f that is very low, but not necessarily minimal in any formal sense.
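
To see how the starting point can decide which minimum the procedure settles in, here is a sketch that reuses the sign-based descent from the earlier snippet on a non-convex one-dimensional function. The polynomial, the step size, and the starting points are all arbitrary illustrative choices (and real deep-learning objectives are multidimensional, unlike this toy case).

```python
# Hypothetical illustration: the same sign-based descent applied to a
# non-convex function. Different starting points end in different local
# minima, and only one of them is the global minimum.
def f(x):
    return x ** 4 - 3 * x ** 2 + x

def df(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, eps=0.001, steps=5000):
    for _ in range(steps):
        g = df(x)
        x -= eps * ((g > 0) - (g < 0))   # step opposite to the sign of f'(x)
    return x

for x0 in (-2.0, 2.0):
    x = descend(x0)
    print("start %+.1f -> x near %+.3f, f(x) near %+.3f" % (x0, x, f(x)))
# One run ends near the global minimum (around x = -1.3), the other in a
# higher local minimum (around x = 1.1).
```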