Error Analysis: Suppose we are given two classes and our aim is to find the decision boundary that separates them. Let the equation of the separating hyperplane (a line in two dimensions) be \[ax_1+bx_2=c\] where $c$ determines the position of the line. For n-dimensional data, we can write it as \[w^Tx + w_0 = 0\]
where \[w^T = (w_1 \; w_2 \; w_3 \; \dots \; w_n )\]
and \[x^T = (x_1 \; x_2 \; x_3 \; \dots \; x_n )\]
Since we also have an intercept term, let us rewrite the above equation in a more general (augmented) form, i.e. \[a^Ty=0\]
where \[a^T = (w_1 \; w_2 \; w_3 \; \dots \; w_n \; w_0 )\]
and \[y^T = (x_1 \; x_2 \; x_3 \; \dots \; x_n \; x_0 )\] with $x_0 = 1$.
If $a^Ty > 0$ holds for the points of one class, then any point of that class for which $a^Ty < 0$ is said to be misclassified. In simple words, a data point that belongs to class 1 but gives the opposite sign is misclassified, and this produces a classification error.
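As a small illustration, here is a minimal sketch (with hypothetical toy data and a hand-picked, not learned, weight vector) of how the augmented form $a^Ty$ can be used to flag misclassified points:

```python
import numpy as np

# Hypothetical toy data: points assumed to belong to the class
# for which a^T y > 0 should hold (the convention used above).
X = np.array([[2.0, 3.0],
              [1.0, 1.0],
              [0.5, 4.0]])

# Augment each sample with x0 = 1 so the intercept w0 is absorbed into a.
Y = np.hstack([X, np.ones((X.shape[0], 1))])   # y^T = (x1, x2, x0)

# A hand-picked (not learned) weight vector a^T = (w1, w2, w0).
a = np.array([1.0, -1.0, 0.5])

scores = Y @ a                 # a^T y for every sample
misclassified = scores <= 0    # these points violate a^T y > 0
print("a^T y        :", scores)
print("misclassified:", misclassified)
```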
To minimize this error we use the Gradient Descent Algorithm (GDA). In GDA, we compute the gradient of the error and then update the weights in the direction that reduces the error.
Let the error be \[J(a) = \Sigma(-a^Ty)\] where the sum runs over the misclassified samples $y$.
To minimize this error, take the gradient (derivative) with respect to $a$, since we need to find the $a$ that minimizes the error. We obtain \[\nabla J(a) = \Sigma(-y)\] again summed over the misclassified samples.
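For completeness, if we write $\mathcal{M}$ for the set of currently misclassified samples (notation introduced here only for clarity), the gradient follows term by term, since each misclassified $y$ contributes $-a^Ty$ to the error:
\[\nabla_a J(a) = \sum_{y \in \mathcal{M}} \frac{\partial}{\partial a}\left(-a^Ty\right) = \sum_{y \in \mathcal{M}} (-y)\]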
First, we choose $a(0)$ randomly, and then we update it in a way that reduces the error.
Let $a(k)$ be the weight vector after the $k$th iteration. Stepping against the gradient reduces the error, so the weight update rule can be written as \[a(k+1) = a(k) -\eta \, \Sigma(-y)\]
where
η is the learning rate (step size), which controls the rate of convergence.
** It should be noted here that we are assuming the data point belongs to the class for which $a^Ty > 0 $ is true; we can also apply the same reasoning to the other class.
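Putting the steps together, here is a minimal sketch of the whole procedure under the assumptions above: hypothetical toy data, a randomly chosen $a(0)$, and a fixed η. To handle both classes with the single condition $a^Ty > 0$, the sketch negates the augmented samples of the second class (one common way of "doing the same for the other class"):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two classes in 2-D.
X1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))    # class 1
X2 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))  # class 2

# Augment with x0 = 1, then negate class-2 samples so every sample
# should satisfy a^T y > 0 (the convention assumed in the text).
Y1 = np.hstack([X1, np.ones((len(X1), 1))])
Y2 = -np.hstack([X2, np.ones((len(X2), 1))])
Y = np.vstack([Y1, Y2])

a = rng.normal(size=3)   # a(0) chosen randomly
eta = 0.1                # learning rate

for k in range(100):
    scores = Y @ a
    mis = Y[scores <= 0]          # currently misclassified samples
    if len(mis) == 0:             # no error left: stop
        break
    grad = -mis.sum(axis=0)       # grad J(a) = sum of (-y) over misclassified y
    a = a - eta * grad            # a(k+1) = a(k) - eta * sum(-y)

print("final weights (w1, w2, w0):", a)
print("misclassified after training:", int((Y @ a <= 0).sum()))
```

The loop stops once no sample violates $a^Ty > 0$, which only happens when the data are linearly separable; otherwise the fixed iteration cap bounds the run.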
The figure above simply illustrates the classification of the data points by a line (or hyperplane).