
Classification: Gradient Descent Algorithm


Error Analysis: Let us consider that we are given two classes and our aim is to find the decision boundary that separates them. Let the equation of the hyperplane that separates these two classes be \[ax_1+bx_2=c\]
where c represents the position of the line. For n-dimensional data, we can write it as \[w^Tx + w_0 = 0\]
where \[w^T = (w_1 \; w_2 \; w_3 \; \dots \; w_n)\]
and \[x^T = (x_1 \; x_2 \; x_3 \; \dots \; x_n)\]
Since we also have an intercept term, let's rewrite the above equation in a more general (augmented) form,
i.e. \[a^Ty=0\]
where \[a^T = (w_1 \; w_2 \; w_3 \; \dots \; w_n \; w_0)\]
and \[y^T = (x_1 \; x_2 \; x_3 \; \dots \; x_n \; x_0)\] with $x_0 = 1$.
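As a quick illustration of this augmented notation, here is a minimal sketch (assuming NumPy and made-up numbers) that builds $y$ from $x$ by appending $x_0 = 1$ and evaluates $a^Ty$:

```python
import numpy as np

# Hypothetical 2-D data point and an assumed (made-up) augmented weight vector a.
x = np.array([2.0, 3.0])             # x^T = (x_1, x_2)
a = np.array([0.5, -1.0, 0.25])      # a^T = (w_1, w_2, w_0)

# Augment x with x_0 = 1 so that a^T y = w_1*x_1 + w_2*x_2 + w_0.
y = np.append(x, 1.0)                # y^T = (x_1, x_2, x_0)

score = a @ y                        # a^T y
print("a^T y =", score)              # > 0 for one class, < 0 for the other
```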
           
If $a^Ty > 0$ holds for points of one class, then any data point of that class for which $a^Ty < 0$ is said to be misclassified. In simple words, a data point that belongs to class 1 but gives the opposite result is misclassified, and this contributes to the classification error.
To minimize this error we use the Gradient Descent Algorithm (GDA). In GDA, we compute the gradient of the error and then update the weights to reduce the error.
Let the error be \[J(a) = \Sigma(-a^Ty)\] where the sum runs over the misclassified samples y.
To minimize the error, take the gradient (derivative) with respect to $a$, as we need the $a$ that minimizes the error. We obtain \[\nabla J(a) = \Sigma(-y)\]
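A minimal sketch of this error and its gradient, assuming NumPy and that the rows of a hypothetical matrix Y are the augmented samples:

```python
import numpy as np

def perceptron_error_and_gradient(a, Y):
    """J(a) = sum(-a^T y) over misclassified rows of Y, and its gradient sum(-y)."""
    scores = Y @ a                  # a^T y for every augmented sample (row of Y)
    mis = Y[scores < 0]             # misclassified samples: a^T y < 0
    J = np.sum(-(mis @ a))          # error J(a) = sum of -a^T y
    grad = np.sum(-mis, axis=0)     # gradient of J w.r.t. a = sum of -y
    return J, grad
```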
First, we choose $a(0)$ randomly and then try to update it in a way that reduces the error.
After the kth iteration, the weight-update rule can be written as \[a(k+1) = a(k) -\eta \Sigma(-y)\] where the sum is again over the misclassified samples,
and η is the learning rate, which controls the rate of convergence.
** It should be noted here that we are assuming the data points belong to the class for which $a^Ty > 0$ is true. We can also do the same for the other class.
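Putting the pieces together, below is a hedged sketch of the full procedure with the update rule $a(k+1) = a(k) - \eta \Sigma(-y)$; the samples, learning rate, and iteration cap are made-up assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical augmented samples y^T = (x_1, x_2, x_0) for the class
# that should satisfy a^T y > 0.
Y = np.array([[2.0, 3.0, 1.0],
              [1.5, 2.5, 1.0],
              [3.0, 1.0, 1.0]])

a = rng.standard_normal(3)     # a(0) chosen randomly
eta = 0.1                      # learning rate

for k in range(100):
    mis = Y[Y @ a < 0]                     # misclassified samples: a^T y < 0
    if len(mis) == 0:                      # every point satisfies a^T y > 0
        break
    a = a - eta * np.sum(-mis, axis=0)     # a(k+1) = a(k) - eta * sum(-y)

print("final weights a =", a)
```

Note that subtracting $\eta \Sigma(-y)$ is the same as adding η times the sum of the misclassified samples, which nudges the hyperplane toward classifying those points correctly.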


The figure above simply shows the classification of the data points by a line (or hyperplane).
