
Linear Regression: Introduction


Suppose we are given data on oil prices in India over the last 10 years, along with variables such as the cost of imports from other countries, transport costs within the country, etc.
We are asked to predict the oil price for the next year. To do this, we first find the relation between these variables and the price, and then use that relation to predict next year's price.
Linear regression is used when the output takes continuous values. Here we simply try to find the relation between the independent variables (features) and the dependent variable (output).
Let us assume that the input is taken from some p-dimensional space, i.e. x ∈ R^p, and the output is in R.
Let us assume that we are given the data points x1, x2, …, xn with their respective labels y1, y2, …, yn. We are asked to find the value at a new point X (say). To find that value, we first find the relation between xi and yi.
Let \[\hat{y} = f(x) = \beta_0+\beta_1 x_1+\dots+\beta_p x_p\] be the relation between the independent and dependent variables. The right-hand side has (p+1) terms: one is the intercept term and the remaining p each multiply one feature (independent variable). It is clear from the equation that y depends on p features.
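A minimal sketch of how a relation of this form can be evaluated, assuming each coefficient beta_j (j ≥ 1) multiplies the corresponding feature x_j and beta_0 is the intercept. The function name `predict` and the coefficient values are made up for illustration; they do not come from any real oil-price data.

```python
import numpy as np

def predict(x, beta):
    """Evaluate y_hat = beta_0 + beta_1*x_1 + ... + beta_p*x_p.

    x    : sequence of p feature values
    beta : sequence of p+1 coefficients, intercept first
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if beta.shape[0] != x.shape[0] + 1:
        raise ValueError("beta must have one more entry than x")
    # Intercept plus the weighted sum of the features
    return float(beta[0] + x @ beta[1:])

# Hypothetical coefficients for p = 2 features (illustrative values only)
beta = [1.0, 2.0, 3.0]
x = [4.0, 5.0]
y_hat = predict(x, beta)  # 1 + 2*4 + 3*5 = 24
```

Keeping the intercept as the first entry of `beta` mirrors the (p+1)-term form of the equation.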
It is difficult to draw p dimensions on paper, so let's simplify the equation by assuming that y depends on only two features, i.e. \[\hat{y} = \beta_0+\beta_1 x_1+\beta_2 x_2\] Now plot the data points together with the solution, which is a plane because y depends on 2 features.
Suppose we get a plot like the one shown here (it is just an example).
Here we can clearly see that the plane does not fit the data points perfectly, i.e. it does not pass through all of the data points. The next task is to minimize the distance of each data point from the plane so that the resulting plane represents the data in the best possible way.
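One common way to carry out this minimization is ordinary least squares, which minimizes the sum of squared vertical distances between the points and the plane. The post has not yet fixed a particular distance measure, so treat this as one possible choice. The sketch below uses NumPy's `np.linalg.lstsq` on synthetic data; the data, the noise level, and the values in `true_beta` are assumptions made up for illustration.

```python
import numpy as np

# Synthetic data, made up for illustration: n = 50 points, p = 2 features
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 2))
true_beta = np.array([1.0, 2.0, 3.0])      # assumed intercept and slopes
noise = rng.normal(0.0, 0.5, size=50)
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so that beta_0 acts as the intercept
A = np.column_stack([np.ones(len(X)), X])

# Least squares: pick beta minimizing the sum of squared vertical distances
beta_hat, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Sum of squared errors of the fitted plane
y_fit = A @ beta_hat
sse = float(np.sum((y - y_fit) ** 2))
print("estimated coefficients:", beta_hat)
print("sum of squared errors:", sse)
```

Because the data are noisy, the fitted plane still does not pass through every point; it is simply the plane with the smallest total squared error for this data set.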
