Suppose we are given data on oil prices in India for the last 10 years, along with variables such as the cost of imports from other countries, transport costs within the country, etc. We are asked to predict the oil price for the next year. To do this, we first try to find the relation between the different variables, and then we use that relation to estimate the oil price in the next year.
Linear Regression is used when the output variable is continuous. Here we simply try to find the relation between the independent variables (features) and the dependent variable (output).
Let us assume that the input is taken from some p-dimensional space, i.e. X ∈ R^p, and that the output is in R.
Let us assume that we are given the data points x1, x2, ..., xn with their respective labels y1, y2, ..., yn. We are asked to find the value at a new point X (let's say). To find that value, we first find the relation between the xi and the yi.
Let \[\hat{y} = f(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\] be the relation between the independent and dependent variables.
The right-hand side has (p+1) terms: one is the intercept term and the others correspond to the p feature (independent) variables. It is clear from the equation that y depends on p features.
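To make the relation concrete, here is a minimal sketch of how a prediction could be computed once the coefficients are known. The function name `predict` and the coefficient values are made up for illustration; they do not come from any real data.

```python
def predict(beta, x):
    """Evaluate beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1].

    beta holds the intercept followed by one coefficient per feature,
    so it must have exactly one more entry than x.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta must have one more entry than x")
    intercept, weights = beta[0], beta[1:]
    return intercept + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical coefficients for p = 2 features (illustrative values only).
beta = [1.0, 2.0, 3.0]
x = [4.0, 5.0]
y_hat = predict(beta, x)  # 1 + 2*4 + 3*5 = 24.0
```

Storing the intercept as the first entry of `beta` mirrors the (p+1) terms of the equation: one intercept plus p feature terms.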
It is difficult to draw p dimensions on paper, so let's simplify the equation by assuming that y depends on only two features. Now we plot the data points together with the solution plane that we get (it is a plane because y depends on 2 features).
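With only two features, the relation reduces to the equation of a plane. Writing the two feature values as x_1 and x_2 (notation assumed here for illustration), it becomes:

\[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

Here \beta_0 is the height of the plane at the origin, while \beta_1 and \beta_2 are its slopes along the two feature axes.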
Suppose we get a plot like the following (it is just an example).
Here it is clearly observed that we do not have a plane that fits the data points perfectly, i.e. the plane does not pass through all the data points. The next task is to minimize the distance of each data point from the plane, so that the obtained plane represents the data in the best way.
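One common way to measure this distance is the vertical gap between the observed y and the value the plane predicts, squared and summed over all points (the usual least-squares criterion). The sketch below assumes that choice and compares two candidate planes. The data points, the plane coefficients, and the function names are all made up for illustration.

```python
def vertical_residual(beta, x, y):
    """Vertical distance (with sign) between a data point and the plane."""
    b0, b1, b2 = beta
    return y - (b0 + b1 * x[0] + b2 * x[1])


def sum_squared_error(beta, xs, ys):
    """Sum of squared vertical residuals over all data points."""
    return sum(vertical_residual(beta, x, y) ** 2 for x, y in zip(xs, ys))


# Toy data with two features per point (illustrative values only).
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [1.0, 3.0, 4.0, 6.5]

# Two hypothetical candidate planes, each given as (beta0, beta1, beta2).
plane_a = (1.0, 2.0, 3.0)
plane_b = (0.0, 1.0, 1.0)

sse_a = sum_squared_error(plane_a, xs, ys)
sse_b = sum_squared_error(plane_b, xs, ys)

# The plane with the smaller total squared error represents the data better.
better = "plane_a" if sse_a < sse_b else "plane_b"
```

On this toy data `plane_a` misses only the last point, so its total error is much smaller than that of `plane_b`. Finding the best plane means searching for the coefficients that make this total as small as possible.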