Before we talk about subset selection, let's first understand why we are not satisfied with the least squares estimates.
Prediction Accuracy: The least squares estimates often have low bias but large variance. Prediction accuracy can sometimes be improved by shrinking some coefficients or setting them to zero. We sacrifice a little bias to reduce the variance and thereby improve prediction accuracy.
Interpretation:
With a large number of predictors, we often would like to determine a smaller
subset that exhibits the strongest effects.
Here are some of the approaches to variable subset selection with linear regression.
1. Forward Stepwise Selection: Forward stepwise selection starts with the intercept and then sequentially adds into the model the predictor that most improves the fit. It is a greedy algorithm, and it produces a nested sequence of models. It is used for two reasons: computational and statistical. When the number of features p is large, the best-subset sequence cannot be computed; in that case we can still use the forward stepwise sequence, even when p >> N. And because forward stepwise is a more constrained search, it will have lower variance, but perhaps more bias. A short Python sketch of this procedure is given after this list.
2. Backward Stepwise Selection: It starts with the full model and sequentially deletes the predictor that has the least impact on the fit. The candidate for dropping is the variable with the smallest Z-score. Backward selection can only be used when N > p, while forward selection can always be used. A sketch appears after this list.
3. Forward Stagewise Regression: It is even more constrained than forward stepwise selection. It starts like forward stepwise regression, with an intercept and centered predictors, and with all coefficients initially set to 0. At each step, the algorithm identifies the variable most correlated with the current residual, computes the simple linear regression coefficient of the residual on this chosen variable, and adds it to the current coefficient for that variable. This is continued until none of the variables has any correlation with the residual, i.e. the least squares fit when N > p. A sketch appears after this list.
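To make the first procedure concrete, here is a minimal sketch of forward stepwise selection: a greedy search that at each step adds the predictor giving the largest drop in the residual sum of squares. The function and variable names (forward_stepwise, n_steps, and so on) are assumptions for illustration, not something fixed by the discussion above.

```python
# A minimal sketch of forward stepwise selection (assumed names).
# At each step it adds the predictor that most reduces the RSS of an OLS fit.
import numpy as np

def forward_stepwise(X, y, n_steps=None):
    """Return the order in which predictors enter the model."""
    n, p = X.shape
    n_steps = n_steps or p
    selected, remaining = [], list(range(p))
    for _ in range(min(n_steps, p)):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # OLS fit with an intercept on the current candidate subset
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    # selected[:1], selected[:2], ... form the nested sequence of models
    return selected

# Example usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 2] - 2 * X[:, 4] + rng.normal(size=100)
print(forward_stepwise(X, y, n_steps=3))  # expect features 2 and 4 to enter early
```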
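A similar sketch of backward stepwise selection, again with assumed names: it starts from the full OLS fit and repeatedly drops the predictor with the smallest absolute Z-score (coefficient divided by its standard error), which is why it requires N > p.

```python
# A minimal sketch of backward stepwise selection (assumed names).
# Starts from the full model and drops the predictor with the smallest |Z-score|.
import numpy as np

def backward_stepwise(X, y, n_keep=1):
    n, p = X.shape
    active = list(range(p))
    while len(active) > n_keep:
        A = np.column_stack([np.ones(n), X[:, active]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        # unbiased estimate of the noise variance (needs N > number of fitted coefficients)
        sigma2 = resid @ resid / (n - A.shape[1])
        # standard errors from the diagonal of sigma^2 * (A^T A)^{-1}
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
        z = beta[1:] / se[1:]                      # skip the intercept
        drop = active[int(np.argmin(np.abs(z)))]   # variable with smallest |Z-score|
        active.remove(drop)
    return active

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 2] - 2 * X[:, 4] + rng.normal(size=100)
print(backward_stepwise(X, y, n_keep=2))  # expect features 2 and 4 to survive
```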
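And finally a sketch of forward stagewise regression under the same assumptions: predictors are centered, coefficients start at 0, and each step adds the simple regression coefficient of the current residual on the most correlated variable.

```python
# A minimal sketch of forward stagewise regression (assumed names).
import numpy as np

def forward_stagewise(X, y, n_iter=5000, tol=1e-8):
    Xc = X - X.mean(axis=0)                 # centered predictors
    r = y - y.mean()                        # residual starts as the centered response
    beta = np.zeros(X.shape[1])
    col_norm = np.linalg.norm(Xc, axis=0)
    for _ in range(n_iter):
        r_norm = np.linalg.norm(r)
        if r_norm < tol:
            break
        # correlation of each centered predictor with the current residual
        corr = (Xc.T @ r) / (col_norm * r_norm)
        j = int(np.argmax(np.abs(corr)))
        if np.abs(corr[j]) < tol:           # no variable correlated with the residual: stop
            break
        # simple least squares coefficient of the residual on variable j
        delta = (Xc[:, j] @ r) / (Xc[:, j] @ Xc[:, j])
        beta[j] += delta
        r -= delta * Xc[:, j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 2] - 2 * X[:, 4] + rng.normal(size=100)
print(np.round(forward_stagewise(X, y), 2))  # approaches the least squares coefficients
```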