04.Shrinkage Methods
Shrinkage methods
- Why? Some variables might be redundant. Shrink the model.
Ridge Regression
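As a minimal sketch of how the ridge penalty shrinks a coefficient (one centered predictor, no intercept; the data below are made up for illustration): minimizing $\sum_i (y_i - x_i \beta)^2 + \lambda \beta^2$ gives the closed form $\hat\beta = \sum x_i y_i / (\sum x_i^2 + \lambda)$, so the coefficient is pulled toward 0 but never reaches exactly 0 for finite $\lambda$.

```python
# Ridge regression with a single centered predictor (illustrative sketch).
# Minimizing sum((y - x*b)^2) + lam * b^2 over b gives
#   beta_hat = sum(x*y) / (sum(x^2) + lam)

def ridge_1d(x, y, lam):
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered predictor (toy data)
y = [-4.1, -1.9, 0.2, 2.1, 3.8]   # roughly y = 2x + noise

for lam in [0.0, 1.0, 10.0, 100.0]:
    # coefficient shrinks monotonically as lam grows, but stays nonzero
    print(lam, ridge_1d(x, y, lam))
```

Larger $\lambda$ only rescales the estimate here, which is why ridge alone never produces a sparse model.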
Lasso
- A small constraint $t$ causes some of the coefficients to shrink exactly to 0: this performs variable selection and produces a sparse model.
- Convex optimization.
Why does the lasso lead to exactly-zero coefficients?
The reason becomes clear once you plot the constraint region together with the RSS contours (Fig. 3.11): the $\ell_1$ region $\sum \lvert \beta_j \rvert \le t$ has corners on the coordinate axes, and the elliptical RSS contours often first touch the region at a corner, where some coefficients are exactly 0.
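The same corner geometry can be seen algebraically. In the one-predictor case (a sketch with made-up data, not ESL's notation), the lasso solution is a soft-thresholded least-squares estimate: once the penalty exceeds the correlation between $x$ and $y$, the coefficient is set exactly to 0.

```python
# Lasso with one centered predictor: soft-thresholding (illustrative sketch).
# Minimizing sum((y - x*b)^2)/2 + lam*|b| gives
#   beta_hat = sign(z) * max(|z| - lam, 0) / sum(x^2),  where z = sum(x*y).

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0                      # exact zero: the "corner" solution

def lasso_1d(x, y, lam):
    z = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(z, lam) / sxx

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-0.5, -0.1, 0.0, 0.2, 0.4]     # weak signal: small OLS coefficient

print(lasso_1d(x, y, 0.0))          # small but nonzero OLS estimate
print(lasso_1d(x, y, 5.0))          # strong penalty: exactly 0.0
```

Ridge's quadratic penalty has no such dead zone, so it shrinks but never truncates.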
Compare Lasso and Ridge
When the true model is sparse, the lasso tends to do better; otherwise the lasso can fit worse than ridge.
No rule of thumb.
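In the special case of an orthonormal design (ESL Table 3.4) the comparison is transparent, since each estimator acts on every OLS coefficient independently: ridge rescales each coefficient by $1/(1+\lambda)$, while the lasso soft-thresholds, zeroing the small ones. A quick sketch with hypothetical coefficients:

```python
# Orthonormal-design comparison (cf. ESL Table 3.4):
#   ridge: b_j / (1 + lam)            -- proportional shrinkage, never exactly 0
#   lasso: sign(b_j) * max(|b_j| - lam, 0)  -- soft-thresholding, exact zeros

def ridge_shrink(b, lam):
    return [bj / (1 + lam) for bj in b]

def lasso_shrink(b, lam):
    return [(1 if bj > 0 else -1) * max(abs(bj) - lam, 0.0) for bj in b]

ols = [3.0, -2.0, 0.4, -0.1]        # hypothetical OLS coefficients
print(ridge_shrink(ols, 0.5))        # all shrunk, none zero
print(lasso_shrink(ols, 0.5))        # coefficients below the threshold zeroed
```

If the true coefficient vector is dense with many moderate entries, lasso's truncation discards real signal, which is one way it can underperform ridge.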
Generalization
Ridge and lasso can be generalized: replace the penalty with $\sum \lvert \beta_j \rvert^q$ for general $q \ge 0$.
- $q=0$: subset selection
- $q=1$: lasso
- $q=2$: ridge
Smaller $q$ leads to a tighter constraint region and more aggressive selection.
{% highlight mathematica %}
Plot[Evaluate@Table[(1 - x^q)^(1/q), {q, 0.5, 4, 0.5}], {x, 0, 1},
 AspectRatio -> 1, Frame -> True,
 PlotLegends -> Placed[Table["q=" <> ToString@q, {q, 0.5, 4, 0.5}], {Left, Bottom}],
 PlotLabel -> "Constraint boundary as a function of the L-q norm",
 FrameLabel -> {"\!\(\*SubscriptBox[\(\[Beta]\), \(i\)]\)",
   "\!\(\*SubscriptBox[\(\[Beta]\), \(j\)]\)"}]
{% endhighlight %}
Current Ref:
- esl/04.shrinkage-methods.md