Shrinkage methods

## Shrinkage methods

1. Why? Some variables might be redundant. Shrink the model.

### Lasso

1. Small constraint $t$ cause some of the coefficients reduce exactly to 0: this is variable selection, while producing sparse model.
2. Convex optimization.

Why would lasso leads to exact 0 coefficients?

Would spot the reason as long as you plot out the constraints and the RSS. Fig. 3.11.

### Compare Lasso and Ridge

For sparse models, lasso is better. Otherwise, lasso can make the fitting worse than ridge.

No rule of thumb.

### Generalization

Ridge and lasso can be generalized. Replace the distance calculation with other definitions, i.e., $\sum \lvert \beta_j \rvert^q$.

1. $q=0$: subset selection
2. $q=1$: lasso
3. $q=2$: ridge

Smaller $q$ leads to tighter selection.

{% highlight text %} Plot[Evaluate@Table[(1 - x^(q))^(1/q), {q, 0.5, 4, 0.5}], {x, 0, 1}, AspectRatio -> 1, Frame -> True, PlotLegends -> Placed[Table[“q=” <> ToString@q, {q, 0.5, 4, 0.5}], {Left, Bottom}], PlotLabel -> “Shrinkage as function of L-q norm disance”, FrameLabel -> {"!(*SubscriptBox[([Beta]), (i)])", “!(*SubscriptBox[([Beta]), (j)])"}] {% endhighlight %}

Planted: by ;