
Ridge regression's objective function: $$ L(w) = \underbrace{\|y - Xw\|^2}_\text{data term} + \underbrace{\lambda\|w\|^2}_\text{smoothness term} $$

I am trying to understand the regularization term, $\lambda\|w\|^2$. My questions are:

  1. What does smoothness mean here?

    I checked the definition of smooth on Wolfram MathWorld, but it does not seem to fit here:

    A smooth function is a function that has continuous derivatives up to some desired order over some domain.

  2. I read a document explaining the smoothness term (page 12 of the PDF):

    A very common assumption is that the underlying function is likely to be smooth, for example, having small derivatives. Smoothness distinguishes the examples in Figure 2. There is also a practical reason to prefer smoothness, in that assuming smoothness reduces model complexity:

    I have difficulty understanding two claims above:

    • why a smooth underlying function will have small derivatives;

    • why smoothness reduces model complexity.

My counterexample is: $$ f(x) = w_0 + w_1x + w_2x^2 + w_3x^3 $$

with $w = [0.5, 0.7, 0.3, 0.4]$ or $w = [5, 7, 3, 4]$: both are $C^\infty$ functions.

I know I must be making a mistake somewhere. Please help me understand this correctly. Thank you.

Michael R. Chernick
asked May 24, 2017 at 2:24
  • I don't see why the authors say that smoothness requires derivatives to be small. It sort of depends on how you define small, where you require it to be small, and what order of derivative you refer to. – Commented May 24, 2017 at 3:17
  • In the context of polynomial fitting, I have this loose, artistic sense that if $\|\mathbf{w}\| < \|\mathbf{u}\|$ then $f(\mathbf{x}; \mathbf{w}) = \sum_j w_j x^j$ tends to be a less squiggly-looking polynomial than $\sum_j u_j x^j$. I'd have to think about whether there's a way to put that in more rigorous terms. – Commented May 24, 2017 at 5:28
  • E.g., check out this polynomial curve fitting example. – Commented May 24, 2017 at 5:42

1 Answer


As @Michael Chernick said, smoothness is a bad term here. I can see it making sense if you are fitting a scatterplot smoother and want to limit the second derivatives, but in ridge regression it is really a shrinkage parameter ($\lambda,ドル that is).

It penalizes large coefficients. However, it does so smoothly, in the sense that it does not "zero out" any of your variables. This is different from the LASSO regularizer, $\lambda \|w\|_1,ドル which can zero out variables.
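A quick numerical sketch of that contrast (assuming scikit-learn is available; the data, alphas, and feature count here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
# Only the first three features actually matter.
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ true_w + 0.5 * rng.standard_normal(n)

ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks every coefficient
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: can set coefficients to exactly 0

print("ridge coefficients exactly zero:", int(np.sum(ridge.coef_ == 0.0)))
print("lasso coefficients exactly zero:", int(np.sum(lasso.coef_ == 0.0)))
```

With data like this, the ridge fit keeps all ten coefficients nonzero (just shrunk toward zero), while the LASSO fit drops the irrelevant features entirely — which is exactly the "zero out" behavior described above.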

Matthew Gunn
answered May 24, 2017 at 4:48