SVM

Please check the .ipynb files instead of the README files.

SVM code from scratch: Classification

Prerequisite: linearly separable data

The basic idea is the Max Margin Classifier: we have to find the widest road (margin) between class 1 and class 2.

\begin{align}
\max_{w,b}\ & \mathrm{margin}(w,b) \\
\mathrm{s.t.}\ & \left\{\begin{matrix} w^Tx_i + b > 0, & y_i = +1 \\ w^Tx_i + b < 0, & y_i = -1 \end{matrix}\right. \\
\Rightarrow\ & y_i(w^Tx_i + b) > 0,\quad \forall i = 1,2,\dots,N
\end{align}
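As a quick check of this constraint, here is a minimal sketch with made-up values for $w$, $b$, and the points (purely illustrative, not code from the notebooks):

```python
import numpy as np

# Hypothetical weights, bias, and points -- purely illustrative values.
w = np.array([1.0, -1.0])
b = 0.5

X = np.array([[2.0, 0.0],    # labelled +1
              [-1.0, 2.0]])  # labelled -1
y = np.array([1, -1])

# Every point is correctly separated when y_i * (w^T x_i + b) > 0.
margins = y * (X @ w + b)
print(margins)              # [ 2.5  2.5]
print((margins > 0).all())  # True -> all constraints satisfied
```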

  1. For the hard margin:

    • We set the functional margin equal to 1, so the constraint becomes $y_i(w^Tx_i+b)\geqslant1$.
  2. For the soft margin:

    • We allow some noise so that we can increase the robustness of our system.

    • We introduce a slack variable $\xi_i$, which represents the loss of sample $i$.

    • The margin constraint then becomes $y_i(w^Tx_i+b)\geqslant1-\xi_i$, s.t. $\xi_i\geqslant0$.

Suppose we have two classes, X and O. For class X:

  1. When X is correctly classified and lies outside the margin, then $\xi=0$.
  2. When X lies inside the margin but still on the correct side of the hyperplane, i.e. 0ドル\leqslant y_i(w^Tx_i+b)<1$, then 0ドル<\xi\leqslant1$.
  3. When X is on the wrong side of the hyperplane, i.e. $y_i(w^Tx_i+b)<0$, then $\xi>1$.

Based on these 3 conditions, we get the new target function:

\begin{align}
\min_{w,b,\xi}\ & \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{N}\xi_i \\
\mathrm{s.t.}\ & \left\{\begin{matrix} y_i(w^Tx_i + b) \geqslant 1-\xi_i \\ \xi_i \geqslant 0 \end{matrix}\right.
\end{align}

The loss function here is called the Hinge Loss; it uses distance to measure loss: $\xi_i$ represents the distance from a point to its corresponding margin boundary $y_i(w^Tx_i+b)=1$ when the margin constraint is violated.

  1. If $y_i(w^Tx_i+b)\geqslant1$, then $\xi_i=0$: no loss, the point is correctly classified with enough margin.
  2. If $y_i(w^Tx_i+b)<1$, then $\xi_i=1-y_i(w^Tx_i+b)$.

So now we have:

\begin{align} \xi_i = \max\left\{ 0,\ 1 - y_i(w^Tx_i + b) \right\} \end{align}
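A minimal sketch of this hinge-loss/slack rule (the scores, $C$, and $w$ below are illustrative values, not taken from the notebooks), covering the three cases listed above:

```python
import numpy as np

# Illustrative values of y_i * (w^T x_i + b) for three points:
#   2.0  -> outside the margin (case 1):              xi = 0
#   0.4  -> inside the margin, correct side (case 2): 0 < xi <= 1
#  -0.5  -> wrong side of the hyperplane (case 3):    xi > 1
scores = np.array([2.0, 0.4, -0.5])

xi = np.maximum(0.0, 1.0 - scores)       # hinge loss / slack per point
print(xi)                                # [0.  0.6 1.5]

C = 1.0
w = np.array([1.0, -1.0])                # hypothetical weight vector
objective = 0.5 * w @ w + C * xi.sum()   # soft-margin objective value
print(objective)                         # ~3.1
```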

Based on Lagrange duality and the KKT conditions, we get the dual problem:

\begin{align}
\max_{\alpha}\ & \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j X_i^T X_j \\
\mathrm{s.t.}\ & \left\{\begin{matrix} 0 \leqslant \alpha_i \leqslant C \\ \sum_{i=1}^{N}\alpha_i y_i = 0 \end{matrix}\right.
\end{align}
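For reference, a short sketch of how this dual objective could be evaluated for a given $\alpha$, assuming a linear kernel; `dual_objective` and its arguments are hypothetical names, not the repo's API:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Value of the soft-margin SVM dual objective for a given alpha
    (linear kernel K_ij = x_i^T x_j)."""
    K = X @ X.T                  # Gram matrix
    v = alpha * y                # element-wise alpha_i * y_i
    return alpha.sum() - 0.5 * v @ K @ v
```

A solver would maximize this value subject to 0ドル \leqslant \alpha_i \leqslant C$ and $\sum_{i}\alpha_i y_i = 0$.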

Optimal Solutions:

$w^* = \sum_{i=1}^{N}\alpha_i y_i x_i$

$b^* = y_k - w^{*T}x_k$ for any support vector $x_k$ with 0ドル<\alpha_k<C$ (a point lying exactly on the margin)
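Assuming the dual has been solved for $\alpha$, the primal solution can be recovered roughly as follows (a sketch with hypothetical names, linear kernel); $b^*$ is averaged over the free support vectors to reduce numerical noise:

```python
import numpy as np

def recover_w_b(alpha, X, y, C, tol=1e-8):
    """Recover w* and b* from a dual solution alpha (linear kernel)."""
    # w* = sum_i alpha_i * y_i * x_i
    w = (alpha * y) @ X

    # b* = y_k - w^T x_k for any free support vector (0 < alpha_k < C),
    # averaged over all of them for numerical stability.
    free = (alpha > tol) & (alpha < C - tol)
    b = np.mean(y[free] - X[free] @ w)
    return w, b
```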

SMO: Sequential Minimal Optimization

Basic idea: optimize the dual iteratively (in the spirit of SGD), working on paired $\alpha$s.

At each step, we only update two $\alpha$s; the rest are treated as constants.

Now set:

  1. $K_{ij} = X_i^TX_j$
  2. $f(x_k)=\sum_{i=1}^{N}\alpha_iy_iX_i^Tx_k+b$
  3. Error: $E_i = f(x_i)-y_i$
  4. $\eta=K_{11}+K_{22}-2K_{12}$, where 1 and 2 index the two chosen $\alpha$s

Then we get:

  1. $\alpha_2^{new} = \alpha_2^{old} + \frac{y_2(E_1-E_2)}{\eta}$
  2. $\alpha_1^{new}=\alpha_1^{old}+y_1y_2(\alpha_2^{old}-\alpha_2^{new})$
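Putting these quantities together, one SMO update of a chosen pair $(i, j)$ could look like the sketch below (a hypothetical helper with a linear kernel; pair-selection heuristics and the bias update are omitted). Note that in full SMO the new $\alpha_j$ is also clipped to the feasible interval $[L, H]$ implied by 0ドル\leqslant\alpha\leqslant C$ and $\sum_i \alpha_i y_i = 0$, which the sketch includes:

```python
import numpy as np

def smo_step(i, j, alpha, X, y, b, C):
    """One SMO update of the pair (alpha_i, alpha_j), linear kernel.

    Simplified sketch: the bias update and pair-selection heuristics
    are left out; all names here are illustrative, not the repo's code.
    """
    K = X @ X.T
    f = (alpha * y) @ K + b              # f(x_k) for every k
    E = f - y                            # errors E_k = f(x_k) - y_k

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:                         # degenerate pair, skip it
        return alpha

    # Unclipped update of alpha_j, then clip to the feasible box [L, H].
    alpha_j_new = alpha[j] + y[j] * (E[i] - E[j]) / eta
    if y[i] == y[j]:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    alpha_j_new = float(np.clip(alpha_j_new, L, H))

    # alpha_i moves so that sum(alpha * y) stays constant.
    alpha_i_new = alpha[i] + y[i] * y[j] * (alpha[j] - alpha_j_new)

    alpha = alpha.copy()
    alpha[i], alpha[j] = alpha_i_new, alpha_j_new
    return alpha
```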
