|
16 | 16 | "\n", |
17 | 17 | "The equation of a simple linear regression can be expressed as:\n", |
18 | 18 | "\n", |
19 | | - "$$ \\hat{y} = mx + b \\tag{1} $$\n", |
| 19 | + "$$ \\hat{y} = mx + b -- (1)$$ \n", |
20 | 20 | "\n", |
21 | 21 | "Thus, we have two parameters $m$ and $b$. We will see how can we use gradient descent and find the optimal values for these two parameters $m$ and $b$. \n" |
22 | 22 | ] |
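As a quick illustration of equation (1), the prediction is just a line in NumPy. The values of `m` and `b` below are made up for the example only; the notebook will instead learn them with gradient descent.

```python
import numpy as np

# Illustrative parameter values only; the notebook learns m and b via gradient descent.
m, b = 0.5, -1.0

x = np.array([0.0, 1.0, 2.0])   # a few sample inputs
y_hat = m * x + b               # equation (1): predicted values
print(y_hat)                    # [-1.  -0.5  0. ]
```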
|
33 | 33 | { |
34 | 34 | "cell_type": "code", |
35 | 35 | "execution_count": 1, |
36 | | - "metadata": {}, |
| 36 | + "metadata": { |
| 37 | + "collapsed": true |
| 38 | + }, |
37 | 39 | "outputs": [], |
38 | 40 | "source": [ |
39 | 41 | "import warnings\n", |
|
59 | 61 | { |
60 | 62 | "cell_type": "code", |
61 | 63 | "execution_count": 2, |
62 | | - "metadata": {}, |
| 64 | + "metadata": { |
| 65 | + "collapsed": true |
| 66 | + }, |
63 | 67 | "outputs": [], |
64 | 68 | "source": [ |
65 | 69 | "data = np.random.randn(500, 2)" |
|
162 | 166 | { |
163 | 167 | "cell_type": "code", |
164 | 168 | "execution_count": 6, |
165 | | - "metadata": {}, |
| 169 | + "metadata": { |
| 170 | + "collapsed": true |
| 171 | + }, |
166 | 172 | "outputs": [], |
167 | 173 | "source": [ |
168 | 174 | "theta = np.zeros(2)" |
|
203 | 209 | "\n", |
204 | 210 | "Mean Squared Error (MSE) of Regression is given as:\n", |
205 | 211 | "\n", |
206 | | - "$$J=\\frac{1}{N} \\sum_{i=1}^{N}(y-\\hat{y})^{2} \\tag{2}$$\n", |
| 212 | + "$$J=\\frac{1}{N} \\sum_{i=1}^{N}(y-\\hat{y})^{2} -- (2) $$\n", |
207 | 213 | "\n", |
208 | 214 | "\n", |
209 | 215 | "Where $N$ is the number of training samples, $y$ is the actual value and $\\hat{y}$ is the predicted value.\n", |
|
216 | 222 | { |
217 | 223 | "cell_type": "code", |
218 | 224 | "execution_count": 8, |
219 | | - "metadata": {}, |
| 225 | + "metadata": { |
| 226 | + "collapsed": true |
| 227 | + }, |
220 | 228 | "outputs": [], |
221 | 229 | "source": [ |
222 | 230 | "def loss_function(data,theta):\n", |
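The body of `loss_function` is cut off by the hunk above. Below is a minimal sketch of what equation (2) could look like in code, assuming (this layout is an assumption, not shown in the diff) that the first column of `data` holds $x$, the second holds $y$, and that `theta[0]` is $m$ and `theta[1]` is $b$.

```python
import numpy as np

def loss_function(data, theta):
    # Assumed layout (not visible in the diff): data[:, 0] = x, data[:, 1] = y.
    m, b = theta[0], theta[1]
    y_hat = m * data[:, 0] + b                 # equation (1)
    return np.mean((data[:, 1] - y_hat) ** 2)  # equation (2): MSE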
|
287 | 295 | "Gradients of loss function $J$ with respect to parameter $m$ is given as:\n", |
288 | 296 | "\n", |
289 | 297 | "\n", |
290 | | - "$ \\frac{d J}{d m}=\\frac{2}{N} \\sum_{i=1}^{N}-x_{i}\\left(y_{i}-\\left(m x_{i}+b\\right)\\right) \\tag{3}$\n", |
| 298 | + "$$ \\frac{d J}{d m}=\\frac{2}{N} \\sum_{i=1}^{N}-x_{i}\\left(y_{i}-\\left(m x_{i}+b\\right)\\right) -- (3) $$\n", |
291 | 299 | "\n", |
292 | 300 | "\n", |
293 | 301 | "Gradients of loss function $J$ with respect to parameter $b$ is given as:\n", |
294 | | - "$ \\frac{d J}{d b}=\\frac{2}{N} \\sum_{i=1}^{N}-\\left(y_{i}-\\left(m x_{i}+b\\right)\\right)\\tag{4} $\n", |
| 302 | + "\n", |
| 303 | + "\n", |
| 304 | + "$$ \\frac{d J}{d b}=\\frac{2}{N} \\sum_{i=1}^{N}-\\left(y_{i}-\\left(m x_{i}+b\\right)\\right) -- (4) $$\n", |
295 | 305 | "\n", |
296 | 306 | "We define a function called compute_gradients which takes the data and parameter theta as an input and returns the computed gradients: " |
297 | 307 | ] |
298 | 308 | }, |
299 | 309 | { |
300 | 310 | "cell_type": "code", |
301 | 311 | "execution_count": 10, |
302 | | - "metadata": {}, |
| 312 | + "metadata": { |
| 313 | + "collapsed": true |
| 314 | + }, |
303 | 315 | "outputs": [], |
304 | 316 | "source": [ |
305 | 317 | "def compute_gradients(data, theta):\n", |
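The rest of `compute_gradients` is likewise truncated. Under the same assumed column layout, equations (3) and (4) could be computed along these lines, returning both gradients in one array so they line up with `theta`.

```python
import numpy as np

def compute_gradients(data, theta):
    # Assumed layout (not visible in the diff): data[:, 0] = x, data[:, 1] = y.
    x, y = data[:, 0], data[:, 1]
    m, b = theta[0], theta[1]
    residual = y - (m * x + b)
    dJ_dm = (2.0 / len(data)) * np.sum(-x * residual)  # equation (3)
    dJ_db = (2.0 / len(data)) * np.sum(-residual)      # equation (4)
    return np.array([dJ_dm, dJ_db])
```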
|
360 | 372 | "\n", |
361 | 373 | "After computing gradients we need to update our model paramater according to our update rule as given below:\n", |
362 | 374 | "\n", |
363 | | - "$m=m-\\alpha \\frac{d J}{d m} \\tag{5}$\n", |
| 375 | + "$$m=m-\\alpha \\frac{d J}{d m} -- (5) $$ \n", |
364 | 376 | "\n", |
365 | | - "$ b=b-\\alpha \\frac{d J}{d b}\\tag{6}$\n", |
| 377 | + "$$ b=b-\\alpha \\frac{d J}{d b} --(6) $$\n", |
366 | 378 | "\n", |
367 | 379 | "\n", |
368 | 380 | "Since we stored $m$ in theta[0] and $b$ in theta[1], we can write our update equation as: \n", |
369 | 381 | "\n", |
370 | | - "$\\theta = \\theta - \\alpha \\frac{dJ}{d\\theta} \\tag{7}$\n", |
| 382 | + "$$\\theta = \\theta - \\alpha \\frac{dJ}{d\\theta} -- (7) $$\n", |
371 | 383 | "\n", |
372 | 384 | "As we learned in the previous section, updating gradients for just one time will not lead us to the convergence i.e minimum of the cost function, so we need to compute gradients and the update the model parameter for several iterations:\n", |
373 | 385 | "\n", |
|
378 | 390 | { |
379 | 391 | "cell_type": "code", |
380 | 392 | "execution_count": 12, |
381 | | - "metadata": {}, |
| 393 | + "metadata": { |
| 394 | + "collapsed": true |
| 395 | + }, |
382 | 396 | "outputs": [], |
383 | 397 | "source": [ |
384 | 398 | "num_iterations = 50000" |
|
394 | 408 | { |
395 | 409 | "cell_type": "code", |
396 | 410 | "execution_count": 13, |
397 | | - "metadata": {}, |
| 411 | + "metadata": { |
| 412 | + "collapsed": true |
| 413 | + }, |
398 | 414 | "outputs": [], |
399 | 415 | "source": [ |
400 | 416 | "lr = 1e-2" |
|
410 | 426 | { |
411 | 427 | "cell_type": "code", |
412 | 428 | "execution_count": 14, |
413 | | - "metadata": {}, |
| 429 | + "metadata": { |
| 430 | + "collapsed": true |
| 431 | + }, |
414 | 432 | "outputs": [], |
415 | 433 | "source": [ |
416 | 434 | "loss = []" |
|
426 | 444 | { |
427 | 445 | "cell_type": "code", |
428 | 446 | "execution_count": 15, |
429 | | - "metadata": {}, |
| 447 | + "metadata": { |
| 448 | + "collapsed": true |
| 449 | + }, |
430 | 450 | "outputs": [], |
431 | 451 | "source": [ |
432 | 452 | "theta = np.zeros(2)\n", |
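The training cell is also truncated after the `theta` initialization. Here is a minimal sketch of the loop the text describes, reusing `num_iterations`, `lr`, and `loss` from the earlier cells together with the sketched `compute_gradients` and `loss_function` above, applying equation (7) at each step.

```python
theta = np.zeros(2)
loss = []
for t in range(num_iterations):
    # Equation (7): step theta against the gradient, scaled by the learning rate.
    theta = theta - lr * compute_gradients(data, theta)
    loss.append(loss_function(data, theta))
```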
|
496 | 516 | ], |
497 | 517 | "metadata": { |
498 | 518 | "kernelspec": { |
499 | | - "display_name": "Python 2", |
| 519 | + "display_name": "Python [default]", |
500 | 520 | "language": "python", |
501 | 521 | "name": "python2" |
502 | 522 | }, |
|
510 | 530 | "name": "python", |
511 | 531 | "nbconvert_exporter": "python", |
512 | 532 | "pygments_lexer": "ipython2", |
513 | | - "version": "2.7.12" |
| 533 | + "version": "2.7.11" |
514 | 534 | } |
515 | 535 | }, |
516 | 536 | "nbformat": 4, |
|