Happy new year, everyone! Following Andrew Ng's course on Coursera, here's my implementation of the gradient descent algorithm (univariate linear regression) in Python using pandas and matplotlib, with an optional visualization that creates a GIF. Any optimizations/suggestions are welcome.
For those who don't know what the gradient descent algorithm is: gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. If, instead, one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function. You can check the definition on Wikipedia.
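As a quick illustration of that update rule (a sketch of my own, not part of the original post): each step applies theta := theta - alpha * gradient. For the one-variable function f(x) = x**2, whose derivative is 2x, a minimal sketch looks like this:

def descend(x=10.0, alpha=0.1, steps=50):
    # Each step moves x against the gradient of f(x) = x**2, which is 2 * x.
    for _ in range(steps):
        x = x - alpha * (2 * x)
    return x

print(descend())  # approaches the minimum at x = 0

Running this prints a value very close to 0, the local (and here global) minimum.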
Ex1 GIF:
Ex2 GIF:
Links for the datasets used:
ex1:
https://drive.google.com/open?id=1tztcXVillZTrbPeeCd28djRooM5nkiBZ
ex2:
https://drive.google.com/open?id=17ZQ4TLA7ThtU-3J-G108a1fzCH72nSFp
CODE
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random
import imageio
import os
def compute_cost(b, m, data):
    """
    Compute cost function for univariate linear regression using mean squared error (MSE).
    Args:
        b: Intercept (y = mx + b).
        m: Slope (y = mx + b).
        data: A pandas df with x and y data.
    Return:
        Cost function value.
    """
    data['Mean Squared Error(MSE)'] = (data['Y'] - (m * data['X'] + b)) ** 2
    # Mean of the squared errors over all samples.
    return data['Mean Squared Error(MSE)'].mean()
def adjust_gradient(b_current, m_current, data, learning_rate):
    """
    Adjust theta parameters for the univariate linear equation y^ = hθ(x) = θ0 + θ1x.
    Args:
        b_current: Current intercept (y = mx + b) or [Theta(0) θ0].
        m_current: Current slope (y = mx + b) or [Theta(1) θ1].
        data: A pandas df with x and y data.
        learning_rate: Alpha value.
    Return:
        Adjusted theta parameters.
    """
    data['b Gradient'] = -(2 / len(data)) * (data['Y'] - ((m_current * data['X']) + b_current))
    data['m Gradient'] = -(2 / len(data)) * data['X'] * (data['Y'] - ((m_current * data['X']) + b_current))
    new_b = b_current - (data['b Gradient'].sum() * learning_rate)
    new_m = m_current - (data['m Gradient'].sum() * learning_rate)
    return new_b, new_m
def gradient_descent(data, b, m, learning_rate, max_iter, visual=False):
    """
    Optimize theta values for the univariate linear regression equation y^ = hθ(x) = θ0 + θ1x.
    Args:
        data: A pandas df with x and y data.
        b: Starting b (θ0) value.
        m: Starting m (θ1) value.
        learning_rate: Alpha value.
        max_iter: Maximum number of iterations.
        visual: If True, a GIF progression will be generated.
    Return:
        Optimized values for θ0 and θ1.
    """
    folder_name = None
    if visual:
        folder_name = str(random.randint(10 ** 6, 10 ** 8))
        os.mkdir(folder_name)
        os.chdir(folder_name)
    for i in range(max_iter):
        b, m = adjust_gradient(b, m, data, learning_rate)
        if visual:
            # Evaluate the fitted line at the actual x values and plot it against them.
            data['Line'] = (data['X'] * m) + b
            data.plot(kind='scatter', x='X', y='Y', figsize=(8, 8), marker='x', color='r')
            plt.plot(data['X'], data['Line'], color='b')
            plt.grid()
            plt.title(f'y = {m}x + {b}\nCurrent cost: {compute_cost(b, m, data)}\nIteration: {i}\n'
                      f'Alpha = {learning_rate}')
            fig_name = ''.join([str(i), '.png'])
            plt.savefig(fig_name)
            plt.close()
    if visual:
        frames = os.listdir('.')
        frames.sort(key=lambda x: int(x.split('.')[0]))
        frames = [imageio.imread(frame) for frame in frames]
        imageio.mimsave(folder_name + '.gif', frames)
    return b, m
if __name__ == '__main__':
    data = pd.read_csv('data.csv')
    data.columns = ['X', 'Y']
    learning = 0.00001
    initial_b, initial_m = 0, 0
    max_it = 350
    b, m = gradient_descent(data, initial_b, initial_m, learning, max_it, visual=True)
1 Answer
Nothing happens!
When I ran the script my first thought was that it had hung because nothing happened. Apparently the calculations are just slow on my computer.
To reassure the user that the program is working, have some output indicating progress. Here's how I added it, but you can make it way fancier with progress bars and stuff:
n_reports = 20
for i in range(max_iter):
    if max_iter >= n_reports and i % (max_iter // n_reports) == 1:
        print(f'{(i * 100 // max_iter)}% ', end='', flush=True)
    ...
print('100%')
n_reports is the number of times the completion percentage is printed. An impatient user wants it printed more often; a patient one (or one with a faster computer) wants it printed less often.
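As noted above, this can be made fancier. One possibility (an illustrative sketch of mine, assuming the third-party tqdm package is installed; the original code does not use it) is to wrap the loop in a ready-made progress bar:

from tqdm import trange

# trange behaves like range but draws and updates a progress bar as the loop runs.
for i in trange(max_iter):
    b, m = adjust_gradient(b, m, data, learning_rate)
    ...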
Temporary directories
Python has a module for temporary files and directories which you should use instead of creating your own. Also, never change the process's current working directory (the chdir call) unless you have to; it can cause very strange problems. The tempfile and pathlib modules simplify file handling:
if visual:
    save_dir = pathlib.Path(tempfile.mkdtemp())
...
for i in range(max_iter):
    ...
    if visual:
        ...
        fig_name = f'{i:05}.png'
        plt.savefig(save_dir / fig_name)
...
if visual:
    frames = sorted([f.resolve() for f in save_dir.iterdir()])
    frames = [imageio.imread(frame) for frame in frames]
    image_file = save_dir.name + '.gif'
    imageio.mimsave(image_file, frames)
    print(f'Saved image as {image_file}')
I also made it so that filenames are padded with zeroes. That way, you can rely on the lexicographical order and you don't have to write your own key function.
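To see why the padding matters (a quick demonstration of my own, not from the answer), plain lexicographical sorting puts '10.png' before '2.png', while zero-padded names sort in numeric order:

# Unpadded names sort lexicographically, which breaks the frame order.
print(sorted(['2.png', '10.png', '1.png']))
# -> ['1.png', '10.png', '2.png']

# Zero-padded names sort correctly without a custom key function.
print(sorted(['00002.png', '00010.png', '00001.png']))
# -> ['00001.png', '00002.png', '00010.png']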
- What does the flush = True in the print statement do? And does the temporary directory (with the .png frames) disappear after execution? If so, where will the .gif file be saved? Also, what do you think of the implementation of the algorithm? Any suggestions on how to optimize it so it gets close to the auto-optimization sklearn does with LinearRegression()? (watch-this, Jan 2, 2020 at 21:37)
- It flushes so that the output becomes visible despite lacking an ending newline character. The temporary directory has to be removed with shutil.rmtree(save_dir). The gif is saved in the current directory. Your gradient descent implementation is fine, but it lacks learning rate scheduling and early stopping (detecting whether an optimum has been reached; a possible sketch follows this comment thread). Of course, it is not the right tool for the job, and there are better methods for univariate linear regression. (Gaslight Deceive Subvert, Jan 2, 2020 at 22:59)
- Thanks for the feedback. I know using LinearRegression() is sufficient for the job; I did it for the sake of understanding how things work, as I'm following Andrew Ng's course (which is done in Octave and MATLAB), so I have to convert everything I learn to Python. (watch-this, Jan 2, 2020 at 23:15)
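Since the comments above mention early stopping, here is a possible sketch of it (my own illustration, built on the post's compute_cost and adjust_gradient; the tolerance value is an assumption, not something from the original code):

# Hypothetical early-stopping variant of the main loop (illustrative only).
tolerance = 1e-9               # assumed improvement threshold, tune per dataset
previous_cost = float('inf')
for i in range(max_iter):
    b, m = adjust_gradient(b, m, data, learning_rate)
    cost = compute_cost(b, m, data)
    if previous_cost - cost < tolerance:  # cost has stopped improving
        break
    previous_cost = cost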