I'm currently following along with Andrew Ng's Machine Learning course on Coursera and wanted to implement the gradient descent algorithm in Python 3 using numpy and pandas.
This is what I came up with:
import os
import numpy as np
import pandas as pd
def get_training_data(path):  # path to read data from
    raw_panda_data = pd.read_csv(path)

    # append a column of ones to the front of the data set
    raw_panda_data.insert(0, 'Ones', 1)

    num_columns = raw_panda_data.shape[1]                        # (num_rows, num_columns)
    panda_X = raw_panda_data.iloc[:, 0:num_columns-1]            # [ slice_of_rows, slice_of_columns ]
    panda_y = raw_panda_data.iloc[:, num_columns-1:num_columns]  # [ slice_of_rows, slice_of_columns ]

    X = np.matrix(panda_X.values)  # pandas.DataFrame -> numpy.ndarray -> numpy.matrix
    y = np.matrix(panda_y.values)  # pandas.DataFrame -> numpy.ndarray -> numpy.matrix

    return X, y
def compute_mean_square_error(X, y, theta):
    summands = np.power(X * theta.T - y, 2)
    return np.sum(summands) / (2 * len(X))
def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1]                              # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)])  # init theta
    cost = [0.0 for i in range(num_iterations)]

    for it in range(num_iterations):
        error = np.repeat((X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = compute_mean_square_error(X, y, theta)

    return theta, cost
This is how one could use the code:
X, y = get_training_data(os.getcwd() + '/data/data_set.csv')
theta, cost = gradient_descent(X, y, 0.008, 10000)
print('Theta: ', theta)
print('Cost: ', cost[-1])
Where data/data_set.csv could contain data (generated from the model y = 2 + x1 - x2) looking like this:
x1, x2, y
0, 1, 1
1, 1, 2
1, 0, 3
0, 0, 2
2, 4, 0
4, 2, 4
6, 0, 8
Output:
Theta: [[ 2. 1. -1.]]
Cost: 9.13586056551e-26
I'd especially like to get the following aspects of my code reviewed:
- Overall Python style. I'm relatively new to Python, coming from a C background, and not sure if I'm misunderstanding some concepts here.
- numpy/pandas integration. Do I use these packages correctly?
- Correctness of the gradient descent algorithm.
- Efficiency. How can I further improve my code?
3 Answers
Without having the insight (or, honestly, time) to verify your actual algorithm, I can say that your Python is pretty good.
Only minor stuff: a comment like # path to read data from should be turned into a PEP 257-style docstring.
You should add a shebang at the top of your file, probably #!/usr/bin/env python3.
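For instance, the top of the file could then look like this; a minimal sketch applying both suggestions to your first function:

#!/usr/bin/env python3
import numpy as np
import pandas as pd

def get_training_data(path):
    """Read the CSV training set at path and return it as (X, y) matrices."""
    raw_panda_data = pd.read_csv(path)
    ...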
Otherwise, you're off to a good start.
- I just added the shebang and made the script executable. It's honestly so much more comfortable than typing python3 gradient_descent.py all the time. Thank you for the tips! – Hericks, Jul 25, 2017
I like your Python style. There is an issue with your algorithm though. numpy.repeat does not work the way you expect it to. Try this code:
import numpy as np

theta = np.matrix([1, 2, 3])
y = 2
X = np.matrix(np.array(range(9)).reshape(3, 3))

error = np.repeat((X * theta.T) - y, 3, axis=1)
print(error)
>>> [[ 6  6  6]
     [24 24 24]
     [42 42 42]]

print(np.dot(X, theta.T) - y)
>>> [[ 6]
     [24]
     [42]]
Do you see how numpy.repeat returns a matrix, although you want a vector?
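As an aside, numpy broadcasting gives you the same element-wise product without the explicit repeat; a minimal sketch with plain ndarrays (not part of the original code):

import numpy as np

X = np.array(range(9), dtype=float).reshape(3, 3)
theta = np.array([[1.0, 2.0, 3.0]])  # 1x3 row vector
y = np.array([[2.0], [2.0], [2.0]])  # 3x1 column vector

error = X @ theta.T - y              # shape (3, 1)
grad = (error * X).sum(axis=0)       # (3, 1) broadcasts against (3, 3), then sum the rows
print(grad)                          # [324. 396. 468.]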
- Do you refer to my line error = np.repeat((X * theta.T) - y, num_parameters, axis=1)? I thought about it and still think that my calculation is correct. I calculate the hypothesis function for the x-values via X * theta.T. The result is a matrix with a single column, where the i-th row contains the difference between the hypothesis function evaluated at the x-values of the i-th training example and the output value of the i-th training example. What do you think is wrong here? – Hericks, Aug 15, 2018
- Sorry, my previous answer was wrong. I edited it accordingly. – thoq, Aug 21, 2018
You can also use numpy.zeros(shape) to initialize the theta and cost vectors with zeros.
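A sketch of what that would look like inside gradient_descent (behaviour unchanged, assuming theta stays a 1 x num_parameters row vector):

theta = np.matrix(np.zeros(num_parameters))  # row vector of zeros, replaces the list comprehension
cost = np.zeros(num_iterations)              # cost[it] = ... still works as before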
- Using np.zeros to initialize theta and cost in your gradient descent function is, in my opinion, clearer. Also, why uppercase X and lowercase y? I would make them consistent and perhaps even give them descriptive names, e.g. input and output. Finally, you could look into exception handling, e.g. for bad input data from pandas or invalid values for learning_rate or num_iterations.
- You could use theta = np.zeros_like(X) if you would like to initialize theta with an array of zeros with the dimensions of X.
- theta doesn't have the same dimensions as X. Regardless, I'll keep the np.zeros_like(...) function in the back of my head.
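To make the shape point from the last two comments concrete, a minimal sketch (using the dimensions of the example data set above):

import numpy as np

X = np.zeros((7, 3))                     # m = 7 examples, n = 3 parameters (including the ones column)
theta = np.matrix(np.zeros(X.shape[1]))  # 1x3 row vector: one entry per parameter
# np.zeros_like(X) would instead produce a 7x3 array, which is not theta's shape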