\$\begingroup\$

How could I improve the following code, which runs a simple linear regression using matrix algebra? I import a .csv file (link here) called 'cdd.ny.csv' and perform the matrix calculations that solve for the coefficients (intercept and slope) of Y = XB, i.e. $B = (X'X)^{-1}X'Y$:

import numpy
from numpy import *
import csv
df1 = csv.reader(open('cdd.ny.csv', 'rb'),delimiter=',')
tmp = list(df1)
b = numpy.array(tmp).astype('string')
b1 = b[1:,3:5]
b2 = numpy.array(b1).astype('float')
nrow = b1.shape[0]
intercept = ones( (nrow,1), dtype=int16 )
b3 = empty( (nrow,1), dtype = float )
i = 0
while i < nrow:
 b3[i,0] = b2[i,0]
 i = i + 1
X = numpy.concatenate((intercept, b3), axis=1)
X = matrix(X)
Y = b2[:,1]
Y = matrix(Y).T
m1 = dot(X.T,X).I
m2 = dot(X.T,Y)
beta = m1*m2
print beta
#[[-7.62101913]
# [ 0.5937734 ]]

To check my answer:

numpy.linalg.lstsq(X,Y)
200_success
asked Feb 26, 2012 at 0:49
\$\endgroup\$

1 Answer

\$\begingroup\$
import numpy
from numpy import *
import csv
df1 = csv.reader(open('cdd.ny.csv', 'rb'),delimiter=',')
tmp = list(df1)
b = numpy.array(tmp).astype('string')
b1 = b[1:,3:5]
b2 = numpy.array(b1).astype('float')

Firstly, I'd avoid all these abbreviated variable names (b, b1, b2, tmp); they make the code hard to follow. You can also combine several of these lines:

b2 = numpy.array(list(df1))[1:,3:5].astype('float')

That way we avoid creating so many variables.
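As a rough sketch of the combined form under Python 3, here is the same one-liner applied to a small in-memory CSV. The sample rows and column names are made up (the real contents of cdd.ny.csv are not shown in the question); only the slicing of columns 3 and 4 follows the original code.

```python
import csv
import io

import numpy

# Hypothetical stand-in for 'cdd.ny.csv': a header row plus numeric data in
# columns 3 and 4. The real file's contents are not shown in the question.
sample_csv = io.StringIO(
    "id,year,month,x,y\n"
    "a,2010,1,10.0,1.5\n"
    "b,2010,2,20.0,4.0\n"
    "c,2010,3,30.0,10.5\n"
)

reader = csv.reader(sample_csv, delimiter=',')
# Skip the header row, keep columns 3 and 4, and convert to float in one step.
data = numpy.array(list(reader))[1:, 3:5].astype(float)
print(data.shape)
```

With a real file, `sample_csv` would be replaced by the opened file object; in Python 3 that means text mode rather than the 'rb' mode used in the question.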

nrow = b1.shape[0]
intercept = ones( (nrow,1), dtype=int16 )
b3 = empty( (nrow,1), dtype = float )
i = 0
while i < nrow:
 b3[i,0] = b2[i,0]
 i = i + 1

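This whole loop can be replaced by b3 = b2[:,0] (as the comments point out, append .reshape(-1, 1) so that b3 stays a two-dimensional column for the concatenate call that follows).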
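A minimal sketch of why the shape matters, on made-up data: b2[:,0] is one-dimensional, while numpy.concatenate along axis=1 needs a two-dimensional column, which .reshape(-1, 1) provides.

```python
import numpy

# Made-up stand-in for b2: two columns (regressor, response).
b2 = numpy.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])

flat = b2[:, 0]                   # shape (3,): one-dimensional
column = b2[:, 0].reshape(-1, 1)  # shape (3, 1): a proper column

# Column of ones for the intercept, sized from the data itself.
intercept = numpy.ones((b2.shape[0], 1))

# Works because both pieces are two-dimensional with the same row count.
X = numpy.concatenate((intercept, column), axis=1)
print(X.shape)
```

Passing `flat` instead of `column` to concatenate fails, because the two arrays no longer have the same number of dimensions.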

X = numpy.concatenate((intercept, b3), axis=1)
X = matrix(X)

If you really want to use matrix, combine these two lines. But really, it's probably better to use plain arrays rather than matrix.

Y = b2[:,1]
Y = matrix(Y).T
m1 = dot(X.T,X).I
m2 = dot(X.T,Y)
beta = m1*m2
print beta
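For comparison, here is a sketch of the same calculation using plain arrays instead of matrix, assuming Python 3 and a recent NumPy. The data is made up so that the expected coefficients are known in advance; numpy.linalg.solve is shown as an alternative to forming the inverse explicitly, not as something the original code used.

```python
import numpy

# Made-up data lying exactly on y = 1 + 2x, so the coefficients are known.
x = numpy.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones (intercept) next to the regressor.
X = numpy.column_stack((numpy.ones_like(x), x))

# Normal equations (X'X) B = X'Y with an explicit inverse, as in the question.
beta_inv = numpy.linalg.inv(X.T.dot(X)).dot(X.T.dot(y))

# The same system solved without forming the inverse.
beta_solve = numpy.linalg.solve(X.T.dot(X), X.T.dot(y))

# Cross-check against the least-squares routine used in the question.
beta_lstsq = numpy.linalg.lstsq(X, y, rcond=None)[0]

print(beta_inv)
```

All three results should agree up to floating-point rounding; on poorly conditioned data the explicit inverse tends to be the least accurate of the three.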
answered Feb 26, 2012 at 3:40
\$\endgroup\$
  • \$\begingroup\$ Thanks! However, the line X = numpy.concatenate((intercept, b3), axis=1) now gives the error "ValueError: arrays must have same number of dimensions" -- this is the reason I added the while loop. Any way around this? \$\endgroup\$ Commented Feb 26, 2012 at 17:50
  • \$\begingroup\$ @baha-kev, use b3 = b2[:,0].reshape(-1, 1) \$\endgroup\$ Commented Feb 26, 2012 at 18:39
  • \$\begingroup\$ Thanks; you mention it's probably better to use arrays - how do you invert an array? The .I command only works on matrix objects. \$\endgroup\$ Commented Feb 26, 2012 at 19:16
  • \$\begingroup\$ @baha-kev, use the numpy.linalg.inv function. \$\endgroup\$ Commented Feb 26, 2012 at 19:27
