I am working with the following code:
import numpy as np

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x)*(1.0 - sigmoid(x))

def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1.0 - x**2

class NeuralNetwork:

    def __init__(self, layers, activation='tanh'):
        if activation == 'sigmoid':
            self.activation = sigmoid
            self.activation_prime = sigmoid_prime
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_prime = tanh_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate=0.2, epochs=100000):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            if k % 10000 == 0: print('epochs:', k)

            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)

            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)] => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)), axis=0)
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a

if __name__ == '__main__':
    nn = NeuralNetwork([2,2,1])
    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    y = np.array([0, 1, 1, 0])
    nn.fit(X, y)
    for e in X:
        print(e, nn.predict(e))
While this converges well and fast when using tanh, it converges much more slowly when using the sigmoid (i.e. changing activation='tanh' to activation='sigmoid' in __init__). I cannot figure out why that is. How do I improve the implementation for the sigmoid?
Comment (mdfst13, May 4, 2016): Are you interested in reviews on aspects of the code unrelated to the sigmoid implementation?
Comment (OP, May 4, 2016): Of course I am!
1 Answer
The reasons for the speed discrepancy
The reason for the difference in timing is that evaluating sigmoid_prime() takes far longer than tanh_prime(). You can see this if you use a line profiler such as the line_profiler module.

Is tanh_prime() supposed to be the derivative of tanh()? If so, you might want to double-check your formula. The derivative of tanh(x) is 1. - tanh(x)**2, not 1. - x**2.
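As a quick sanity check (a sketch only; the sample point x = 0.5 is arbitrary), a central finite difference agrees with 1 - tanh(x)**2 and not with 1 - x**2:

x = 0.5
h = 1e-6
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2*h)  # central difference
print(numeric)              # ~0.786448
print(1.0 - np.tanh(x)**2)  # ~0.786448 -- matches
print(1.0 - x**2)           # 0.75      -- does not match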
In fact, if you use the actual definition of the derivative of tanh(), the timings become much more similar:

def tanh_prime_alt(x):
    return 1 - tanh(x)**2

foo = np.random.rand(10000)
%timeit -n 100 tanh_prime(foo)
%timeit -n 100 tanh_prime_alt(foo)
%timeit -n 100 sigmoid_prime(foo)

100 loops, best of 3: 10.2 μs per loop
100 loops, best of 3: 116 μs per loop
100 loops, best of 3: 279 μs per loop
So with this alternate tanh_prime(), the sigmoid method is now only 2× slower, not 20× slower. I should emphasize that (a) I don't know enough about neural networks to know if 1. - x**2 is an appropriate expression or approximation to the actual derivative of tanh(), but if it is in fact OK, then (b) the reason that activation = 'tanh' is so much faster is because of this approximation/error.

The remaining 2× difference is because in your factored expression of sigmoid_prime(), you are needlessly evaluating sigmoid() twice. I'd instead do this:

def sigmoid_prime_alt(x):
    sig_x = sigmoid(x)
    return sig_x - sig_x**2
As expected, this speeds things up two-fold relative to your original definition.
foo = np.random.rand(10000)
%timeit -n 100 sigmoid_prime(foo)
%timeit -n 100 sigmoid_prime_alt(foo)

100 loops, best of 3: 248 μs per loop
100 loops, best of 3: 132 μs per loop
Since the sigmoid() function and the tanh() function are related by tanh(x) = 2*sigmoid(2*x) - 1, i.e. sigmoid(x) = (1 + tanh(x/2.))/2, then if you are OK with the weird 1 - x**2 approximation for tanh_prime(), you should be able to work out a similar approximation for sigmoid_prime().
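For what it's worth, here is a minimal sketch of what such an "in terms of the output" derivative could look like for the sigmoid (the name sigmoid_prime_from_output is mine, not part of the original code). Just as 1 - x**2 computes the tanh derivative from an already-computed tanh value, sigmoid'(x) = sigmoid(x)*(1 - sigmoid(x)) can be computed directly from an already-computed sigmoid activation:

def sigmoid_prime_from_output(s):
    # s is assumed to already be a sigmoid *output*, i.e. s = sigmoid(x)
    return s * (1.0 - s)

That avoids re-evaluating the exponential entirely, which is where most of the cost of sigmoid_prime() comes from.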
You might be interested in the autograd module, which provides a generalized capability to automatically compute derivatives of most NumPy code.
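For example, a minimal sketch (assuming the autograd package is installed; elementwise_grad is part of its public API):

import autograd.numpy as anp          # autograd's drop-in wrapper around NumPy
from autograd import elementwise_grad

def sigmoid(x):
    return 1.0 / (1.0 + anp.exp(-x))

# Build the element-wise derivative of sigmoid automatically
sigmoid_prime_auto = elementwise_grad(sigmoid)

x = anp.linspace(-3.0, 3.0, 5)
print(sigmoid_prime_auto(x))          # matches sigmoid(x) * (1 - sigmoid(x))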
Other comments
These comments aren't a thorough review, but just some things I noticed.
- Why are your weights Python lists instead of NumPy arrays? If you're already using NumPy, you might as well use it wherever you can.
- You probably don't need the for l in range(len(self.weights)): loop, do you? Can't you use NumPy array slicing and the matrix capabilities of np.dot() to replace this loop?
- If you are going to loop, you don't need to do for l in range(len(self.weights)) and then reference self.weights[l]. You can do for weight in self.weights: and then reference weight in your loop code (see the sketch after this list).
- Write some docstrings for your functions, please!
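For instance, predict() could be rewritten like this (a sketch only; the behaviour is unchanged):

def predict(self, x):
    # Prepend the bias unit, then feed forward through each weight matrix
    a = np.concatenate((np.ones(1), np.array(x)))
    for weight in self.weights:
        a = self.activation(np.dot(a, weight))
    return a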