Python arrays dimension issues

Question 1

I am struggling once again with Python, NumPy and arrays while trying to do some calculations between matrices.

The part of the code that is likely not working properly is:

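import numpy as np

# Split the rows of data into three roughly equal parts.
train, test, cv = np.array_split(data, 3, axis=0)
# Every column except the last holds the inputs (2-D slices).
train_inputs = train[:, :-1]
test_inputs = test[:, :-1]
cv_inputs = cv[:, :-1]
# The last column holds the outputs (integer indexing gives 1-D arrays).
train_outputs = train[:, -1]
test_outputs = test[:, -1]
cv_outputs = cv[:, -1]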

Printing information about those arrays (np.ndim, np.shape and dtype, respectively) gives:

2
1
2
1
2
1
(94936, 30)
(94936,)
(94936, 30)
(94936,)
(94935, 30)
(94935,)
float64
float64
float64
float64
float64
float64

I believe one dimension is missing in all the *_outputs arrays.
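If a 2-D column is what the later arithmetic expects, slicing with -1: instead of indexing with -1 keeps the trailing axis. A minimal sketch on a small made-up array (data_small is a hypothetical stand-in for data, not something from the original code):

```python
import numpy as np

# Hypothetical stand-in for `data`: 6 samples, 3 input columns + 1 output column.
data_small = np.arange(24, dtype=float).reshape(6, 4)

flat = data_small[:, -1]                     # integer index drops the axis
column = data_small[:, -1:]                  # slice keeps the axis
reshaped = data_small[:, -1].reshape(-1, 1)  # same result via reshape

print(flat.shape, column.shape, reshaped.shape)
```

Whether the extra axis is actually wanted depends on what the outputs are later subtracted from, so it is worth checking that shape first.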

The other matrix I need is created by this command:

newMatrix = neuronLayer(30, 94936)

In which neuronLayer is a class defined as:

class neuronLayer():
    def __init__(self, neurons, neuron_inputs):
        # Random weights in [-1, 1): one row per input, one column per neuron.
        self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1

Here's the final output:

outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
ValueError: shapes (94936,30) and (94936,30) not aligned: 30 (dim 1) != 94936 (dim 0)

Python is clearly telling me that the matrix shapes do not line up, but I do not understand where the problem is.

Any tips?

PS: The full code is pasted here.

Question 2

When using np.dot(x, y) on 2-D arrays, NumPy requires the shapes of x and y to be (A, B) and (B, C) respectively, whereas yours are (A, B) and (A, B).
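The rule can be checked on tiny arrays. This is only an illustrative sketch; the sizes 5, 3 and 2 are arbitrary choices, not values from the question:

```python
import numpy as np

x = np.ones((5, 3))   # (A, B)
y = np.ones((3, 2))   # (B, C)
ok = np.dot(x, y)     # inner dimensions match, result is (A, C)
print(ok.shape)

message = None
try:
    np.dot(x, x)      # (5, 3) with (5, 3): inner dimensions 3 and 5 differ
except ValueError as err:
    message = str(err)
print(message)
```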

Question 3

As mentioned below, trying that raises a MemoryError. I am not sure whether that is related to my old computer, but initially I thought it just wasn't a proper fix.

Question 4

What is the dot supposed to produce? A 30x30 or a 94936x94936 (too big?) array?

Question 5

The np.dot() function is used to multiply matrices, isn't it? The idea is to multiply the input matrix by the weights, which are 94936x30 and 94936x1, respectively. That should produce a 94936x1 array, right? The code might be wrong for that purpose.

Question 6

Review how you generate the weights
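One possible reading of this hint, stated as an assumption rather than as the intended fix: neuron_inputs is meant to be the number of features per sample (30), not the number of samples. The sketch below reuses the neuronLayer class from the question with a small, hypothetical sample count of 8 standing in for 94936:

```python
import numpy as np

class neuronLayer():
    def __init__(self, neurons, neuron_inputs):
        self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1

n_samples, n_features = 8, 30              # 8 is a hypothetical stand-in for 94936
inputs = np.random.random((n_samples, n_features))

per_sample = neuronLayer(1, n_samples)     # weights sized by sample count
per_feature = neuronLayer(1, n_features)   # weights sized by feature count

# Only the per-feature weights have a first dimension matching inputs' columns.
product = np.dot(inputs, per_feature.weights)
print(per_sample.weights.shape, per_feature.weights.shape, product.shape)
```

With weights sized by feature count, the product has one row per sample, which is the shape the question says it expects.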

Question 7

layer1 = neuronLayer(30, 94936) # 29 neurons with 227908 inputs
layer2 = neuronLayer(1, 30) # 1 Neuron with the previous 29 inputs

where neuronLayer creates

self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1

The two weight arrays are (94936,30) and (30,1) in size.

This line does not make any sense. I'm surprised it doesn't give an error:

layer1error = layer2delta.dot(self.layer2.weights.np.transpose)

I suspect you want np.transpose(self.layer2.weights) or self.layer2.weights.T.
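Both spellings give the same result, and the original one fails because an ndarray has no attribute called np. A small sketch, using a (30, 1) array as a stand-in for self.layer2.weights:

```python
import numpy as np

weights = 2 * np.random.random((30, 1)) - 1   # stand-in for self.layer2.weights

a = np.transpose(weights)
b = weights.T
print(a.shape, b.shape)   # both (1, 30)

# The original spelling looks up `np` on the array itself and raises.
try:
    weights.np.transpose
except AttributeError as err:
    print(err)
```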

But maybe execution doesn't get that far. train first calls think with inputs of shape (94936,30):

 outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
 outputLayer2 = self.__sigmoid(np.dot(outputLayer1, self.layer2.weights))

So it tries to do an np.dot with two (94936,30) arrays. They aren't compatible for a dot product. You could transpose one or the other, producing either a (94936,94936) array or a (30,30) one. The first looks too big. The (30,30) result is compatible with the weights for the 2nd layer.

np.dot(inputs.T, self.layer1.weights)

has a chance of working right.

np.dot(outputLayer1, self.layer2.weights)
(30,30) with (30,1) => (30,1)

But then you do

train_outputs - outputLayer2

That will have problems regardless of whether train_outputs is (94936,) or (94936,1).
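The two failure modes differ, which a small sketch makes visible. Here n = 12 is a hypothetical stand-in for 94936, and the zero arrays only mimic the shapes discussed above:

```python
import numpy as np

n = 12                               # stand-in for 94936
output_layer2 = np.zeros((30, 1))    # shape of the (30,30) x (30,1) product
targets_1d = np.zeros(n)             # like train_outputs with shape (n,)
targets_2d = np.zeros((n, 1))        # like train_outputs with shape (n, 1)

# (n,) against (30, 1) broadcasts silently to (30, n): no error is raised,
# but the result is not a per-sample error vector.
diff = targets_1d - output_layer2
print(diff.shape)

# (n, 1) against (30, 1) cannot broadcast: 12 and 30 differ and neither is 1.
try:
    targets_2d - output_layer2
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```

The silent case is the more dangerous one, since the code keeps running with an array of the wrong shape.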

You need to make sure that array shapes flow correctly through the calculation. Don't just check them at the start. Check them internally. And make sure you understand what shapes they should have at each step.

It would be a whole lot easier to develop and test this code with much smaller inputs and layers, something like 10 samples and 3 features. That way you can look at the values as well as the shapes.
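A scaled-down version with shape checks at each step might look like the sketch below. Several things here are assumptions and not taken from the original code: the hidden size of 4, the random targets, the choice to size the first weight array by feature count, and the sigmoid-gradient form used for the delta.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_samples, n_features, n_hidden = 10, 3, 4   # hypothetical small sizes

inputs = rng.random((n_samples, n_features))
targets = rng.random((n_samples, 1))         # outputs kept as a 2-D column

# Assumption: one weight row per input feature, one column per neuron.
w1 = 2 * rng.random((n_features, n_hidden)) - 1
w2 = 2 * rng.random((n_hidden, 1)) - 1

# Forward pass, checking the shape after every step.
out1 = sigmoid(np.dot(inputs, w1))
assert out1.shape == (n_samples, n_hidden)
out2 = sigmoid(np.dot(out1, w2))
assert out2.shape == (n_samples, 1)

# Error terms: same shape as the targets, then back through w2 transposed.
layer2_error = targets - out2
assert layer2_error.shape == (n_samples, 1)
layer2_delta = layer2_error * out2 * (1 - out2)   # assumed sigmoid gradient
layer1_error = layer2_delta.dot(w2.T)
assert layer1_error.shape == (n_samples, n_hidden)

print(out1.shape, out2.shape, layer1_error.shape)
```

At this size the arrays can also be printed in full, so values as well as shapes can be inspected before scaling back up.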

Question 8

Thank you for the very detailed answer. I learned a lot from this. I knew from the start that the matrix sizes and dimensions weren't correct, I just couldn't figure out where. You also mentioned a good way to approach this, which is to start with something smaller and then scale it up. That should help me understand the code and what it is doing. As for the line that doesn't make sense, it doesn't make sense to me either :) I need to multiply by a transposed matrix. I just couldn't figure out the right order to do so. Thanks again!

Question 9

np.dot performs matrix multiplication when its arguments are 2-D arrays. It looks like your code is trying to multiply two non-square matrices with the same dimensions, which doesn't work. Perhaps you meant to transpose one of the matrices? NumPy arrays have a T attribute that returns the transpose; you could try:

self.__sigmoid(np.dot(inputs.T, self.layer1.weights))

Question 10

That raises a MemoryError. This computer is old though, so that could be an issue. I have tried that, but since I got this error, I thought this wasn't a fix.

Question 11

@GustavoSilva: What shape do you want np.dot(...) to be here? I think this is making it (94936, 94936), which is obviously pretty big
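A rough size estimate shows why a (94936, 94936) result would be a problem on almost any machine. The sketch only does arithmetic on the shapes and does not allocate the arrays; whether the actual MemoryError came from such a product is an assumption, since the full code is not shown here.

```python
import numpy as np

n = 94936
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per float64 element

big = n * n * itemsize       # bytes needed for a (94936, 94936) result
small = 30 * 30 * itemsize   # bytes needed for a (30, 30) result

print(f"{big / 1e9:.1f} GB versus {small} bytes")
```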

hpaulj · Accepted Answer · 2017-01-29 20:43:07Z

