I want to create a new array from specific elements of the original array. I built a minimal example in which this works as intended, but on the actual data it fails, and I can't figure out the essential difference.
First the minimal example: for every row, I want to take the numbers that stand under the letters A-C and save them in the array B_feat.
import numpy as np
years = 5   # A-E
yearsf = 3  # A-C
B_new = (['0','A','B','C','D','E','A','B','C','D','E'],
         ['X','2','3','3','3','4','6','5','4','3','4'],
         ['Y','3','4','6','7','3','2','4','7','9','8'],
         ['Z','3','4','6','3','4','6','9','1','4','7'])
B_feat = np.zeros((3, 2*yearsf))
i = 0
for row in B_feat:
    j = 0
    k = 0
    for element in row:
        B_feat[i][j:int(j+yearsf)] = B_new[i+1][k+1:int(k+yearsf+1)]
        j += yearsf
        k += years
    i += 1
print(B_feat)
and I receive
[[ 2. 3. 3. 6. 5. 4.]
[ 3. 4. 6. 2. 4. 7.]
[ 3. 4. 6. 6. 9. 1.]]
Now with the actual data I have:
years = 9
yearsf = 4
np.shape(B_new) = (244, 181)
np.shape(B_feat) = (243, 76)
I want a new array B_feat that ignores the first row and the first column of B_new, then skips the next 9 columns, and from every following group of 9 columns extracts the first 4 elements of each row.
import numpy as np
i = 0
for row in B_feat:
    j = 0
    k = 0
    for element in row:
        B_feat[i][j:int(j+yearsf)] = B_new[i+1][int(k+1+years):int(k+years+yearsf+1)]
        j += yearsf
        k += years
    i += 1
When running the code, I receive the following error:
IndexError: index 80 is out of bounds for axis 0 with size 76
I don't really understand this error, since I thought axis 0 runs down the rows (where B_feat has 243), and I couldn't figure out where the index goes up to 80.
As I'm new to Python and this forum, please let me know if I can improve my question or if anything is unclear.
1 Answer
Short answer: you are indexing out of range, as the error suggests, and this is also true for the minimal example that seems to work. I do not know why it works for the minimal example in the first place.
Detailed answer:
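In the minimal example, each row of B_feat has 6 columns, indexed from 0 to 5.
Your inner loop runs once per element of that row, so 6 times.
Each iteration increments j by yearsf, which is 3.
At the third iteration, j is 6, which is larger than the maximum index 5.
You have exactly the same issue with the index k, which runs past the end of the row of B_new.
As for the error message on the real data: B_feat[i] is a 1-D row with 76 entries, so "axis 0 with size 76" refers to the position within that row, not to the 243 rows.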
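A likely reason the minimal example does not complain (this is my reading, not something the question confirms) is that Python and NumPy clip slices that run past the end, so the extra iterations only assign an empty slice to an empty slice. A small sketch, where `row` and `source` are made-up stand-ins for one row of B_feat and one row of B_new:

```python
import numpy as np

row = np.zeros(6)  # stand-in for one row of B_feat
source = ['X', '2', '3', '3', '3', '4', '6', '5', '4', '3', '4']  # one row of B_new

# Slices that start past the end are clipped to empty instead of raising.
empty_target = row[6:9]
empty_source = source[11:14]
print(empty_target.size, len(empty_source))

# Assigning an empty slice to an empty slice changes nothing and raises nothing.
row[6:9] = source[11:14]

# Asking for a single element past the end is what raises IndexError.
try:
    row[6]
    raised = False
except IndexError:
    raised = True
print(raised)
```

So a loop can overrun its bounds without any visible symptom as long as it only uses slices, which is why it is safer to iterate exactly as many times as there are groups.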
Suggestion:
The number of iterations in your inner loop must be the number of groups
of columns to process, which is 2 in your minimal example. By group of columns, I
simply mean a set of columns below 'A-C'.
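To make the group count concrete, here is a small sketch of the arithmetic. It assumes, as in the question, that the first column holds labels and that the remaining columns split evenly into groups of `years` columns. The helper name `count_groups` and its `skip_groups` parameter are made up for this illustration.

```python
def count_groups(n_cols, years, skip_groups=0):
    """Number of column groups to process in a row with n_cols columns.

    Assumes one leading label column followed by whole groups of `years`
    columns; the first `skip_groups` groups are ignored.
    """
    data_cols = n_cols - 1  # drop the label column
    if data_cols % years != 0:
        raise ValueError("columns do not split evenly into groups")
    return data_cols // years - skip_groups

# Minimal example: 11 columns, groups of 5, nothing skipped.
groups_small = count_groups(11, 5)
# Real data: 181 columns, groups of 9, first group skipped.
groups_real = count_groups(181, 9, skip_groups=1)
print(groups_small, groups_real)
```

Multiplying by yearsf gives the width of B_feat: 2 * 3 = 6 columns in the minimal example and 19 * 4 = 76 columns for the real data, which matches the shapes given in the question.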
Your loops can be turned into something like this:
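# Assumes B_new is a NumPy array; for the tuple of lists in the
# minimal example, convert it first with B_new = np.array(B_new).
i = 0
for row in B_feat:
    # j: start column in B_feat, k: start column in B_new, one pair per group
    for j, k in zip(range(0, B_feat.shape[1], yearsf),
                    range(1, B_new.shape[1], years)):
        print('i = %d, j = %d, k = %d' % (i, j, k))
        B_feat[i][j:j+yearsf] = B_new[i+1][k:k+yearsf]
    i += 1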
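Notice that I removed the +1 in the slice of B_new and set k
to start at 1 instead. For your real data, where the first group of 9 columns is skipped, k would start at 1 + years.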
I would also advise you to turn the outer loop into something like
for i in range(B_feat.shape[0]) and to remove the statement i+=1.
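Putting these suggestions together, one possible version is sketched below. The function names `extract_features` and `extract_features_reshape` and the `skip_groups` parameter are my own additions, and I assume B_new is 2-D with a label row and a label column, followed by whole groups of `years` columns. The second function replaces the loops with a reshape, which is a different technique from the loop discussed above and is only meant as an alternative.

```python
import numpy as np

def extract_features(B_new, years, yearsf, skip_groups=0):
    """Loop version: keep the first `yearsf` of every `years` columns."""
    data = np.asarray(B_new)[1:, 1:].astype(float)  # drop label row and column
    n_groups = data.shape[1] // years - skip_groups
    B_feat = np.zeros((data.shape[0], n_groups * yearsf))
    for i in range(B_feat.shape[0]):
        # One (j, k) pair per group: j indexes B_feat, k indexes the data row.
        for j, k in zip(range(0, B_feat.shape[1], yearsf),
                        range(skip_groups * years, data.shape[1], years)):
            B_feat[i, j:j + yearsf] = data[i, k:k + yearsf]
    return B_feat

def extract_features_reshape(B_new, years, yearsf, skip_groups=0):
    """Same result without explicit loops, via a (rows, groups, years) view."""
    data = np.asarray(B_new)[1:, 1:].astype(float)
    n_rows = data.shape[0]
    grouped = data.reshape(n_rows, -1, years)
    return grouped[:, skip_groups:, :yearsf].reshape(n_rows, -1)

# The minimal example from the question.
B_new = (['0', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
         ['X', '2', '3', '3', '3', '4', '6', '5', '4', '3', '4'],
         ['Y', '3', '4', '6', '7', '3', '2', '4', '7', '9', '8'],
         ['Z', '3', '4', '6', '3', '4', '6', '9', '1', '4', '7'])
print(extract_features(B_new, years=5, yearsf=3))

# Synthetic stand-in with the shape of the real data (values are arbitrary).
big = np.arange(244 * 181).reshape(244, 181)
print(extract_features(big, years=9, yearsf=4, skip_groups=1).shape)
```

For the minimal example this yields the same rows as the output shown in the question, and for an array of shape (244, 181) with the first group skipped it yields a (243, 76) array.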